Predict Loan Eligibility for Dream Housing Finance company

Dream Housing Finance company deals in all kinds of home loans. They have presence across all urban, semi urban and rural areas. Customer first applies for home loan and after that company validates the customer eligibility for loan.

Company wants to automate the loan eligibility process (real time) based on customer detail provided while filling online application form. These details are Gender, Marital Status, Education, Number of Dependents, Income, Loan Amount, Credit History and others. To automate this process, they have provided a dataset to identify the customers segments that are eligible for loan amount so that they can specifically target these customers.

image.png

image.png

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
/usr/local/lib/python3.6/dist-packages/statsmodels/tools/_testing.py:19: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
  import pandas.util.testing as tm
df_train = pd.read_csv('/content/drive/My Drive/loan_prediction/train_ctrUa4K (2).csv')
test = pd.read_csv('/content/drive/My Drive/loan_prediction/test_lAUu6dG (2).csv')
sub = pd.read_csv('/content/drive/My Drive/loan_prediction/sample_submission_49d68Cx.csv')
df_train.shape,test.shape
((614, 13), (367, 12))
df_train['train_or_test']='train'
test['train_or_test']='test'
train=pd.concat([df_train,test])
train
Loan_ID Gender Married Dependents Education Self_Employed ApplicantIncome CoapplicantIncome LoanAmount Loan_Amount_Term Credit_History Property_Area Loan_Status train_or_test
0 LP001002 Male No 0 Graduate No 5849 0.0 NaN 360.0 1.0 Urban Y train
1 LP001003 Male Yes 1 Graduate No 4583 1508.0 128.0 360.0 1.0 Rural N train
2 LP001005 Male Yes 0 Graduate Yes 3000 0.0 66.0 360.0 1.0 Urban Y train
3 LP001006 Male Yes 0 Not Graduate No 2583 2358.0 120.0 360.0 1.0 Urban Y train
4 LP001008 Male No 0 Graduate No 6000 0.0 141.0 360.0 1.0 Urban Y train
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
362 LP002971 Male Yes 3+ Not Graduate Yes 4009 1777.0 113.0 360.0 1.0 Urban NaN test
363 LP002975 Male Yes 0 Graduate No 4158 709.0 115.0 360.0 1.0 Urban NaN test
364 LP002980 Male No 0 Graduate No 3250 1993.0 126.0 360.0 NaN Semiurban NaN test
365 LP002986 Male Yes 0 Graduate No 5000 2393.0 158.0 360.0 1.0 Rural NaN test
366 LP002989 Male No 0 Graduate Yes 9200 0.0 98.0 180.0 1.0 Rural NaN test

981 rows × 14 columns

train.isna().sum()
Loan_ID                0
Gender                24
Married                3
Dependents            25
Education              0
Self_Employed         55
ApplicantIncome        0
CoapplicantIncome      0
LoanAmount            27
Loan_Amount_Term      20
Credit_History        79
Property_Area          0
Loan_Status          367
train_or_test          0
dtype: int64
train.Loan_Status.value_counts().plot(kind='bar')
<matplotlib.axes._subplots.AxesSubplot at 0x7fd8000b5828>
train.Loan_Status.value_counts()
Y    422
N    192
Name: Loan_Status, dtype: int64
train.Gender.value_counts().plot(kind='bar')
<matplotlib.axes._subplots.AxesSubplot at 0x7fd7ffffecf8>
train.Gender.unique()
array(['Male', 'Female', nan], dtype=object)
train['Gender'].fillna('Male',axis=0,inplace=True)
train.Married.unique()
array(['No', 'Yes', nan], dtype=object)
train.Married.value_counts().plot(kind='bar')
<matplotlib.axes._subplots.AxesSubplot at 0x7fd7ffb32828>
train['Married'].fillna('Yes',axis=0,inplace=True)
train['Dependents'].fillna('2',axis=0,inplace=True)
train.Self_Employed.value_counts().plot(kind='bar')
<matplotlib.axes._subplots.AxesSubplot at 0x7fd7ffa99080>
train['Self_Employed'].fillna('Yes',axis=0,inplace=True)
train.nunique()
Loan_ID              981
Gender                 2
Married                2
Dependents             4
Education              2
Self_Employed          2
ApplicantIncome      752
CoapplicantIncome    437
LoanAmount           232
Loan_Amount_Term      12
Credit_History         2
Property_Area          3
Loan_Status            2
train_or_test          2
dtype: int64
train['LoanAmount'] = train['LoanAmount']*1000
train['LoanAmount']
0           NaN
1      128000.0
2       66000.0
3      120000.0
4      141000.0
         ...   
362    113000.0
363    115000.0
364    126000.0
365    158000.0
366     98000.0
Name: LoanAmount, Length: 981, dtype: float64
train['Wallet_Size'] = train['CoapplicantIncome']+train['ApplicantIncome']
train
Loan_ID Gender Married Dependents Education Self_Employed ApplicantIncome CoapplicantIncome LoanAmount Loan_Amount_Term Credit_History Property_Area Loan_Status train_or_test Wallet_Size
0 LP001002 Male No 0 Graduate No 5849 0.0 NaN 360.0 1.0 Urban Y train 5849.0
1 LP001003 Male Yes 1 Graduate No 4583 1508.0 128000.0 360.0 1.0 Rural N train 6091.0
2 LP001005 Male Yes 0 Graduate Yes 3000 0.0 66000.0 360.0 1.0 Urban Y train 3000.0
3 LP001006 Male Yes 0 Not Graduate No 2583 2358.0 120000.0 360.0 1.0 Urban Y train 4941.0
4 LP001008 Male No 0 Graduate No 6000 0.0 141000.0 360.0 1.0 Urban Y train 6000.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
362 LP002971 Male Yes 3+ Not Graduate Yes 4009 1777.0 113000.0 360.0 1.0 Urban NaN test 5786.0
363 LP002975 Male Yes 0 Graduate No 4158 709.0 115000.0 360.0 1.0 Urban NaN test 4867.0
364 LP002980 Male No 0 Graduate No 3250 1993.0 126000.0 360.0 NaN Semiurban NaN test 5243.0
365 LP002986 Male Yes 0 Graduate No 5000 2393.0 158000.0 360.0 1.0 Rural NaN test 7393.0
366 LP002989 Male No 0 Graduate Yes 9200 0.0 98000.0 180.0 1.0 Rural NaN test 9200.0

981 rows × 15 columns

a= train[train['LoanAmount'].isna()]
a = pd.DataFrame(a)
b= train[train['LoanAmount'].notna()]
b = pd.DataFrame(b)
b = b[['Loan_ID','Wallet_Size','Loan_Amount_Term','LoanAmount']]
a = a[['Loan_ID','Wallet_Size','Loan_Amount_Term']]
a.set_index('Loan_ID',inplace=True)
b.set_index('Loan_ID',inplace=True)
from sklearn.linear_model import LinearRegression
b.dropna(inplace=True)
y = b['LoanAmount']
X = b.drop('LoanAmount',axis=1)


model = LinearRegression()
model.fit(X, y)

X_predict = a  # put the dates of which you want to predict kwh here
y_predict = model.predict(X_predict)
l = pd.DataFrame({'S':y_predict},index=a.index)
l.round(2)
S
Loan_ID
LP001002 137481.72
LP001106 125857.87
LP001213 130508.95
LP001266 110840.19
LP001326 144678.17
LP001350 197652.70
LP001356 155885.51
LP001392 149838.33
LP001449 134828.36
LP001682 105782.28
LP001922 251776.50
LP001990 107793.46
LP002054 132406.41
LP002113 106482.21
LP002243 139772.55
LP002393 158277.93
LP002401 118113.77
LP002533 127462.22
LP002697 144562.47
LP002778 143528.90
LP002784 129907.32
LP002960 122813.11
LP001415 149954.03
LP001542 121398.30
LP002057 193279.29
LP002360 169499.38
LP002593 187494.36
train =pd.merge(train,l,how='left', on='Loan_ID')
train
Loan_ID Gender Married Dependents Education Self_Employed ApplicantIncome CoapplicantIncome LoanAmount Loan_Amount_Term Credit_History Property_Area Loan_Status train_or_test Wallet_Size S
0 LP001002 Male No 0 Graduate No 5849 0.0 NaN 360.0 1.0 Urban Y train 5849.0 137481.717581
1 LP001003 Male Yes 1 Graduate No 4583 1508.0 128000.0 360.0 1.0 Rural N train 6091.0 NaN
2 LP001005 Male Yes 0 Graduate Yes 3000 0.0 66000.0 360.0 1.0 Urban Y train 3000.0 NaN
3 LP001006 Male Yes 0 Not Graduate No 2583 2358.0 120000.0 360.0 1.0 Urban Y train 4941.0 NaN
4 LP001008 Male No 0 Graduate No 6000 0.0 141000.0 360.0 1.0 Urban Y train 6000.0 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
976 LP002971 Male Yes 3+ Not Graduate Yes 4009 1777.0 113000.0 360.0 1.0 Urban NaN test 5786.0 NaN
977 LP002975 Male Yes 0 Graduate No 4158 709.0 115000.0 360.0 1.0 Urban NaN test 4867.0 NaN
978 LP002980 Male No 0 Graduate No 3250 1993.0 126000.0 360.0 NaN Semiurban NaN test 5243.0 NaN
979 LP002986 Male Yes 0 Graduate No 5000 2393.0 158000.0 360.0 1.0 Rural NaN test 7393.0 NaN
980 LP002989 Male No 0 Graduate Yes 9200 0.0 98000.0 180.0 1.0 Rural NaN test 9200.0 NaN

981 rows × 16 columns

train['LoanAmount'].fillna(0,inplace=True)
train['S'].fillna(0,inplace=True)
train['LoanAmount'] = (train['LoanAmount']+train['S']).round()
train.drop("S",inplace=True,axis=1)
train.isna().sum()
Loan_ID                0
Gender                 0
Married                0
Dependents             0
Education              0
Self_Employed          0
ApplicantIncome        0
CoapplicantIncome      0
LoanAmount             0
Loan_Amount_Term      20
Credit_History        79
Property_Area          0
Loan_Status          367
train_or_test          0
Wallet_Size            0
dtype: int64
train
Loan_ID Gender Married Dependents Education Self_Employed ApplicantIncome CoapplicantIncome LoanAmount Loan_Amount_Term Credit_History Property_Area Loan_Status train_or_test Wallet_Size
0 LP001002 Male No 0 Graduate No 5849 0.0 137482.0 360.0 1.0 Urban Y train 5849.0
1 LP001003 Male Yes 1 Graduate No 4583 1508.0 128000.0 360.0 1.0 Rural N train 6091.0
2 LP001005 Male Yes 0 Graduate Yes 3000 0.0 66000.0 360.0 1.0 Urban Y train 3000.0
3 LP001006 Male Yes 0 Not Graduate No 2583 2358.0 120000.0 360.0 1.0 Urban Y train 4941.0
4 LP001008 Male No 0 Graduate No 6000 0.0 141000.0 360.0 1.0 Urban Y train 6000.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
976 LP002971 Male Yes 3+ Not Graduate Yes 4009 1777.0 113000.0 360.0 1.0 Urban NaN test 5786.0
977 LP002975 Male Yes 0 Graduate No 4158 709.0 115000.0 360.0 1.0 Urban NaN test 4867.0
978 LP002980 Male No 0 Graduate No 3250 1993.0 126000.0 360.0 NaN Semiurban NaN test 5243.0
979 LP002986 Male Yes 0 Graduate No 5000 2393.0 158000.0 360.0 1.0 Rural NaN test 7393.0
980 LP002989 Male No 0 Graduate Yes 9200 0.0 98000.0 180.0 1.0 Rural NaN test 9200.0

981 rows × 15 columns

a= train[train['Loan_Amount_Term'].isna()]
a = pd.DataFrame(a)
b= train[train['Loan_Amount_Term'].notna()]
b = pd.DataFrame(b)
b = b[['Wallet_Size','LoanAmount','Loan_Amount_Term',]]
a = a[['Wallet_Size','LoanAmount']]
from sklearn.linear_model import LinearRegression
b.dropna(inplace=True)
y = b['Loan_Amount_Term']
X = b.drop('Loan_Amount_Term',axis=1)


model = LinearRegression()
model.fit(X, y)

X_predict = a  # put the dates of which you want to predict kwh here
y_predict = model.predict(X_predict)
l = pd.DataFrame({'S':y_predict})
l
S
0 340.045187
1 342.176088
2 339.779839
3 340.553579
4 339.594829
5 342.030482
6 346.320437
7 342.921423
8 343.371710
9 340.986839
10 330.196321
11 342.267031
12 340.567533
13 335.906045
14 345.002177
15 342.703365
16 338.401820
17 336.049875
18 337.689137
19 338.617110
train.Loan_Amount_Term.unique()
array([360., 120., 240.,  nan, 180.,  60., 300., 480.,  36.,  84.,  12.,
       350.,   6.])
train['Loan_Amount_Term'].fillna(360,inplace=True)
train['EMI'] = train['LoanAmount']/train['Loan_Amount_Term']
train['EMI'] = train['EMI'].round()
train['Wallet_Share'] = (train['EMI'] /train['Wallet_Size'])*100
train['Wallet_Share'] = train['Wallet_Share'].round(2)
train.isna().sum()
Loan_ID                0
Gender                 0
Married                0
Dependents             0
Education              0
Self_Employed          0
ApplicantIncome        0
CoapplicantIncome      0
LoanAmount             0
Loan_Amount_Term       0
Credit_History        79
Property_Area          0
Loan_Status          367
train_or_test          0
Wallet_Size            0
EMI                    0
Wallet_Share           0
dtype: int64
a= train[train['Credit_History'].isna()]
a = pd.DataFrame(a)
b= train[train['Credit_History'].notna()]
b = pd.DataFrame(b)
b = b[['Wallet_Size','LoanAmount','Credit_History','Loan_Amount_Term','EMI',"Wallet_Share"]]
a = a[['Wallet_Size','LoanAmount','Loan_Amount_Term','EMI',"Wallet_Share"]]
from sklearn import linear_model
logistic = linear_model.LogisticRegression()
y = b['Credit_History']
X = b.drop('Credit_History',axis=1)



y = np.ravel(y)

logistic.fit(X, y)

preds = logistic.predict(a)
preds
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
train['Credit_History'].fillna(1.0,inplace=True)
train.isna().sum()
Loan_ID                0
Gender                 0
Married                0
Dependents             0
Education              0
Self_Employed          0
ApplicantIncome        0
CoapplicantIncome      0
LoanAmount             0
Loan_Amount_Term       0
Credit_History         0
Property_Area          0
Loan_Status          367
train_or_test          0
Wallet_Size            0
EMI                    0
Wallet_Share           0
dtype: int64
train.to_csv('/content/drive/My Drive/loan_prediction/updatedtrain.csv')
train.columns
Index(['Loan_ID', 'Gender', 'Married', 'Dependents', 'Education',
       'Self_Employed', 'ApplicantIncome', 'CoapplicantIncome', 'LoanAmount',
       'Loan_Amount_Term', 'Credit_History', 'Property_Area', 'Loan_Status',
       'train_or_test', 'Wallet_Size', 'EMI', 'Wallet_Share'],
      dtype='object')
train.dtypes
Loan_ID               object
Gender                object
Married               object
Dependents            object
Education             object
Self_Employed         object
ApplicantIncome        int64
CoapplicantIncome    float64
LoanAmount           float64
Loan_Amount_Term     float64
Credit_History       float64
Property_Area         object
Loan_Status           object
train_or_test         object
Wallet_Size          float64
EMI                  float64
Wallet_Share         float64
dtype: object
train = train[['Loan_ID', 'Gender', 'Married', 'Dependents', 'Education',
       'Self_Employed','Wallet_Size', 'EMI', 'Wallet_Share','LoanAmount',
       'Loan_Amount_Term', 'Credit_History', 'Property_Area', 'Loan_Status','train_or_test']]
train.head(3)
Loan_ID Gender Married Dependents Education Self_Employed Wallet_Size EMI Wallet_Share LoanAmount Loan_Amount_Term Credit_History Property_Area Loan_Status train_or_test
0 LP001002 Male No 0 Graduate No 5849.0 382.0 6.53 137482.0 360.0 1.0 Urban Y train
1 LP001003 Male Yes 1 Graduate No 6091.0 356.0 5.84 128000.0 360.0 1.0 Rural N train
2 LP001005 Male Yes 0 Graduate Yes 3000.0 183.0 6.10 66000.0 360.0 1.0 Urban Y train
train = train.astype({"Wallet_Size": int, "EMI": int,"LoanAmount": int,"Loan_Amount_Term": int,'Credit_History':int})
train.head(3)
Loan_ID Gender Married Dependents Education Self_Employed Wallet_Size EMI Wallet_Share LoanAmount Loan_Amount_Term Credit_History Property_Area Loan_Status train_or_test
0 LP001002 Male No 0 Graduate No 5849 382 6.53 137482 360 1 Urban Y train
1 LP001003 Male Yes 1 Graduate No 6091 356 5.84 128000 360 1 Rural N train
2 LP001005 Male Yes 0 Graduate Yes 3000 183 6.10 66000 360 1 Urban Y train
cat_cols = train.select_dtypes(include=object)
cat_cols
Loan_ID Gender Married Dependents Education Self_Employed Property_Area Loan_Status train_or_test
0 LP001002 Male No 0 Graduate No Urban Y train
1 LP001003 Male Yes 1 Graduate No Rural N train
2 LP001005 Male Yes 0 Graduate Yes Urban Y train
3 LP001006 Male Yes 0 Not Graduate No Urban Y train
4 LP001008 Male No 0 Graduate No Urban Y train
... ... ... ... ... ... ... ... ... ...
976 LP002971 Male Yes 3+ Not Graduate Yes Urban NaN test
977 LP002975 Male Yes 0 Graduate No Urban NaN test
978 LP002980 Male No 0 Graduate No Semiurban NaN test
979 LP002986 Male Yes 0 Graduate No Rural NaN test
980 LP002989 Male No 0 Graduate Yes Rural NaN test

981 rows × 9 columns

num_cols = train.select_dtypes(exclude=object)
num_cols
Wallet_Size EMI Wallet_Share LoanAmount Loan_Amount_Term Credit_History
0 5849 382 6.53 137482 360 1
1 6091 356 5.84 128000 360 1
2 3000 183 6.10 66000 360 1
3 4941 333 6.74 120000 360 1
4 6000 392 6.53 141000 360 1
... ... ... ... ... ... ...
976 5786 314 5.43 113000 360 1
977 4867 319 6.55 115000 360 1
978 5243 350 6.68 126000 360 1
979 7393 439 5.94 158000 360 1
980 9200 544 5.91 98000 180 1

981 rows × 6 columns

train.nunique()
Loan_ID             981
Gender                2
Married               2
Dependents            4
Education             2
Self_Employed         2
Wallet_Size         848
EMI                 301
Wallet_Share        578
LoanAmount          259
Loan_Amount_Term     12
Credit_History        2
Property_Area         3
Loan_Status           2
train_or_test         2
dtype: int64
Gender_map={
 'Female': 0,
 'Male': 1} 


Married_map = {'No': 0, 'Yes': 1}

Property_Area_map = {'Urban':0, 'Rural':1, 'Semiurban':2}

Education_map={'Not Graduate': 0,'Graduate': 1}

Loan_Status_map={'N': 0,'Y': 1}

Dependents_map={'0': 0,
 '1': 1,
 '2': 2,
 '3+': 3}

Self_Employed_map ={'No': 0, 'Yes': 1, 'unknown': 2}

Loan_term_map = {360: 2,
 120: 1,
 240: 3,
 180: 5,
 300: 0,
 60: 4,
 480: 7,
 36: 10,
 84: 8,
 12: 9,
 350: 6,
 6: 11}
# train['Gender'] = train['Gender'].map(Gender_map)
# train['Married'] = train['Married'].map(Married_map)
# train['Property_Area'] = train['Property_Area'].map(Property_Area_map)
# train['Education'] = train['Education'].map(Education_map)
# train['Dependents'] = train['Dependents'].map(Dependents_map)
# train['Self_Employed'] = train['Self_Employed'].map(Self_Employed_map)
# train['Loan_Amount_Term'] = train['Loan_Amount_Term'].map(Loan_term_map)
# train['Loan_Status'] = train['Loan_Status'].map(Loan_Status_map)
train.dtypes
Loan_ID              object
Gender               object
Married              object
Dependents           object
Education            object
Self_Employed        object
Wallet_Size           int64
EMI                   int64
Wallet_Share        float64
LoanAmount            int64
Loan_Amount_Term      int64
Credit_History        int64
Property_Area        object
Loan_Status          object
train_or_test        object
dtype: object
train
Loan_ID Gender Married Dependents Education Self_Employed Wallet_Size EMI Wallet_Share LoanAmount Loan_Amount_Term Credit_History Property_Area Loan_Status train_or_test
0 LP001002 Male No 0 Graduate No 5849 382 6.53 137482 360 1 Urban Y train
1 LP001003 Male Yes 1 Graduate No 6091 356 5.84 128000 360 1 Rural N train
2 LP001005 Male Yes 0 Graduate Yes 3000 183 6.10 66000 360 1 Urban Y train
3 LP001006 Male Yes 0 Not Graduate No 4941 333 6.74 120000 360 1 Urban Y train
4 LP001008 Male No 0 Graduate No 6000 392 6.53 141000 360 1 Urban Y train
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
976 LP002971 Male Yes 3+ Not Graduate Yes 5786 314 5.43 113000 360 1 Urban NaN test
977 LP002975 Male Yes 0 Graduate No 4867 319 6.55 115000 360 1 Urban NaN test
978 LP002980 Male No 0 Graduate No 5243 350 6.68 126000 360 1 Semiurban NaN test
979 LP002986 Male Yes 0 Graduate No 7393 439 5.94 158000 360 1 Rural NaN test
980 LP002989 Male No 0 Graduate Yes 9200 544 5.91 98000 180 1 Rural NaN test

981 rows × 15 columns

train['mean_LoanAmount_per_Gender']=train.groupby(['Gender'])['LoanAmount'].transform('mean')
train['mean_LoanAmount_per_Married']=train.groupby(['Married'])['LoanAmount'].transform('mean')
train['mean_LoanAmount_per_bed']=train.groupby(['Dependents'])['LoanAmount'].transform('mean')
train['mean_LoanAmount_per_department']=train.groupby(['Education'])['LoanAmount'].transform('mean')
train['mean_LoanAmount_per_Self_Employed']=train.groupby(['Self_Employed'])['LoanAmount'].transform('mean')
train['mean_LoanAmount_per_Loan_Amount_Term']=train.groupby(['Loan_Amount_Term'])['LoanAmount'].transform('mean')
train['mean_LoanAmount_per_Credit_History']=train.groupby(['Credit_History'])['LoanAmount'].transform('mean')
train['mean_LoanAmount_per_Property_Area']=train.groupby(['Property_Area'])['LoanAmount'].transform('mean')
train['sum_LoanAmount_per_Gender']=train.groupby(['Gender'])['LoanAmount'].transform('sum')
train['sum_LoanAmount_per_Married']=train.groupby(['Married'])['LoanAmount'].transform('sum')
train['sum_LoanAmount_per_bed']=train.groupby(['Dependents'])['LoanAmount'].transform('sum')
train['sum_LoanAmount_per_department']=train.groupby(['Education'])['LoanAmount'].transform('sum')
train['sum_LoanAmount_per_Self_Employed']=train.groupby(['Self_Employed'])['LoanAmount'].transform('sum')
train['sum_LoanAmount_per_Loan_Amount_Term']=train.groupby(['Loan_Amount_Term'])['LoanAmount'].transform('sum')
train['sum_LoanAmount_per_Credit_History']=train.groupby(['Credit_History'])['LoanAmount'].transform('sum')
train['sum_LoanAmount_per_Property_Area']=train.groupby(['Property_Area'])['LoanAmount'].transform('sum')
train['max_LoanAmount_per_Gender']=train.groupby(['Gender'])['LoanAmount'].transform('max')
train['max_LoanAmount_per_Married']=train.groupby(['Married'])['LoanAmount'].transform('max')
train['max_LoanAmount_per_bed']=train.groupby(['Dependents'])['LoanAmount'].transform('max')
train['max_LoanAmount_per_department']=train.groupby(['Education'])['LoanAmount'].transform('max')
train['max_LoanAmount_per_Self_Employed']=train.groupby(['Self_Employed'])['LoanAmount'].transform('max')
train['max_LoanAmount_per_Loan_Amount_Term']=train.groupby(['Loan_Amount_Term'])['LoanAmount'].transform('max')
train['max_LoanAmount_per_Credit_History']=train.groupby(['Credit_History'])['LoanAmount'].transform('max')
train['max_LoanAmount_per_Property_Area']=train.groupby(['Property_Area'])['LoanAmount'].transform('max')
train['min_LoanAmount_per_Gender']=train.groupby(['Gender'])['LoanAmount'].transform('min')
train['min_LoanAmount_per_Married']=train.groupby(['Married'])['LoanAmount'].transform('min')
train['min_LoanAmount_per_bed']=train.groupby(['Dependents'])['LoanAmount'].transform('min')
train['min_LoanAmount_per_department']=train.groupby(['Education'])['LoanAmount'].transform('min')
train['min_LoanAmount_per_Self_Employed']=train.groupby(['Self_Employed'])['LoanAmount'].transform('min')
train['min_LoanAmount_per_Loan_Amount_Term']=train.groupby(['Loan_Amount_Term'])['LoanAmount'].transform('min')
train['min_LoanAmount_per_Credit_History']=train.groupby(['Credit_History'])['LoanAmount'].transform('min')
train['min_LoanAmount_per_Property_Area']=train.groupby(['Property_Area'])['LoanAmount'].transform('min')
train.set_index('Loan_ID',inplace=True)
train = pd.get_dummies(train,drop_first=True)
train
Wallet_Size EMI Wallet_Share LoanAmount Loan_Amount_Term Credit_History mean_LoanAmount_per_Gender mean_LoanAmount_per_Married mean_LoanAmount_per_bed mean_LoanAmount_per_department mean_LoanAmount_per_Self_Employed mean_LoanAmount_per_Loan_Amount_Term mean_LoanAmount_per_Credit_History mean_LoanAmount_per_Property_Area sum_LoanAmount_per_Gender sum_LoanAmount_per_Married sum_LoanAmount_per_bed sum_LoanAmount_per_department sum_LoanAmount_per_Self_Employed sum_LoanAmount_per_Loan_Amount_Term sum_LoanAmount_per_Credit_History sum_LoanAmount_per_Property_Area max_LoanAmount_per_Gender max_LoanAmount_per_Married max_LoanAmount_per_bed max_LoanAmount_per_department max_LoanAmount_per_Self_Employed max_LoanAmount_per_Loan_Amount_Term max_LoanAmount_per_Credit_History max_LoanAmount_per_Property_Area min_LoanAmount_per_Gender min_LoanAmount_per_Married min_LoanAmount_per_bed min_LoanAmount_per_department min_LoanAmount_per_Self_Employed min_LoanAmount_per_Loan_Amount_Term min_LoanAmount_per_Credit_History min_LoanAmount_per_Property_Area Gender_Male Married_Yes Dependents_1 Dependents_2 Dependents_3+ Education_Not Graduate Self_Employed_Yes Property_Area_Semiurban Property_Area_Urban Loan_Status_Y train_or_test_train
Loan_ID
LP001002 5849 382 6.53 137482 360 1 146131.261577 126137.357349 134147.231193 149327.180865 138855.004957 143941.403321 142541.636255 139619.540936 116758878 43769663 73110241 113936639 112055989 121342603 118737183 47749883 700000 650000 650000 700000 700000 600000 700000 700000 17000 9000 9000 9000 9000 9000 9000 9000 1 0 0 0 0 0 0 0 1 1 1
LP001003 6091 356 5.84 128000 360 1 146131.261577 151552.383281 149153.387500 149327.180865 138855.004957 143941.403321 142541.636255 147093.913793 116758878 96084211 23864542 113936639 112055989 121342603 118737183 42657235 700000 700000 600000 700000 700000 600000 700000 570000 17000 17000 26000 9000 9000 9000 9000 28000 1 1 1 0 0 0 0 0 0 0 1
LP001005 3000 183 6.10 66000 360 1 146131.261577 151552.383281 134147.231193 149327.180865 159757.959770 143941.403321 142541.636255 139619.540936 116758878 96084211 73110241 113936639 27797885 121342603 118737183 47749883 700000 700000 650000 700000 650000 600000 700000 700000 17000 17000 9000 9000 25000 9000 9000 9000 1 1 0 0 0 0 1 0 1 1 1
LP001006 4941 333 6.74 120000 360 1 146131.261577 151552.383281 134147.231193 118886.399083 138855.004957 143941.403321 142541.636255 139619.540936 116758878 96084211 73110241 25917235 112055989 121342603 118737183 47749883 700000 700000 650000 279000 700000 600000 700000 700000 17000 17000 9000 25000 9000 9000 9000 9000 1 1 0 0 0 1 0 0 1 1 1
LP001008 6000 392 6.53 141000 360 1 146131.261577 126137.357349 134147.231193 149327.180865 138855.004957 143941.403321 142541.636255 139619.540936 116758878 43769663 73110241 113936639 112055989 121342603 118737183 47749883 700000 650000 650000 700000 700000 600000 700000 700000 17000 9000 9000 9000 9000 9000 9000 9000 1 0 0 0 0 0 0 0 1 1 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
LP002971 5786 314 5.43 113000 360 1 146131.261577 151552.383281 169453.450549 118886.399083 159757.959770 143941.403321 142541.636255 139619.540936 116758878 96084211 15420264 25917235 27797885 121342603 118737183 47749883 700000 700000 700000 279000 650000 600000 700000 700000 17000 17000 28000 25000 25000 9000 9000 9000 1 1 0 0 1 1 1 0 1 0 0
LP002975 4867 319 6.55 115000 360 1 146131.261577 151552.383281 134147.231193 149327.180865 138855.004957 143941.403321 142541.636255 139619.540936 116758878 96084211 73110241 113936639 112055989 121342603 118737183 47749883 700000 700000 650000 700000 700000 600000 700000 700000 17000 17000 9000 9000 9000 9000 9000 9000 1 1 0 0 0 0 0 0 1 0 0
LP002980 5243 350 6.68 126000 360 1 146131.261577 126137.357349 134147.231193 149327.180865 138855.004957 143941.403321 142541.636255 141681.249284 116758878 43769663 73110241 113936639 112055989 121342603 118737183 49446756 700000 650000 650000 700000 700000 600000 700000 600000 17000 9000 9000 9000 9000 9000 9000 25000 1 0 0 0 0 0 0 1 0 0 0
LP002986 7393 439 5.94 158000 360 1 146131.261577 151552.383281 134147.231193 149327.180865 138855.004957 143941.403321 142541.636255 147093.913793 116758878 96084211 73110241 113936639 112055989 121342603 118737183 42657235 700000 700000 650000 700000 700000 600000 700000 570000 17000 17000 9000 9000 9000 9000 9000 28000 1 1 0 0 0 0 0 0 0 0 0
LP002989 9200 544 5.91 98000 180 1 146131.261577 126137.357349 134147.231193 149327.180865 159757.959770 130615.075758 142541.636255 147093.913793 116758878 43769663 73110241 113936639 27797885 8620595 118737183 42657235 700000 650000 650000 700000 650000 600000 700000 570000 17000 9000 9000 9000 25000 28000 9000 28000 1 0 0 0 0 0 1 0 0 0 0

981 rows × 49 columns

train_df=train.loc[train.train_or_test_train.isin([1])]
test=train.loc[train.train_or_test_train.isin([0])]
train_df.drop(columns={'train_or_test_train'},axis=1,inplace=True)
test.drop(columns={'train_or_test_train'},axis=1,inplace=True)
/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py:3997: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,
train_df
Wallet_Size EMI Wallet_Share LoanAmount Loan_Amount_Term Credit_History mean_LoanAmount_per_Gender mean_LoanAmount_per_Married mean_LoanAmount_per_bed mean_LoanAmount_per_department mean_LoanAmount_per_Self_Employed mean_LoanAmount_per_Loan_Amount_Term mean_LoanAmount_per_Credit_History mean_LoanAmount_per_Property_Area sum_LoanAmount_per_Gender sum_LoanAmount_per_Married sum_LoanAmount_per_bed sum_LoanAmount_per_department sum_LoanAmount_per_Self_Employed sum_LoanAmount_per_Loan_Amount_Term sum_LoanAmount_per_Credit_History sum_LoanAmount_per_Property_Area max_LoanAmount_per_Gender max_LoanAmount_per_Married max_LoanAmount_per_bed max_LoanAmount_per_department max_LoanAmount_per_Self_Employed max_LoanAmount_per_Loan_Amount_Term max_LoanAmount_per_Credit_History max_LoanAmount_per_Property_Area min_LoanAmount_per_Gender min_LoanAmount_per_Married min_LoanAmount_per_bed min_LoanAmount_per_department min_LoanAmount_per_Self_Employed min_LoanAmount_per_Loan_Amount_Term min_LoanAmount_per_Credit_History min_LoanAmount_per_Property_Area Gender_Male Married_Yes Dependents_1 Dependents_2 Dependents_3+ Education_Not Graduate Self_Employed_Yes Property_Area_Semiurban Property_Area_Urban Loan_Status_Y
Loan_ID
LP001002 5849 382 6.53 137482 360 1 146131.261577 126137.357349 134147.231193 149327.180865 138855.004957 143941.403321 142541.636255 139619.540936 116758878 43769663 73110241 113936639 112055989 121342603 118737183 47749883 700000 650000 650000 700000 700000 600000 700000 700000 17000 9000 9000 9000 9000 9000 9000 9000 1 0 0 0 0 0 0 0 1 1
LP001003 6091 356 5.84 128000 360 1 146131.261577 151552.383281 149153.387500 149327.180865 138855.004957 143941.403321 142541.636255 147093.913793 116758878 96084211 23864542 113936639 112055989 121342603 118737183 42657235 700000 700000 600000 700000 700000 600000 700000 570000 17000 17000 26000 9000 9000 9000 9000 28000 1 1 1 0 0 0 0 0 0 0
LP001005 3000 183 6.10 66000 360 1 146131.261577 151552.383281 134147.231193 149327.180865 159757.959770 143941.403321 142541.636255 139619.540936 116758878 96084211 73110241 113936639 27797885 121342603 118737183 47749883 700000 700000 650000 700000 650000 600000 700000 700000 17000 17000 9000 9000 25000 9000 9000 9000 1 1 0 0 0 0 1 0 1 1
LP001006 4941 333 6.74 120000 360 1 146131.261577 151552.383281 134147.231193 118886.399083 138855.004957 143941.403321 142541.636255 139619.540936 116758878 96084211 73110241 25917235 112055989 121342603 118737183 47749883 700000 700000 650000 279000 700000 600000 700000 700000 17000 17000 9000 25000 9000 9000 9000 9000 1 1 0 0 0 1 0 0 1 1
LP001008 6000 392 6.53 141000 360 1 146131.261577 126137.357349 134147.231193 149327.180865 138855.004957 143941.403321 142541.636255 139619.540936 116758878 43769663 73110241 113936639 112055989 121342603 118737183 47749883 700000 650000 650000 700000 700000 600000 700000 700000 17000 9000 9000 9000 9000 9000 9000 9000 1 0 0 0 0 0 0 0 1 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
LP002978 2900 197 6.79 71000 360 1 126895.582418 126137.357349 134147.231193 149327.180865 138855.004957 143941.403321 142541.636255 147093.913793 23094996 43769663 73110241 113936639 112055989 121342603 118737183 42657235 600000 650000 650000 700000 700000 600000 700000 570000 9000 9000 9000 9000 9000 9000 9000 28000 0 0 0 0 0 0 0 0 0 1
LP002979 4106 222 5.41 40000 180 1 146131.261577 151552.383281 169453.450549 149327.180865 138855.004957 130615.075758 142541.636255 147093.913793 116758878 96084211 15420264 113936639 112055989 8620595 118737183 42657235 700000 700000 700000 700000 700000 600000 700000 570000 17000 17000 28000 9000 9000 28000 9000 28000 1 1 0 0 1 0 0 0 0 1
LP002983 8312 703 8.46 253000 360 1 146131.261577 151552.383281 149153.387500 149327.180865 138855.004957 143941.403321 142541.636255 139619.540936 116758878 96084211 23864542 113936639 112055989 121342603 118737183 47749883 700000 700000 600000 700000 700000 600000 700000 700000 17000 17000 26000 9000 9000 9000 9000 9000 1 1 1 0 0 0 0 0 1 1
LP002984 7583 519 6.84 187000 360 1 146131.261577 151552.383281 148426.091892 149327.180865 138855.004957 143941.403321 142541.636255 139619.540936 116758878 96084211 27458827 113936639 112055989 121342603 118737183 47749883 700000 700000 480000 700000 700000 600000 700000 700000 17000 17000 17000 9000 9000 9000 9000 9000 1 1 0 1 0 0 0 0 1 1
LP002990 4583 369 8.05 133000 360 0 126895.582418 126137.357349 134147.231193 149327.180865 159757.959770 143941.403321 142680.344595 141681.249284 23094996 43769663 73110241 113936639 27797885 121342603 21116691 49446756 600000 650000 650000 700000 650000 600000 600000 600000 9000 9000 9000 9000 25000 9000 45000 25000 0 0 0 0 0 0 1 1 0 0

614 rows × 48 columns

from sklearn.utils import resample
upsample_data = train_df.copy()
majority= upsample_data[upsample_data['Loan_Status_Y']==1]
minority = upsample_data[upsample_data['Loan_Status_Y']==0]
minority_upsampled =  resample(minority,replace = True,n_samples =422,random_state = 42 )
del upsample_data
upsample_data =  pd.concat([majority,minority_upsampled])
sns.countplot(upsample_data['Loan_Status_Y'])
<matplotlib.axes._subplots.AxesSubplot at 0x7fd7fb8b29e8>
train_df = upsample_data
train_df.columns
Index(['Wallet_Size', 'EMI', 'Wallet_Share', 'LoanAmount', 'Loan_Amount_Term',
       'Credit_History', 'mean_LoanAmount_per_Gender',
       'mean_LoanAmount_per_Married', 'mean_LoanAmount_per_bed',
       'mean_LoanAmount_per_department', 'mean_LoanAmount_per_Self_Employed',
       'mean_LoanAmount_per_Loan_Amount_Term',
       'mean_LoanAmount_per_Credit_History',
       'mean_LoanAmount_per_Property_Area', 'sum_LoanAmount_per_Gender',
       'sum_LoanAmount_per_Married', 'sum_LoanAmount_per_bed',
       'sum_LoanAmount_per_department', 'sum_LoanAmount_per_Self_Employed',
       'sum_LoanAmount_per_Loan_Amount_Term',
       'sum_LoanAmount_per_Credit_History', 'sum_LoanAmount_per_Property_Area',
       'max_LoanAmount_per_Gender', 'max_LoanAmount_per_Married',
       'max_LoanAmount_per_bed', 'max_LoanAmount_per_department',
       'max_LoanAmount_per_Self_Employed',
       'max_LoanAmount_per_Loan_Amount_Term',
       'max_LoanAmount_per_Credit_History', 'max_LoanAmount_per_Property_Area',
       'min_LoanAmount_per_Gender', 'min_LoanAmount_per_Married',
       'min_LoanAmount_per_bed', 'min_LoanAmount_per_department',
       'min_LoanAmount_per_Self_Employed',
       'min_LoanAmount_per_Loan_Amount_Term',
       'min_LoanAmount_per_Credit_History', 'min_LoanAmount_per_Property_Area',
       'Gender_Male', 'Married_Yes', 'Dependents_1', 'Dependents_2',
       'Dependents_3+', 'Education_Not Graduate', 'Self_Employed_Yes',
       'Property_Area_Semiurban', 'Property_Area_Urban', 'Loan_Status_Y'],
      dtype='object')
ref_cols = train_df[['Wallet_Size', 'EMI', 'Wallet_Share', 'LoanAmount', 'Loan_Amount_Term','mean_LoanAmount_per_Gender', 'mean_LoanAmount_per_Married',
       'mean_LoanAmount_per_bed', 'mean_LoanAmount_per_department',
       'mean_LoanAmount_per_Self_Employed',
       'mean_LoanAmount_per_Loan_Amount_Term',
       'mean_LoanAmount_per_Credit_History',
       'mean_LoanAmount_per_Property_Area', 'sum_LoanAmount_per_Gender',
       'sum_LoanAmount_per_Married', 'sum_LoanAmount_per_bed',
       'sum_LoanAmount_per_department', 'sum_LoanAmount_per_Self_Employed',
       'sum_LoanAmount_per_Loan_Amount_Term',
       'sum_LoanAmount_per_Credit_History', 'sum_LoanAmount_per_Property_Area',
       'max_LoanAmount_per_Gender', 'max_LoanAmount_per_Married',
       'max_LoanAmount_per_bed', 'max_LoanAmount_per_department',
       'max_LoanAmount_per_Self_Employed',
       'max_LoanAmount_per_Loan_Amount_Term',
       'max_LoanAmount_per_Credit_History', 'max_LoanAmount_per_Property_Area',
       'min_LoanAmount_per_Gender', 'min_LoanAmount_per_Married',
       'min_LoanAmount_per_bed', 'min_LoanAmount_per_department',
       'min_LoanAmount_per_Self_Employed',
       'min_LoanAmount_per_Loan_Amount_Term',
       'min_LoanAmount_per_Credit_History',
       'min_LoanAmount_per_Property_Area']]
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
# train_df.set_index('Loan_ID',inplace=True)
scaled_train = ss.fit_transform(train_df[ref_cols.columns])
scaled_train = pd.DataFrame(scaled_train,index = train_df.index, columns = ['Wallet_Size', 'EMI', 'Wallet_Share', 'LoanAmount', 'Loan_Amount_Term','mean_LoanAmount_per_Gender', 'mean_LoanAmount_per_Married',
       'mean_LoanAmount_per_bed', 'mean_LoanAmount_per_department',
       'mean_LoanAmount_per_Self_Employed',
       'mean_LoanAmount_per_Loan_Amount_Term',
       'mean_LoanAmount_per_Credit_History',
       'mean_LoanAmount_per_Property_Area', 'sum_LoanAmount_per_Gender',
       'sum_LoanAmount_per_Married', 'sum_LoanAmount_per_bed',
       'sum_LoanAmount_per_department', 'sum_LoanAmount_per_Self_Employed',
       'sum_LoanAmount_per_Loan_Amount_Term',
       'sum_LoanAmount_per_Credit_History', 'sum_LoanAmount_per_Property_Area',
       'max_LoanAmount_per_Gender', 'max_LoanAmount_per_Married',
       'max_LoanAmount_per_bed', 'max_LoanAmount_per_department',
       'max_LoanAmount_per_Self_Employed',
       'max_LoanAmount_per_Loan_Amount_Term',
       'max_LoanAmount_per_Credit_History', 'max_LoanAmount_per_Property_Area',
       'min_LoanAmount_per_Gender', 'min_LoanAmount_per_Married',
       'min_LoanAmount_per_bed', 'min_LoanAmount_per_department',
       'min_LoanAmount_per_Self_Employed',
       'min_LoanAmount_per_Loan_Amount_Term',
       'min_LoanAmount_per_Credit_History',
       'min_LoanAmount_per_Property_Area'])

# test.set_index('Loan_ID',inplace=True)
scaled_test = ss.fit_transform(test[ref_cols.columns])
scaled_test = pd.DataFrame(scaled_test,index=test.index, columns = ['Wallet_Size', 'EMI', 'Wallet_Share', 'LoanAmount', 'Loan_Amount_Term','mean_LoanAmount_per_Gender', 'mean_LoanAmount_per_Married',
       'mean_LoanAmount_per_bed', 'mean_LoanAmount_per_department',
       'mean_LoanAmount_per_Self_Employed',
       'mean_LoanAmount_per_Loan_Amount_Term',
       'mean_LoanAmount_per_Credit_History',
       'mean_LoanAmount_per_Property_Area', 'sum_LoanAmount_per_Gender',
       'sum_LoanAmount_per_Married', 'sum_LoanAmount_per_bed',
       'sum_LoanAmount_per_department', 'sum_LoanAmount_per_Self_Employed',
       'sum_LoanAmount_per_Loan_Amount_Term',
       'sum_LoanAmount_per_Credit_History', 'sum_LoanAmount_per_Property_Area',
       'max_LoanAmount_per_Gender', 'max_LoanAmount_per_Married',
       'max_LoanAmount_per_bed', 'max_LoanAmount_per_department',
       'max_LoanAmount_per_Self_Employed',
       'max_LoanAmount_per_Loan_Amount_Term',
       'max_LoanAmount_per_Credit_History', 'max_LoanAmount_per_Property_Area',
       'min_LoanAmount_per_Gender', 'min_LoanAmount_per_Married',
       'min_LoanAmount_per_bed', 'min_LoanAmount_per_department',
       'min_LoanAmount_per_Self_Employed',
       'min_LoanAmount_per_Loan_Amount_Term',
       'min_LoanAmount_per_Credit_History',
       'min_LoanAmount_per_Property_Area'])
scaled_train = scaled_train 
scaled_train.round(2)
Wallet_Size EMI Wallet_Share LoanAmount Loan_Amount_Term mean_LoanAmount_per_Gender mean_LoanAmount_per_Married mean_LoanAmount_per_bed mean_LoanAmount_per_department mean_LoanAmount_per_Self_Employed mean_LoanAmount_per_Loan_Amount_Term mean_LoanAmount_per_Credit_History mean_LoanAmount_per_Property_Area sum_LoanAmount_per_Gender sum_LoanAmount_per_Married sum_LoanAmount_per_bed sum_LoanAmount_per_department sum_LoanAmount_per_Self_Employed sum_LoanAmount_per_Loan_Amount_Term sum_LoanAmount_per_Credit_History sum_LoanAmount_per_Property_Area max_LoanAmount_per_Gender max_LoanAmount_per_Married max_LoanAmount_per_bed max_LoanAmount_per_department max_LoanAmount_per_Self_Employed max_LoanAmount_per_Loan_Amount_Term max_LoanAmount_per_Credit_History max_LoanAmount_per_Property_Area min_LoanAmount_per_Gender min_LoanAmount_per_Married min_LoanAmount_per_bed min_LoanAmount_per_department min_LoanAmount_per_Self_Employed min_LoanAmount_per_Loan_Amount_Term min_LoanAmount_per_Credit_History min_LoanAmount_per_Property_Area
Loan_ID
LP001002 -0.23 -0.24 -0.17 -0.19 0.27 0.52 -1.27 -0.78 0.57 -0.47 0.15 -0.54 -0.99 0.52 -1.27 0.90 0.57 0.47 0.41 0.54 0.36 0.52 -1.27 0.50 0.57 0.47 0.10 0.54 1.36 0.52 -1.27 -0.83 -0.57 -0.47 -0.35 -0.54 -1.38
LP001005 -0.63 -0.64 -0.23 -0.96 0.27 0.52 0.79 -0.78 0.57 2.12 0.15 -0.54 -0.99 0.52 0.79 0.90 0.57 -2.12 0.41 0.54 0.36 0.52 0.79 0.50 0.57 -2.12 0.10 0.54 1.36 0.52 0.79 -0.83 -0.57 2.12 -0.35 -0.54 -1.38
LP001006 -0.36 -0.34 -0.14 -0.38 0.27 0.52 0.79 -0.78 -1.75 -0.47 0.15 -0.54 -0.99 0.52 0.79 0.90 -1.75 0.47 0.41 0.54 0.36 0.52 0.79 0.50 -1.75 0.47 0.10 0.54 1.36 0.52 0.79 -0.83 1.75 -0.47 -0.35 -0.54 -1.38
LP001008 -0.21 -0.21 -0.17 -0.15 0.27 0.52 -1.27 -0.78 0.57 -0.47 0.15 -0.54 -0.99 0.52 -1.27 0.90 0.57 0.47 0.41 0.54 0.36 0.52 -1.27 0.50 0.57 0.47 0.10 0.54 1.36 0.52 -1.27 -0.83 -0.57 -0.47 -0.35 -0.54 -1.38
LP001011 0.29 0.51 0.00 1.20 0.27 0.52 0.79 0.50 0.57 2.12 0.15 -0.54 -0.99 0.52 0.79 -0.93 0.57 -2.12 0.41 0.54 0.36 0.52 0.79 -2.00 0.57 -2.12 0.10 0.54 1.36 0.52 0.79 0.22 -0.57 2.12 -0.35 -0.54 -1.38
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
LP001532 -0.73 -0.38 0.87 -0.45 0.27 0.52 0.79 0.50 -1.75 -0.47 0.15 -0.54 1.41 0.52 0.79 -0.93 -1.75 0.47 0.41 0.54 -1.42 0.52 0.79 -2.00 -1.75 0.47 0.10 0.54 -0.98 0.52 0.79 0.22 1.75 -0.47 -0.35 -0.54 0.90
LP002556 -0.71 -0.59 0.12 -0.86 0.27 0.52 -1.27 -0.78 0.57 -0.47 0.15 -0.54 -0.99 0.52 -1.27 0.90 0.57 0.47 0.41 0.54 0.36 0.52 -1.27 0.50 0.57 0.47 0.10 0.54 1.36 0.52 -1.27 -0.83 -0.57 -0.47 -0.35 -0.54 -1.38
LP002926 -0.67 -0.42 0.44 -0.53 0.27 0.52 0.79 0.50 0.57 2.12 0.15 1.84 -0.33 0.52 0.79 -0.93 0.57 -2.12 0.41 -1.84 0.96 0.52 0.79 -2.00 0.57 -2.12 0.10 -1.84 -0.44 0.52 0.79 0.22 -0.57 2.12 -0.35 1.84 0.54
LP001641 -0.75 -0.57 0.34 -0.96 -0.69 0.52 0.79 0.56 0.57 2.12 2.75 1.84 1.41 0.52 0.79 -1.07 0.57 -2.12 -2.47 -1.84 -1.42 0.52 0.79 -0.24 0.57 -2.12 1.61 -1.84 -0.98 0.52 0.79 1.39 -0.57 2.12 2.97 1.84 0.90
LP002949 4.83 2.98 -0.44 2.10 -2.61 -1.93 -1.27 2.38 0.57 2.12 -1.41 -0.54 -0.99 -1.93 -1.27 -1.41 0.57 -2.12 -2.34 0.54 0.36 -1.93 -1.27 1.23 0.57 -2.12 0.10 0.54 1.36 -1.93 -1.27 1.65 -0.57 2.12 0.89 -0.54 -1.38

844 rows × 37 columns

scaled_test = scaled_test
scaled_test.round(2)
Wallet_Size EMI Wallet_Share LoanAmount Loan_Amount_Term mean_LoanAmount_per_Gender mean_LoanAmount_per_Married mean_LoanAmount_per_bed mean_LoanAmount_per_department mean_LoanAmount_per_Self_Employed mean_LoanAmount_per_Loan_Amount_Term mean_LoanAmount_per_Credit_History mean_LoanAmount_per_Property_Area sum_LoanAmount_per_Gender sum_LoanAmount_per_Married sum_LoanAmount_per_bed sum_LoanAmount_per_department sum_LoanAmount_per_Self_Employed sum_LoanAmount_per_Loan_Amount_Term sum_LoanAmount_per_Credit_History sum_LoanAmount_per_Property_Area max_LoanAmount_per_Gender max_LoanAmount_per_Married max_LoanAmount_per_bed max_LoanAmount_per_department max_LoanAmount_per_Self_Employed max_LoanAmount_per_Loan_Amount_Term max_LoanAmount_per_Credit_History max_LoanAmount_per_Property_Area min_LoanAmount_per_Gender min_LoanAmount_per_Married min_LoanAmount_per_bed min_LoanAmount_per_department min_LoanAmount_per_Self_Employed min_LoanAmount_per_Loan_Amount_Term min_LoanAmount_per_Credit_History min_LoanAmount_per_Property_Area
Loan_ID
LP001015 -0.13 -0.15 -0.19 -0.43 0.27 0.49 0.76 -0.78 0.54 -0.44 0.15 -0.44 -0.93 0.49 0.76 0.91 0.54 0.44 0.40 0.44 0.36 0.49 0.76 0.49 0.54 0.44 0.15 0.44 1.25 0.49 0.76 -0.82 -0.54 -0.44 -0.33 -0.44 -1.26
LP001022 -0.35 -0.12 -0.07 -0.17 0.27 0.49 0.76 0.53 0.54 -0.44 0.15 -0.44 -0.93 0.49 0.76 -1.06 0.54 0.44 0.40 0.44 0.36 0.49 0.76 -0.22 0.54 0.44 0.15 0.44 1.25 0.49 0.76 1.41 -0.54 -0.44 -0.33 -0.44 -1.26
LP001031 0.08 0.04 -0.02 1.17 0.27 0.49 0.76 0.47 0.54 -0.44 0.15 -0.44 -0.93 0.49 0.76 -0.92 0.54 0.44 0.40 0.44 0.36 0.49 0.76 -1.93 0.54 0.44 0.15 0.44 1.25 0.49 0.76 0.23 -0.54 -0.44 -0.33 -0.44 -1.26
LP001035 -0.29 -0.17 -0.17 -0.60 0.27 0.49 0.76 0.47 0.54 -0.44 0.15 -0.44 -0.93 0.49 0.76 -0.92 0.54 0.44 0.40 0.44 0.36 0.49 0.76 -1.93 0.54 0.44 0.15 0.44 1.25 0.49 0.76 0.23 -0.54 -0.44 -0.33 -0.44 -1.26
LP001051 -0.60 -0.22 -0.12 -0.96 0.27 0.49 -1.32 -0.78 -1.84 -0.44 0.15 -0.44 -0.93 0.49 -1.32 0.91 -1.84 0.44 0.40 0.44 0.36 0.49 -1.32 0.49 -1.84 0.44 0.15 0.44 1.25 0.49 -1.32 -0.82 1.84 -0.44 -0.33 -0.44 -1.26
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
LP002971 -0.11 -0.15 -0.18 -0.39 0.27 0.49 0.76 2.31 -1.84 2.26 0.15 -0.44 -0.93 0.49 0.76 -1.40 -1.84 -2.26 0.40 0.44 0.36 0.49 0.76 1.20 -1.84 -2.26 0.15 0.44 1.25 0.49 0.76 1.67 1.84 2.26 -0.33 -0.44 -1.26
LP002975 -0.29 -0.14 -0.13 -0.35 0.27 0.49 0.76 -0.78 0.54 -0.44 0.15 -0.44 -0.93 0.49 0.76 0.91 0.54 0.44 0.40 0.44 0.36 0.49 0.76 0.49 0.54 0.44 0.15 0.44 1.25 0.49 0.76 -0.82 -0.54 -0.44 -0.33 -0.44 -1.26
LP002980 -0.22 -0.12 -0.12 -0.17 0.27 0.49 -1.32 -0.78 0.54 -0.44 0.15 -0.44 -0.27 0.49 -1.32 0.91 0.54 0.44 0.40 0.44 0.97 0.49 -1.32 0.49 0.54 0.44 0.15 0.44 -0.51 0.49 -1.32 -0.82 -0.54 -0.44 -0.33 -0.44 0.61
LP002986 0.20 -0.06 -0.16 0.35 0.27 0.49 0.76 -0.78 0.54 -0.44 0.15 -0.44 1.46 0.49 0.76 0.91 0.54 0.44 0.40 0.44 -1.47 0.49 0.76 0.49 0.54 0.44 0.15 0.44 -1.04 0.49 0.76 -0.82 -0.54 -0.44 -0.33 -0.44 0.96
LP002989 0.54 0.02 -0.16 -0.63 -2.52 0.49 -1.32 -0.78 0.54 2.26 -1.44 -0.44 1.46 0.49 -1.32 0.91 0.54 -2.26 -2.42 0.44 -1.47 0.49 -1.32 0.49 0.54 -2.26 0.15 0.44 -1.04 0.49 -1.32 -0.82 -0.54 2.26 0.76 -0.44 0.96

367 rows × 37 columns

train_df.drop(ref_cols,axis=1,inplace=True)
test.drop(ref_cols,axis=1,inplace=True)
/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py:3997: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,
train_df
Credit_History Gender_Male Married_Yes Dependents_1 Dependents_2 Dependents_3+ Education_Not Graduate Self_Employed_Yes Property_Area_Semiurban Property_Area_Urban Loan_Status_Y
Loan_ID
LP001002 1 1 0 0 0 0 0 0 0 1 1
LP001005 1 1 1 0 0 0 0 1 0 1 1
LP001006 1 1 1 0 0 0 1 0 0 1 1
LP001008 1 1 0 0 0 0 0 0 0 1 1
LP001011 1 1 1 0 1 0 0 1 0 1 1
... ... ... ... ... ... ... ... ... ... ... ...
LP001532 1 1 1 0 1 0 1 0 0 0 0
LP002556 1 1 0 0 0 0 0 0 0 1 0
LP002926 0 1 1 0 1 0 0 1 1 0 0
LP001641 0 1 1 1 0 0 0 1 0 0 0
LP002949 1 0 0 0 0 1 0 1 0 1 0

844 rows × 11 columns

train =pd.merge(train_df,scaled_train,how='left', on='Loan_ID')
test =pd.merge(test,scaled_test,how='left', on='Loan_ID')
train
Credit_History Gender_Male Married_Yes Dependents_1 Dependents_2 Dependents_3+ Education_Not Graduate Self_Employed_Yes Property_Area_Semiurban Property_Area_Urban Loan_Status_Y Wallet_Size EMI Wallet_Share LoanAmount Loan_Amount_Term mean_LoanAmount_per_Gender mean_LoanAmount_per_Married mean_LoanAmount_per_bed mean_LoanAmount_per_department mean_LoanAmount_per_Self_Employed mean_LoanAmount_per_Loan_Amount_Term mean_LoanAmount_per_Credit_History mean_LoanAmount_per_Property_Area sum_LoanAmount_per_Gender sum_LoanAmount_per_Married sum_LoanAmount_per_bed sum_LoanAmount_per_department sum_LoanAmount_per_Self_Employed sum_LoanAmount_per_Loan_Amount_Term sum_LoanAmount_per_Credit_History sum_LoanAmount_per_Property_Area max_LoanAmount_per_Gender max_LoanAmount_per_Married max_LoanAmount_per_bed max_LoanAmount_per_department max_LoanAmount_per_Self_Employed max_LoanAmount_per_Loan_Amount_Term max_LoanAmount_per_Credit_History max_LoanAmount_per_Property_Area min_LoanAmount_per_Gender min_LoanAmount_per_Married min_LoanAmount_per_bed min_LoanAmount_per_department min_LoanAmount_per_Self_Employed min_LoanAmount_per_Loan_Amount_Term min_LoanAmount_per_Credit_History min_LoanAmount_per_Property_Area
Loan_ID
LP001002 1 1 0 0 0 0 0 0 0 1 1 -0.234025 -0.235357 -0.167667 -0.189980 0.271763 0.516979 -1.273231 -0.779985 0.571878 -0.472428 0.148021 -0.542659 -0.989367 0.516979 -1.273231 0.904602 0.571878 0.472428 0.412845 0.542659 0.362091 0.516979 -1.273231 0.495584 0.571878 0.472428 0.095731 0.542659 1.362716 0.516979 -1.273231 -0.826202 -0.571878 -0.472428 -0.347567 -0.542659 -1.381823
LP001005 1 1 1 0 0 0 0 1 0 1 1 -0.632087 -0.644692 -0.229408 -0.959209 0.271763 0.516979 0.785403 -0.779985 0.571878 2.116724 0.148021 -0.542659 -0.989367 0.516979 0.785403 0.904602 0.571878 -2.116724 0.412845 0.542659 0.362091 0.516979 0.785403 0.495584 0.571878 -2.116724 0.095731 0.542659 1.362716 0.516979 0.785403 -0.826202 -0.571878 2.116724 -0.347567 -0.542659 -1.381823
LP001006 1 1 1 0 0 0 1 0 0 1 1 -0.360891 -0.336148 -0.137515 -0.378107 0.271763 0.516979 0.785403 -0.779985 -1.748626 -0.472428 0.148021 -0.542659 -0.989367 0.516979 0.785403 0.904602 -1.748626 0.472428 0.412845 0.542659 0.362091 0.516979 0.785403 0.495584 -1.748626 0.472428 0.095731 0.542659 1.362716 0.516979 0.785403 -0.826202 1.748626 -0.472428 -0.347567 -0.542659 -1.381823
LP001008 1 1 0 0 0 0 0 0 0 1 1 -0.212927 -0.214787 -0.167667 -0.152122 0.271763 0.516979 -1.273231 -0.779985 0.571878 -0.472428 0.148021 -0.542659 -0.989367 0.516979 -1.273231 0.904602 0.571878 0.472428 0.412845 0.542659 0.362091 0.516979 -1.273231 0.495584 0.571878 0.472428 0.095731 0.542659 1.362716 0.516979 -1.273231 -0.826202 -0.571878 -0.472428 -0.347567 -0.542659 -1.381823
LP001011 1 1 1 0 1 0 0 1 0 1 1 0.291880 0.505150 0.003197 1.203783 0.271763 0.516979 0.785403 0.496393 0.571878 2.116724 0.148021 -0.542659 -0.989367 0.516979 0.785403 -0.925653 0.571878 -2.116724 0.412845 0.542659 0.362091 0.516979 0.785403 -1.996753 0.571878 -2.116724 0.095731 0.542659 1.362716 0.516979 0.785403 0.218464 -0.571878 2.116724 -0.347567 -0.542659 -1.381823
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
LP002949 1 0 0 0 0 1 0 1 0 1 0 4.828584 2.977619 -0.441910 2.096959 -2.612162 -1.934315 -1.273231 2.376014 0.571878 2.116724 -1.408707 -0.542659 -0.989367 -1.934315 -1.273231 -1.408302 0.571878 -2.116724 -2.343089 0.542659 0.362091 -1.934315 -1.273231 1.228624 0.571878 -2.116724 0.095731 0.542659 1.362716 -1.934315 -1.273231 1.654879 -0.571878 2.116724 0.887986 -0.542659 -1.381823
LP002949 1 0 0 0 0 1 0 1 0 1 0 4.828584 2.977619 -0.441910 2.096959 -2.612162 -1.934315 -1.273231 2.376014 0.571878 2.116724 -1.408707 -0.542659 -0.989367 -1.934315 -1.273231 -1.408302 0.571878 -2.116724 -2.343089 0.542659 0.362091 -1.934315 -1.273231 1.228624 0.571878 -2.116724 0.095731 0.542659 1.362716 -1.934315 -1.273231 1.654879 -0.571878 2.116724 0.887986 -0.542659 -1.381823
LP002949 1 0 0 0 0 1 0 1 0 1 0 4.828584 2.977619 -0.441910 2.096959 -2.612162 -1.934315 -1.273231 2.376014 0.571878 2.116724 -1.408707 -0.542659 -0.989367 -1.934315 -1.273231 -1.408302 0.571878 -2.116724 -2.343089 0.542659 0.362091 -1.934315 -1.273231 1.228624 0.571878 -2.116724 0.095731 0.542659 1.362716 -1.934315 -1.273231 1.654879 -0.571878 2.116724 0.887986 -0.542659 -1.381823
LP002949 1 0 0 0 0 1 0 1 0 1 0 4.828584 2.977619 -0.441910 2.096959 -2.612162 -1.934315 -1.273231 2.376014 0.571878 2.116724 -1.408707 -0.542659 -0.989367 -1.934315 -1.273231 -1.408302 0.571878 -2.116724 -2.343089 0.542659 0.362091 -1.934315 -1.273231 1.228624 0.571878 -2.116724 0.095731 0.542659 1.362716 -1.934315 -1.273231 1.654879 -0.571878 2.116724 0.887986 -0.542659 -1.381823
LP002949 1 0 0 0 0 1 0 1 0 1 0 4.828584 2.977619 -0.441910 2.096959 -2.612162 -1.934315 -1.273231 2.376014 0.571878 2.116724 -1.408707 -0.542659 -0.989367 -1.934315 -1.273231 -1.408302 0.571878 -2.116724 -2.343089 0.542659 0.362091 -1.934315 -1.273231 1.228624 0.571878 -2.116724 0.095731 0.542659 1.362716 -1.934315 -1.273231 1.654879 -0.571878 2.116724 0.887986 -0.542659 -1.381823

1902 rows × 48 columns

train.columns
Index(['Credit_History', 'Gender_Male', 'Married_Yes', 'Dependents_1',
       'Dependents_2', 'Dependents_3+', 'Education_Not Graduate',
       'Self_Employed_Yes', 'Property_Area_Semiurban', 'Property_Area_Urban',
       'Loan_Status_Y', 'Wallet_Size', 'EMI', 'Wallet_Share', 'LoanAmount',
       'Loan_Amount_Term', 'mean_LoanAmount_per_Gender',
       'mean_LoanAmount_per_Married', 'mean_LoanAmount_per_bed',
       'mean_LoanAmount_per_department', 'mean_LoanAmount_per_Self_Employed',
       'mean_LoanAmount_per_Loan_Amount_Term',
       'mean_LoanAmount_per_Credit_History',
       'mean_LoanAmount_per_Property_Area', 'sum_LoanAmount_per_Gender',
       'sum_LoanAmount_per_Married', 'sum_LoanAmount_per_bed',
       'sum_LoanAmount_per_department', 'sum_LoanAmount_per_Self_Employed',
       'sum_LoanAmount_per_Loan_Amount_Term',
       'sum_LoanAmount_per_Credit_History', 'sum_LoanAmount_per_Property_Area',
       'max_LoanAmount_per_Gender', 'max_LoanAmount_per_Married',
       'max_LoanAmount_per_bed', 'max_LoanAmount_per_department',
       'max_LoanAmount_per_Self_Employed',
       'max_LoanAmount_per_Loan_Amount_Term',
       'max_LoanAmount_per_Credit_History', 'max_LoanAmount_per_Property_Area',
       'min_LoanAmount_per_Gender', 'min_LoanAmount_per_Married',
       'min_LoanAmount_per_bed', 'min_LoanAmount_per_department',
       'min_LoanAmount_per_Self_Employed',
       'min_LoanAmount_per_Loan_Amount_Term',
       'min_LoanAmount_per_Credit_History',
       'min_LoanAmount_per_Property_Area'],
      dtype='object')
test.drop('Loan_Status_Y',inplace=True,axis=1)
to_drop = train[['mean_LoanAmount_per_Gender', 'mean_LoanAmount_per_Married',
       'mean_LoanAmount_per_bed', 'mean_LoanAmount_per_department',
       'mean_LoanAmount_per_Self_Employed',
       'mean_LoanAmount_per_Loan_Amount_Term',
       'mean_LoanAmount_per_Property_Area', 'sum_LoanAmount_per_Gender',
       'sum_LoanAmount_per_Married', 'sum_LoanAmount_per_bed',
       'sum_LoanAmount_per_department', 'sum_LoanAmount_per_Self_Employed',
       'sum_LoanAmount_per_Loan_Amount_Term',
       'sum_LoanAmount_per_Credit_History', 'sum_LoanAmount_per_Property_Area',
       'max_LoanAmount_per_Gender']]
to_drop_test = test[['mean_LoanAmount_per_Gender', 'mean_LoanAmount_per_Married',
       'mean_LoanAmount_per_bed', 'mean_LoanAmount_per_department',
       'mean_LoanAmount_per_Self_Employed',
       'mean_LoanAmount_per_Loan_Amount_Term',
       'mean_LoanAmount_per_Property_Area', 'sum_LoanAmount_per_Gender',
       'sum_LoanAmount_per_Married', 'sum_LoanAmount_per_bed',
       'sum_LoanAmount_per_department', 'sum_LoanAmount_per_Self_Employed',
       'sum_LoanAmount_per_Loan_Amount_Term',
       'sum_LoanAmount_per_Credit_History', 'sum_LoanAmount_per_Property_Area',
       'max_LoanAmount_per_Gender']]
train.drop(to_drop,axis=1,inplace=True)
test.drop(to_drop_test,axis=1,inplace=True)
train
Credit_History Gender_Male Married_Yes Dependents_1 Dependents_2 Dependents_3+ Education_Not Graduate Self_Employed_Yes Property_Area_Semiurban Property_Area_Urban Loan_Status_Y Wallet_Size EMI Wallet_Share LoanAmount Loan_Amount_Term mean_LoanAmount_per_Credit_History max_LoanAmount_per_Married max_LoanAmount_per_bed max_LoanAmount_per_department max_LoanAmount_per_Self_Employed max_LoanAmount_per_Loan_Amount_Term max_LoanAmount_per_Credit_History max_LoanAmount_per_Property_Area min_LoanAmount_per_Gender min_LoanAmount_per_Married min_LoanAmount_per_bed min_LoanAmount_per_department min_LoanAmount_per_Self_Employed min_LoanAmount_per_Loan_Amount_Term min_LoanAmount_per_Credit_History min_LoanAmount_per_Property_Area
Loan_ID
LP001002 1 1 0 0 0 0 0 0 0 1 1 -0.234025 -0.235357 -0.167667 -0.189980 0.271763 -0.542659 -1.273231 0.495584 0.571878 0.472428 0.095731 0.542659 1.362716 0.516979 -1.273231 -0.826202 -0.571878 -0.472428 -0.347567 -0.542659 -1.381823
LP001005 1 1 1 0 0 0 0 1 0 1 1 -0.632087 -0.644692 -0.229408 -0.959209 0.271763 -0.542659 0.785403 0.495584 0.571878 -2.116724 0.095731 0.542659 1.362716 0.516979 0.785403 -0.826202 -0.571878 2.116724 -0.347567 -0.542659 -1.381823
LP001006 1 1 1 0 0 0 1 0 0 1 1 -0.360891 -0.336148 -0.137515 -0.378107 0.271763 -0.542659 0.785403 0.495584 -1.748626 0.472428 0.095731 0.542659 1.362716 0.516979 0.785403 -0.826202 1.748626 -0.472428 -0.347567 -0.542659 -1.381823
LP001008 1 1 0 0 0 0 0 0 0 1 1 -0.212927 -0.214787 -0.167667 -0.152122 0.271763 -0.542659 -1.273231 0.495584 0.571878 0.472428 0.095731 0.542659 1.362716 0.516979 -1.273231 -0.826202 -0.571878 -0.472428 -0.347567 -0.542659 -1.381823
LP001011 1 1 1 0 1 0 0 1 0 1 1 0.291880 0.505150 0.003197 1.203783 0.271763 -0.542659 0.785403 -1.996753 0.571878 -2.116724 0.095731 0.542659 1.362716 0.516979 0.785403 0.218464 -0.571878 2.116724 -0.347567 -0.542659 -1.381823
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
LP002949 1 0 0 0 0 1 0 1 0 1 0 4.828584 2.977619 -0.441910 2.096959 -2.612162 -0.542659 -1.273231 1.228624 0.571878 -2.116724 0.095731 0.542659 1.362716 -1.934315 -1.273231 1.654879 -0.571878 2.116724 0.887986 -0.542659 -1.381823
LP002949 1 0 0 0 0 1 0 1 0 1 0 4.828584 2.977619 -0.441910 2.096959 -2.612162 -0.542659 -1.273231 1.228624 0.571878 -2.116724 0.095731 0.542659 1.362716 -1.934315 -1.273231 1.654879 -0.571878 2.116724 0.887986 -0.542659 -1.381823
LP002949 1 0 0 0 0 1 0 1 0 1 0 4.828584 2.977619 -0.441910 2.096959 -2.612162 -0.542659 -1.273231 1.228624 0.571878 -2.116724 0.095731 0.542659 1.362716 -1.934315 -1.273231 1.654879 -0.571878 2.116724 0.887986 -0.542659 -1.381823
LP002949 1 0 0 0 0 1 0 1 0 1 0 4.828584 2.977619 -0.441910 2.096959 -2.612162 -0.542659 -1.273231 1.228624 0.571878 -2.116724 0.095731 0.542659 1.362716 -1.934315 -1.273231 1.654879 -0.571878 2.116724 0.887986 -0.542659 -1.381823
LP002949 1 0 0 0 0 1 0 1 0 1 0 4.828584 2.977619 -0.441910 2.096959 -2.612162 -0.542659 -1.273231 1.228624 0.571878 -2.116724 0.095731 0.542659 1.362716 -1.934315 -1.273231 1.654879 -0.571878 2.116724 0.887986 -0.542659 -1.381823

1902 rows × 32 columns

test
Credit_History Gender_Male Married_Yes Dependents_1 Dependents_2 Dependents_3+ Education_Not Graduate Self_Employed_Yes Property_Area_Semiurban Property_Area_Urban Wallet_Size EMI Wallet_Share LoanAmount Loan_Amount_Term mean_LoanAmount_per_Credit_History max_LoanAmount_per_Married max_LoanAmount_per_bed max_LoanAmount_per_department max_LoanAmount_per_Self_Employed max_LoanAmount_per_Loan_Amount_Term max_LoanAmount_per_Credit_History max_LoanAmount_per_Property_Area min_LoanAmount_per_Gender min_LoanAmount_per_Married min_LoanAmount_per_bed min_LoanAmount_per_department min_LoanAmount_per_Self_Employed min_LoanAmount_per_Loan_Amount_Term min_LoanAmount_per_Credit_History min_LoanAmount_per_Property_Area
Loan_ID
LP001015 1 1 1 0 0 0 0 0 0 1 -0.126182 -0.153374 -0.188810 -0.434502 0.266022 -0.437674 0.758358 0.490793 0.544812 0.442086 0.145459 0.437674 1.245733 0.485479 0.758358 -0.823291 -0.544812 -0.442086 -0.327040 -0.437674 -1.261226
LP001022 1 1 1 1 0 0 0 0 0 1 -0.346506 -0.121786 -0.066521 -0.172326 0.266022 -0.437674 0.758358 -0.222275 0.544812 0.442086 0.145459 0.437674 1.245733 0.485479 0.758358 1.411919 -0.544812 -0.442086 -0.327040 -0.437674 -1.261226
LP001031 1 1 1 0 1 0 0 0 0 1 0.081817 0.041899 -0.021327 1.171325 0.266022 -0.437674 0.758358 -1.933637 0.544812 0.442086 0.145459 0.437674 1.245733 0.485479 0.758358 0.228573 -0.544812 -0.442086 -0.327040 -0.437674 -1.261226
LP001035 1 1 1 0 1 0 0 0 0 1 -0.286803 -0.173476 -0.170733 -0.598362 0.266022 -0.437674 0.758358 -1.933637 0.544812 0.442086 0.145459 0.437674 1.245733 0.485479 0.758358 0.228573 -0.544812 -0.442086 -0.327040 -0.437674 -1.261226
LP001051 1 1 0 0 0 0 1 0 0 1 -0.596875 -0.217268 -0.121285 -0.958854 0.266022 -0.437674 -1.318638 0.490793 -1.835497 0.442086 0.145459 0.437674 1.245733 0.485479 -1.318638 -0.823291 1.835497 -0.442086 -0.327040 -0.437674 -1.261226
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
LP002971 1 1 1 0 0 1 1 1 0 1 -0.113471 -0.147631 -0.184557 -0.385344 0.266022 -0.437674 0.758358 1.203860 -1.835497 -2.262005 0.145459 0.437674 1.245733 0.485479 0.758358 1.674885 1.835497 2.262005 -0.327040 -0.437674 -1.261226
LP002975 1 1 1 0 0 0 0 0 0 1 -0.290462 -0.144041 -0.125007 -0.352572 0.266022 -0.437674 0.758358 0.490793 0.544812 0.442086 0.145459 0.437674 1.245733 0.485479 0.758358 -0.823291 -0.544812 -0.442086 -0.327040 -0.437674 -1.261226
LP002980 1 1 0 0 0 0 0 0 1 0 -0.218048 -0.121786 -0.118095 -0.172326 0.266022 -0.437674 -1.318638 0.490793 0.544812 0.442086 0.145459 0.437674 -0.510641 0.485479 -1.318638 -0.823291 -0.544812 -0.442086 -0.327040 -0.437674 0.606597
LP002986 1 1 1 0 0 0 0 0 0 0 0.196024 -0.057891 -0.157441 0.352025 0.266022 -0.437674 0.758358 0.490793 0.544812 0.442086 0.145459 0.437674 -1.037553 0.485479 0.758358 -0.823291 -0.544812 -0.442086 -0.327040 -0.437674 0.956814
LP002989 1 1 0 0 0 0 0 1 0 0 0.544037 0.017490 -0.159036 -0.631134 -2.521639 -0.437674 -1.318638 0.490793 0.544812 -2.262005 0.145459 0.437674 -1.037553 0.485479 -1.318638 -0.823291 -0.544812 2.262005 0.758888 -0.437674 0.956814

367 rows × 31 columns

from sklearn.model_selection import train_test_split
import xgboost as xgb
y = train['Loan_Status_Y']
X =train.drop('Loan_Status_Y',axis = 1)
X_train, X_test,y_train, y_test = train_test_split(X, y,test_size = 0.2,random_state = 0)
y_train
Loan_ID
LP001241    0
LP001151    1
LP002328    0
LP001883    0
LP002341    0
           ..
LP002367    0
LP002788    0
LP001029    0
LP002949    0
LP001014    0
Name: Loan_Status_Y, Length: 1521, dtype: uint8
from sklearn.metrics import accuracy_score
from sklearn.metrics import log_loss
from sklearn.metrics import classification_report
import xgboost as xgb
from sklearn.model_selection import RandomizedSearchCV

clf = xgb.XGBClassifier()

param_grid = {
        'silent': [False],
        'max_depth': [6, 10, 15, 20],
        'learning_rate': [0.001, 0.01, 0.1, 0.2, 0,3],
        'subsample': [0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
        'colsample_bytree': [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
        'colsample_bylevel': [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
        'min_child_weight': [0.5, 1.0, 3.0, 5.0, 7.0, 10.0],
        'gamma': [0, 0.25, 0.5, 1.0],
        'reg_lambda': [0.1, 1.0, 5.0, 10.0, 50.0, 100.0],
        'n_estimators': [100]}

fit_params = {'eval_metric': 'mlogloss',
              'early_stopping_rounds': 10,
              'eval_set': [(X_test, y_test)]}

rs_clf = RandomizedSearchCV(clf, param_grid, n_iter=20,
                            n_jobs=1, verbose=2, cv=2,
                            scoring='neg_log_loss', refit=False, random_state=42)
print("Randomized search..")

rs_clf.fit(X_train, y_train)
# print("Randomized search time:", time.time() - search_time_start)

best_score = rs_clf.best_score_
best_params = rs_clf.best_params_
print("Best score: {}".format(best_score))
print("Best params: ")
for param_name in sorted(best_params.keys()):
    print('%s: %r' % (param_name, best_params[param_name]))
Randomized search..
Fitting 2 folds for each of 20 candidates, totalling 40 fits
[CV] subsample=0.7, silent=False, reg_lambda=50.0, n_estimators=100, min_child_weight=5.0, max_depth=6, learning_rate=0.2, gamma=1.0, colsample_bytree=0.9, colsample_bylevel=0.4 
[CV]  subsample=0.7, silent=False, reg_lambda=50.0, n_estimators=100, min_child_weight=5.0, max_depth=6, learning_rate=0.2, gamma=1.0, colsample_bytree=0.9, colsample_bylevel=0.4, total=   0.1s
[CV] subsample=0.7, silent=False, reg_lambda=50.0, n_estimators=100, min_child_weight=5.0, max_depth=6, learning_rate=0.2, gamma=1.0, colsample_bytree=0.9, colsample_bylevel=0.4 
[CV]  subsample=0.7, silent=False, reg_lambda=50.0, n_estimators=100, min_child_weight=5.0, max_depth=6, learning_rate=0.2, gamma=1.0, colsample_bytree=0.9, colsample_bylevel=0.4, total=   0.1s
[CV] subsample=0.6, silent=False, reg_lambda=1.0, n_estimators=100, min_child_weight=1.0, max_depth=20, learning_rate=0.1, gamma=0.25, colsample_bytree=0.8, colsample_bylevel=0.8 
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.1s remaining:    0.0s
[CV]  subsample=0.6, silent=False, reg_lambda=1.0, n_estimators=100, min_child_weight=1.0, max_depth=20, learning_rate=0.1, gamma=0.25, colsample_bytree=0.8, colsample_bylevel=0.8, total=   0.2s
[CV] subsample=0.6, silent=False, reg_lambda=1.0, n_estimators=100, min_child_weight=1.0, max_depth=20, learning_rate=0.1, gamma=0.25, colsample_bytree=0.8, colsample_bylevel=0.8 
[CV]  subsample=0.6, silent=False, reg_lambda=1.0, n_estimators=100, min_child_weight=1.0, max_depth=20, learning_rate=0.1, gamma=0.25, colsample_bytree=0.8, colsample_bylevel=0.8, total=   0.2s
[CV] subsample=0.9, silent=False, reg_lambda=50.0, n_estimators=100, min_child_weight=7.0, max_depth=15, learning_rate=0.1, gamma=0.25, colsample_bytree=1.0, colsample_bylevel=0.4 
[CV]  subsample=0.9, silent=False, reg_lambda=50.0, n_estimators=100, min_child_weight=7.0, max_depth=15, learning_rate=0.1, gamma=0.25, colsample_bytree=1.0, colsample_bylevel=0.4, total=   0.1s
[CV] subsample=0.9, silent=False, reg_lambda=50.0, n_estimators=100, min_child_weight=7.0, max_depth=15, learning_rate=0.1, gamma=0.25, colsample_bytree=1.0, colsample_bylevel=0.4 
[CV]  subsample=0.9, silent=False, reg_lambda=50.0, n_estimators=100, min_child_weight=7.0, max_depth=15, learning_rate=0.1, gamma=0.25, colsample_bytree=1.0, colsample_bylevel=0.4, total=   0.1s
[CV] subsample=0.5, silent=False, reg_lambda=1.0, n_estimators=100, min_child_weight=7.0, max_depth=10, learning_rate=0.2, gamma=0.5, colsample_bytree=0.7, colsample_bylevel=0.6 
[CV]  subsample=0.5, silent=False, reg_lambda=1.0, n_estimators=100, min_child_weight=7.0, max_depth=10, learning_rate=0.2, gamma=0.5, colsample_bytree=0.7, colsample_bylevel=0.6, total=   0.1s
[CV] subsample=0.5, silent=False, reg_lambda=1.0, n_estimators=100, min_child_weight=7.0, max_depth=10, learning_rate=0.2, gamma=0.5, colsample_bytree=0.7, colsample_bylevel=0.6 
[CV]  subsample=0.5, silent=False, reg_lambda=1.0, n_estimators=100, min_child_weight=7.0, max_depth=10, learning_rate=0.2, gamma=0.5, colsample_bytree=0.7, colsample_bylevel=0.6, total=   0.1s
[CV] subsample=0.7, silent=False, reg_lambda=5.0, n_estimators=100, min_child_weight=10.0, max_depth=20, learning_rate=3, gamma=0.25, colsample_bytree=0.9, colsample_bylevel=0.5 
[CV]  subsample=0.7, silent=False, reg_lambda=5.0, n_estimators=100, min_child_weight=10.0, max_depth=20, learning_rate=3, gamma=0.25, colsample_bytree=0.9, colsample_bylevel=0.5, total=   0.0s
[CV] subsample=0.7, silent=False, reg_lambda=5.0, n_estimators=100, min_child_weight=10.0, max_depth=20, learning_rate=3, gamma=0.25, colsample_bytree=0.9, colsample_bylevel=0.5 
[CV]  subsample=0.7, silent=False, reg_lambda=5.0, n_estimators=100, min_child_weight=10.0, max_depth=20, learning_rate=3, gamma=0.25, colsample_bytree=0.9, colsample_bylevel=0.5, total=   0.0s
[CV] subsample=0.6, silent=False, reg_lambda=10.0, n_estimators=100, min_child_weight=1.0, max_depth=15, learning_rate=0.01, gamma=0, colsample_bytree=0.7, colsample_bylevel=0.8 
[CV]  subsample=0.6, silent=False, reg_lambda=10.0, n_estimators=100, min_child_weight=1.0, max_depth=15, learning_rate=0.01, gamma=0, colsample_bytree=0.7, colsample_bylevel=0.8, total=   0.1s
[CV] subsample=0.6, silent=False, reg_lambda=10.0, n_estimators=100, min_child_weight=1.0, max_depth=15, learning_rate=0.01, gamma=0, colsample_bytree=0.7, colsample_bylevel=0.8 
[CV]  subsample=0.6, silent=False, reg_lambda=10.0, n_estimators=100, min_child_weight=1.0, max_depth=15, learning_rate=0.01, gamma=0, colsample_bytree=0.7, colsample_bylevel=0.8, total=   0.2s
[CV] subsample=0.5, silent=False, reg_lambda=0.1, n_estimators=100, min_child_weight=5.0, max_depth=15, learning_rate=0.01, gamma=0.25, colsample_bytree=0.9, colsample_bylevel=0.4 
[CV]  subsample=0.5, silent=False, reg_lambda=0.1, n_estimators=100, min_child_weight=5.0, max_depth=15, learning_rate=0.01, gamma=0.25, colsample_bytree=0.9, colsample_bylevel=0.4, total=   0.1s
[CV] subsample=0.5, silent=False, reg_lambda=0.1, n_estimators=100, min_child_weight=5.0, max_depth=15, learning_rate=0.01, gamma=0.25, colsample_bytree=0.9, colsample_bylevel=0.4 
[CV]  subsample=0.5, silent=False, reg_lambda=0.1, n_estimators=100, min_child_weight=5.0, max_depth=15, learning_rate=0.01, gamma=0.25, colsample_bytree=0.9, colsample_bylevel=0.4, total=   0.1s
[CV] subsample=0.5, silent=False, reg_lambda=5.0, n_estimators=100, min_child_weight=7.0, max_depth=10, learning_rate=0.01, gamma=0.25, colsample_bytree=0.4, colsample_bylevel=0.9 
[CV]  subsample=0.5, silent=False, reg_lambda=5.0, n_estimators=100, min_child_weight=7.0, max_depth=10, learning_rate=0.01, gamma=0.25, colsample_bytree=0.4, colsample_bylevel=0.9, total=   0.1s
[CV] subsample=0.5, silent=False, reg_lambda=5.0, n_estimators=100, min_child_weight=7.0, max_depth=10, learning_rate=0.01, gamma=0.25, colsample_bytree=0.4, colsample_bylevel=0.9 
[CV]  subsample=0.5, silent=False, reg_lambda=5.0, n_estimators=100, min_child_weight=7.0, max_depth=10, learning_rate=0.01, gamma=0.25, colsample_bytree=0.4, colsample_bylevel=0.9, total=   0.1s
[CV] subsample=0.9, silent=False, reg_lambda=10.0, n_estimators=100, min_child_weight=0.5, max_depth=15, learning_rate=0.2, gamma=0.5, colsample_bytree=0.6, colsample_bylevel=0.4 
[CV]  subsample=0.9, silent=False, reg_lambda=10.0, n_estimators=100, min_child_weight=0.5, max_depth=15, learning_rate=0.2, gamma=0.5, colsample_bytree=0.6, colsample_bylevel=0.4, total=   0.1s
[CV] subsample=0.9, silent=False, reg_lambda=10.0, n_estimators=100, min_child_weight=0.5, max_depth=15, learning_rate=0.2, gamma=0.5, colsample_bytree=0.6, colsample_bylevel=0.4 
[CV]  subsample=0.9, silent=False, reg_lambda=10.0, n_estimators=100, min_child_weight=0.5, max_depth=15, learning_rate=0.2, gamma=0.5, colsample_bytree=0.6, colsample_bylevel=0.4, total=   0.1s
[CV] subsample=0.8, silent=False, reg_lambda=100.0, n_estimators=100, min_child_weight=7.0, max_depth=20, learning_rate=0.1, gamma=0.5, colsample_bytree=1.0, colsample_bylevel=0.4 
[CV]  subsample=0.8, silent=False, reg_lambda=100.0, n_estimators=100, min_child_weight=7.0, max_depth=20, learning_rate=0.1, gamma=0.5, colsample_bytree=1.0, colsample_bylevel=0.4, total=   0.1s
[CV] subsample=0.8, silent=False, reg_lambda=100.0, n_estimators=100, min_child_weight=7.0, max_depth=20, learning_rate=0.1, gamma=0.5, colsample_bytree=1.0, colsample_bylevel=0.4 
[CV]  subsample=0.8, silent=False, reg_lambda=100.0, n_estimators=100, min_child_weight=7.0, max_depth=20, learning_rate=0.1, gamma=0.5, colsample_bytree=1.0, colsample_bylevel=0.4, total=   0.1s
[CV] subsample=0.7, silent=False, reg_lambda=50.0, n_estimators=100, min_child_weight=0.5, max_depth=10, learning_rate=3, gamma=0, colsample_bytree=1.0, colsample_bylevel=1.0 
[CV]  subsample=0.7, silent=False, reg_lambda=50.0, n_estimators=100, min_child_weight=0.5, max_depth=10, learning_rate=3, gamma=0, colsample_bytree=1.0, colsample_bylevel=1.0, total=   0.2s
[CV] subsample=0.7, silent=False, reg_lambda=50.0, n_estimators=100, min_child_weight=0.5, max_depth=10, learning_rate=3, gamma=0, colsample_bytree=1.0, colsample_bylevel=1.0 
[CV]  subsample=0.7, silent=False, reg_lambda=50.0, n_estimators=100, min_child_weight=0.5, max_depth=10, learning_rate=3, gamma=0, colsample_bytree=1.0, colsample_bylevel=1.0, total=   0.2s
[CV] subsample=0.5, silent=False, reg_lambda=1.0, n_estimators=100, min_child_weight=0.5, max_depth=15, learning_rate=0.2, gamma=0, colsample_bytree=0.8, colsample_bylevel=0.7 
[CV]  subsample=0.5, silent=False, reg_lambda=1.0, n_estimators=100, min_child_weight=0.5, max_depth=15, learning_rate=0.2, gamma=0, colsample_bytree=0.8, colsample_bylevel=0.7, total=   0.2s
[CV] subsample=0.5, silent=False, reg_lambda=1.0, n_estimators=100, min_child_weight=0.5, max_depth=15, learning_rate=0.2, gamma=0, colsample_bytree=0.8, colsample_bylevel=0.7 
[CV]  subsample=0.5, silent=False, reg_lambda=1.0, n_estimators=100, min_child_weight=0.5, max_depth=15, learning_rate=0.2, gamma=0, colsample_bytree=0.8, colsample_bylevel=0.7, total=   0.2s
[CV] subsample=0.7, silent=False, reg_lambda=5.0, n_estimators=100, min_child_weight=10.0, max_depth=20, learning_rate=0.001, gamma=0, colsample_bytree=0.8, colsample_bylevel=1.0 
[CV]  subsample=0.7, silent=False, reg_lambda=5.0, n_estimators=100, min_child_weight=10.0, max_depth=20, learning_rate=0.001, gamma=0, colsample_bytree=0.8, colsample_bylevel=1.0, total=   0.1s
[CV] subsample=0.7, silent=False, reg_lambda=5.0, n_estimators=100, min_child_weight=10.0, max_depth=20, learning_rate=0.001, gamma=0, colsample_bytree=0.8, colsample_bylevel=1.0 
[CV]  subsample=0.7, silent=False, reg_lambda=5.0, n_estimators=100, min_child_weight=10.0, max_depth=20, learning_rate=0.001, gamma=0, colsample_bytree=0.8, colsample_bylevel=1.0, total=   0.1s
[CV] subsample=0.5, silent=False, reg_lambda=10.0, n_estimators=100, min_child_weight=0.5, max_depth=10, learning_rate=3, gamma=0, colsample_bytree=0.8, colsample_bylevel=0.4 
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py:2295: RuntimeWarning: divide by zero encountered in log
  loss = -(transformed_labels * np.log(y_pred)).sum(axis=1)
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py:2295: RuntimeWarning: invalid value encountered in multiply
  loss = -(transformed_labels * np.log(y_pred)).sum(axis=1)
[CV]  subsample=0.5, silent=False, reg_lambda=10.0, n_estimators=100, min_child_weight=0.5, max_depth=10, learning_rate=3, gamma=0, colsample_bytree=0.8, colsample_bylevel=0.4, total=   0.1s
[CV] subsample=0.5, silent=False, reg_lambda=10.0, n_estimators=100, min_child_weight=0.5, max_depth=10, learning_rate=3, gamma=0, colsample_bytree=0.8, colsample_bylevel=0.4 
[CV]  subsample=0.5, silent=False, reg_lambda=10.0, n_estimators=100, min_child_weight=0.5, max_depth=10, learning_rate=3, gamma=0, colsample_bytree=0.8, colsample_bylevel=0.4, total=   0.1s
[CV] subsample=1.0, silent=False, reg_lambda=10.0, n_estimators=100, min_child_weight=7.0, max_depth=15, learning_rate=0.1, gamma=0.25, colsample_bytree=0.5, colsample_bylevel=1.0 
[CV]  subsample=1.0, silent=False, reg_lambda=10.0, n_estimators=100, min_child_weight=7.0, max_depth=15, learning_rate=0.1, gamma=0.25, colsample_bytree=0.5, colsample_bylevel=1.0, total=   0.1s
[CV] subsample=1.0, silent=False, reg_lambda=10.0, n_estimators=100, min_child_weight=7.0, max_depth=15, learning_rate=0.1, gamma=0.25, colsample_bytree=0.5, colsample_bylevel=1.0 
[CV]  subsample=1.0, silent=False, reg_lambda=10.0, n_estimators=100, min_child_weight=7.0, max_depth=15, learning_rate=0.1, gamma=0.25, colsample_bytree=0.5, colsample_bylevel=1.0, total=   0.1s
[CV] subsample=0.5, silent=False, reg_lambda=5.0, n_estimators=100, min_child_weight=7.0, max_depth=10, learning_rate=0.001, gamma=0, colsample_bytree=0.6, colsample_bylevel=1.0 
[CV]  subsample=0.5, silent=False, reg_lambda=5.0, n_estimators=100, min_child_weight=7.0, max_depth=10, learning_rate=0.001, gamma=0, colsample_bytree=0.6, colsample_bylevel=1.0, total=   0.1s
[CV] subsample=0.5, silent=False, reg_lambda=5.0, n_estimators=100, min_child_weight=7.0, max_depth=10, learning_rate=0.001, gamma=0, colsample_bytree=0.6, colsample_bylevel=1.0 
[CV]  subsample=0.5, silent=False, reg_lambda=5.0, n_estimators=100, min_child_weight=7.0, max_depth=10, learning_rate=0.001, gamma=0, colsample_bytree=0.6, colsample_bylevel=1.0, total=   0.1s
[CV] subsample=0.8, silent=False, reg_lambda=50.0, n_estimators=100, min_child_weight=0.5, max_depth=20, learning_rate=0, gamma=0.25, colsample_bytree=0.5, colsample_bylevel=0.5 
[CV]  subsample=0.8, silent=False, reg_lambda=50.0, n_estimators=100, min_child_weight=0.5, max_depth=20, learning_rate=0, gamma=0.25, colsample_bytree=0.5, colsample_bylevel=0.5, total=   0.1s
[CV] subsample=0.8, silent=False, reg_lambda=50.0, n_estimators=100, min_child_weight=0.5, max_depth=20, learning_rate=0, gamma=0.25, colsample_bytree=0.5, colsample_bylevel=0.5 
[CV]  subsample=0.8, silent=False, reg_lambda=50.0, n_estimators=100, min_child_weight=0.5, max_depth=20, learning_rate=0, gamma=0.25, colsample_bytree=0.5, colsample_bylevel=0.5, total=   0.1s
[CV] subsample=0.6, silent=False, reg_lambda=100.0, n_estimators=100, min_child_weight=7.0, max_depth=10, learning_rate=3, gamma=0, colsample_bytree=0.6, colsample_bylevel=0.5 
[CV]  subsample=0.6, silent=False, reg_lambda=100.0, n_estimators=100, min_child_weight=7.0, max_depth=10, learning_rate=3, gamma=0, colsample_bytree=0.6, colsample_bylevel=0.5, total=   0.1s
[CV] subsample=0.6, silent=False, reg_lambda=100.0, n_estimators=100, min_child_weight=7.0, max_depth=10, learning_rate=3, gamma=0, colsample_bytree=0.6, colsample_bylevel=0.5 
[CV]  subsample=0.6, silent=False, reg_lambda=100.0, n_estimators=100, min_child_weight=7.0, max_depth=10, learning_rate=3, gamma=0, colsample_bytree=0.6, colsample_bylevel=0.5, total=   0.1s
[CV] subsample=0.6, silent=False, reg_lambda=100.0, n_estimators=100, min_child_weight=7.0, max_depth=20, learning_rate=0.2, gamma=0.25, colsample_bytree=1.0, colsample_bylevel=0.5 
[CV]  subsample=0.6, silent=False, reg_lambda=100.0, n_estimators=100, min_child_weight=7.0, max_depth=20, learning_rate=0.2, gamma=0.25, colsample_bytree=1.0, colsample_bylevel=0.5, total=   0.1s
[CV] subsample=0.6, silent=False, reg_lambda=100.0, n_estimators=100, min_child_weight=7.0, max_depth=20, learning_rate=0.2, gamma=0.25, colsample_bytree=1.0, colsample_bylevel=0.5 
[CV]  subsample=0.6, silent=False, reg_lambda=100.0, n_estimators=100, min_child_weight=7.0, max_depth=20, learning_rate=0.2, gamma=0.25, colsample_bytree=1.0, colsample_bylevel=0.5, total=   0.1s
[CV] subsample=0.7, silent=False, reg_lambda=5.0, n_estimators=100, min_child_weight=1.0, max_depth=15, learning_rate=3, gamma=1.0, colsample_bytree=0.5, colsample_bylevel=0.4 
[CV]  subsample=0.7, silent=False, reg_lambda=5.0, n_estimators=100, min_child_weight=1.0, max_depth=15, learning_rate=3, gamma=1.0, colsample_bytree=0.5, colsample_bylevel=0.4, total=   0.1s
[CV] subsample=0.7, silent=False, reg_lambda=5.0, n_estimators=100, min_child_weight=1.0, max_depth=15, learning_rate=3, gamma=1.0, colsample_bytree=0.5, colsample_bylevel=0.4 
[CV]  subsample=0.7, silent=False, reg_lambda=5.0, n_estimators=100, min_child_weight=1.0, max_depth=15, learning_rate=3, gamma=1.0, colsample_bytree=0.5, colsample_bylevel=0.4, total=   0.1s
Best score: -0.21532398053790774
Best params: 
colsample_bylevel: 0.8
colsample_bytree: 0.8
gamma: 0.25
learning_rate: 0.1
max_depth: 20
min_child_weight: 1.0
n_estimators: 100
reg_lambda: 1.0
silent: False
subsample: 0.6
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py:2295: RuntimeWarning: divide by zero encountered in log
  loss = -(transformed_labels * np.log(y_pred)).sum(axis=1)
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py:2295: RuntimeWarning: invalid value encountered in multiply
  loss = -(transformed_labels * np.log(y_pred)).sum(axis=1)
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py:2295: RuntimeWarning: divide by zero encountered in log
  loss = -(transformed_labels * np.log(y_pred)).sum(axis=1)
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py:2295: RuntimeWarning: invalid value encountered in multiply
  loss = -(transformed_labels * np.log(y_pred)).sum(axis=1)
[Parallel(n_jobs=1)]: Done  40 out of  40 | elapsed:    4.4s finished
X_train
Credit_History Gender_Male Married_Yes Dependents_1 Dependents_2 Dependents_3+ Education_Not Graduate Self_Employed_Yes Property_Area_Semiurban Property_Area_Urban Wallet_Size EMI Wallet_Share LoanAmount Loan_Amount_Term mean_LoanAmount_per_Credit_History max_LoanAmount_per_Married max_LoanAmount_per_bed max_LoanAmount_per_department max_LoanAmount_per_Self_Employed max_LoanAmount_per_Loan_Amount_Term max_LoanAmount_per_Credit_History max_LoanAmount_per_Property_Area min_LoanAmount_per_Gender min_LoanAmount_per_Married min_LoanAmount_per_bed min_LoanAmount_per_department min_LoanAmount_per_Self_Employed min_LoanAmount_per_Loan_Amount_Term min_LoanAmount_per_Credit_History min_LoanAmount_per_Property_Area
Loan_ID
LP001241 0 0 0 0 0 0 0 0 1 0 -0.450451 -0.243584 0.156830 -0.205928 0.271763 1.842779 -1.273231 0.495584 0.571878 0.472428 0.095731 -1.842779 -0.438870 -1.934315 -1.273231 -0.826202 -0.571878 -0.472428 -0.347567 1.842779 0.535577
LP001151 1 0 0 0 0 0 0 0 1 0 -0.174504 -0.198331 -0.190640 -0.119839 0.271763 -0.542659 -1.273231 0.495584 0.571878 0.472428 0.095731 0.542659 -0.438870 -1.934315 -1.273231 -0.826202 -0.571878 -0.472428 -0.347567 -0.542659 0.535577
LP002328 0 1 1 0 0 0 1 0 0 0 -0.199514 0.225403 0.321951 0.676487 0.271763 1.842779 0.785403 0.495584 -1.748626 0.472428 0.095731 -1.842779 -0.979345 0.516979 0.785403 -0.826202 1.748626 -0.472428 -0.347567 1.842779 0.895089
LP001883 1 0 0 0 0 0 0 1 0 0 -0.573684 -0.249755 0.469841 -0.216689 0.271763 -0.542659 -1.273231 0.495584 0.571878 -2.116724 0.095731 0.542659 -0.979345 -1.934315 -1.273231 -0.826202 -0.571878 2.116724 -0.347567 -0.542659 0.895089
LP002341 1 0 0 1 0 0 0 0 0 1 -0.687975 -0.107825 1.347132 0.052339 0.271763 -0.542659 -1.273231 -0.237456 0.571878 0.472428 0.095731 0.542659 1.362716 -1.934315 -1.273231 1.393713 -0.571878 -0.472428 -0.347567 -0.542659 -1.381823
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
LP002367 1 0 0 1 0 0 1 0 0 0 -0.407697 -0.558300 -0.404579 -0.797792 0.271763 -0.542659 -1.273231 -0.237456 -1.748626 0.472428 0.095731 0.542659 -0.979345 -1.934315 -1.273231 1.393713 1.748626 -0.472428 -0.347567 -0.542659 0.895089
LP002788 0 1 1 0 0 0 1 0 0 1 -0.382408 0.013536 0.403793 0.278324 0.271763 1.842779 0.785403 0.495584 -1.748626 0.472428 0.095731 -1.842779 1.362716 0.516979 0.785403 -0.826202 1.748626 -0.472428 -0.347567 1.842779 -1.381823
LP001029 1 1 0 0 0 0 0 0 0 0 -0.395541 -0.369059 -0.136079 -0.442674 0.271763 -0.542659 -1.273231 0.495584 0.571878 0.472428 0.095731 0.542659 -0.979345 0.516979 -1.273231 -0.826202 -0.571878 -0.472428 -0.347567 -0.542659 0.895089
LP002949 1 0 0 0 0 1 0 1 0 1 4.828584 2.977619 -0.441910 2.096959 -2.612162 -0.542659 -1.273231 1.228624 0.571878 -2.116724 0.095731 0.542659 1.362716 -1.934315 -1.273231 1.654879 -0.571878 2.116724 0.887986 -0.542659 -1.381823
LP001014 0 1 1 0 0 1 0 0 1 0 -0.277199 -0.118110 0.031913 0.030817 0.271763 1.842779 0.785403 1.228624 0.571878 0.472428 0.095731 -1.842779 -0.438870 0.516979 0.785403 1.654879 -0.571878 -0.472428 -0.347567 1.842779 0.535577

1521 rows × 31 columns

model = xgb.XGBClassifier(max_depth = 15,eval_metric = "auc",min_child_weight = 0.5,subsample = 0.8,colsample_bytree = 0.6,colsample_bylevel=0.4,n_estimators=100,
                          gamma=.8,reg_lambda = 10.0,silent = False,learning_rate = .02)

eval_set = [(X_train,y_train),(X_test,y_test)]
model.fit(X_train,y_train.values.ravel(),early_stopping_rounds= 500,eval_metric= ['auc','error'],eval_set = eval_set,verbose =500)
eval_score = accuracy_score(y_test, model.predict(X_test))

print('Eval ACC: {}'.format(eval_score))
[0]	validation_0-auc:0.892482	validation_0-error:0.108481	validation_1-auc:0.889795	validation_1-error:0.125984
Multiple eval metrics have been passed: 'validation_1-error' will be used for early stopping.

Will train until validation_1-error hasn't improved in 500 rounds.
[99]	validation_0-auc:0.969725	validation_0-error:0.080868	validation_1-auc:0.961251	validation_1-error:0.102362
Eval ACC: 0.9028871391076115
params = {}
params['learning_rate'] = 0.01
params['max_depth'] = 6
params['n_estimators'] = 500
params['objective'] = 'binary'
params['boosting_type'] = 'gbdt'
params['subsample'] = 0.9
params['random_state'] = 42
params['colsample_bytree']=0.9
params['min_data_in_leaf'] = 62
params['reg_alpha'] = 0.7
params['reg_lambda'] = 1.11
import lightgbm as lgb
clf = lgb.LGBMClassifier(**params)
    
clf.fit(X_train, y_train, early_stopping_rounds=500, eval_set=[(X_train, y_train), (X_test, y_test)], eval_metric=['accuracy','auc'], verbose=True)

preds = clf.predict(X_test)
eval_score = accuracy_score(y_test, clf.predict(X_test))
prediction_test = clf.predict(test)
print('Eval ACC: {}'.format(eval_score))
[1]	training's auc: 0.885387	training's binary_logloss: 0.52203	valid_1's auc: 0.870267	valid_1's binary_logloss: 0.540682
Training until validation scores don't improve for 500 rounds.
[2]	training's auc: 0.895781	training's binary_logloss: 0.518815	valid_1's auc: 0.885428	valid_1's binary_logloss: 0.537365
[3]	training's auc: 0.898184	training's binary_logloss: 0.515287	valid_1's auc: 0.894701	valid_1's binary_logloss: 0.533656
[4]	training's auc: 0.898484	training's binary_logloss: 0.51185	valid_1's auc: 0.896664	valid_1's binary_logloss: 0.530042
[5]	training's auc: 0.897766	training's binary_logloss: 0.508501	valid_1's auc: 0.896933	valid_1's binary_logloss: 0.52652
[6]	training's auc: 0.8976	training's binary_logloss: 0.505305	valid_1's auc: 0.898434	valid_1's binary_logloss: 0.523316
[7]	training's auc: 0.8976	training's binary_logloss: 0.502119	valid_1's auc: 0.899434	valid_1's binary_logloss: 0.519964
[8]	training's auc: 0.903661	training's binary_logloss: 0.499074	valid_1's auc: 0.906611	valid_1's binary_logloss: 0.5163
[9]	training's auc: 0.903904	training's binary_logloss: 0.496015	valid_1's auc: 0.906572	valid_1's binary_logloss: 0.513095
[10]	training's auc: 0.903863	training's binary_logloss: 0.493039	valid_1's auc: 0.906399	valid_1's binary_logloss: 0.509955
[11]	training's auc: 0.903911	training's binary_logloss: 0.490121	valid_1's auc: 0.906476	valid_1's binary_logloss: 0.506898
[12]	training's auc: 0.90423	training's binary_logloss: 0.48728	valid_1's auc: 0.906399	valid_1's binary_logloss: 0.503901
[13]	training's auc: 0.907577	training's binary_logloss: 0.484422	valid_1's auc: 0.91042	valid_1's binary_logloss: 0.500949
[14]	training's auc: 0.90829	training's binary_logloss: 0.481629	valid_1's auc: 0.910035	valid_1's binary_logloss: 0.49807
[15]	training's auc: 0.910284	training's binary_logloss: 0.478899	valid_1's auc: 0.909304	valid_1's binary_logloss: 0.495262
[16]	training's auc: 0.910112	training's binary_logloss: 0.476308	valid_1's auc: 0.908727	valid_1's binary_logloss: 0.492583
[17]	training's auc: 0.90943	training's binary_logloss: 0.473718	valid_1's auc: 0.908304	valid_1's binary_logloss: 0.489863
[18]	training's auc: 0.910271	training's binary_logloss: 0.471138	valid_1's auc: 0.909343	valid_1's binary_logloss: 0.487214
[19]	training's auc: 0.910232	training's binary_logloss: 0.468711	valid_1's auc: 0.909843	valid_1's binary_logloss: 0.484486
[20]	training's auc: 0.911439	training's binary_logloss: 0.466235	valid_1's auc: 0.910959	valid_1's binary_logloss: 0.48189
[21]	training's auc: 0.91348	training's binary_logloss: 0.463638	valid_1's auc: 0.914499	valid_1's binary_logloss: 0.479217
[22]	training's auc: 0.914698	training's binary_logloss: 0.461259	valid_1's auc: 0.914576	valid_1's binary_logloss: 0.47664
[23]	training's auc: 0.914088	training's binary_logloss: 0.458924	valid_1's auc: 0.914807	valid_1's binary_logloss: 0.474186
[24]	training's auc: 0.914604	training's binary_logloss: 0.456631	valid_1's auc: 0.914191	valid_1's binary_logloss: 0.471782
[25]	training's auc: 0.914146	training's binary_logloss: 0.454378	valid_1's auc: 0.91396	valid_1's binary_logloss: 0.469414
[26]	training's auc: 0.913504	training's binary_logloss: 0.45222	valid_1's auc: 0.913268	valid_1's binary_logloss: 0.467254
[27]	training's auc: 0.914141	training's binary_logloss: 0.450039	valid_1's auc: 0.914191	valid_1's binary_logloss: 0.46497
[28]	training's auc: 0.914541	training's binary_logloss: 0.447904	valid_1's auc: 0.914922	valid_1's binary_logloss: 0.462735
[29]	training's auc: 0.914308	training's binary_logloss: 0.445801	valid_1's auc: 0.914691	valid_1's binary_logloss: 0.460525
[30]	training's auc: 0.914905	training's binary_logloss: 0.443732	valid_1's auc: 0.915249	valid_1's binary_logloss: 0.458216
[31]	training's auc: 0.914862	training's binary_logloss: 0.441792	valid_1's auc: 0.915403	valid_1's binary_logloss: 0.456143
[32]	training's auc: 0.91565	training's binary_logloss: 0.439842	valid_1's auc: 0.91398	valid_1's binary_logloss: 0.454091
[33]	training's auc: 0.915398	training's binary_logloss: 0.43791	valid_1's auc: 0.914133	valid_1's binary_logloss: 0.452055
[34]	training's auc: 0.915908	training's binary_logloss: 0.435976	valid_1's auc: 0.913133	valid_1's binary_logloss: 0.449978
[35]	training's auc: 0.915671	training's binary_logloss: 0.434111	valid_1's auc: 0.91448	valid_1's binary_logloss: 0.447886
[36]	training's auc: 0.916531	training's binary_logloss: 0.432232	valid_1's auc: 0.914076	valid_1's binary_logloss: 0.445984
[37]	training's auc: 0.91615	training's binary_logloss: 0.430474	valid_1's auc: 0.914191	valid_1's binary_logloss: 0.444132
[38]	training's auc: 0.916782	training's binary_logloss: 0.428653	valid_1's auc: 0.914422	valid_1's binary_logloss: 0.44218
[39]	training's auc: 0.9164	training's binary_logloss: 0.426907	valid_1's auc: 0.914384	valid_1's binary_logloss: 0.440228
[40]	training's auc: 0.916362	training's binary_logloss: 0.425126	valid_1's auc: 0.914845	valid_1's binary_logloss: 0.438352
[41]	training's auc: 0.916579	training's binary_logloss: 0.423385	valid_1's auc: 0.914942	valid_1's binary_logloss: 0.436406
[42]	training's auc: 0.916486	training's binary_logloss: 0.421717	valid_1's auc: 0.915423	valid_1's binary_logloss: 0.434529
[43]	training's auc: 0.916966	training's binary_logloss: 0.420029	valid_1's auc: 0.915615	valid_1's binary_logloss: 0.432723
[44]	training's auc: 0.916926	training's binary_logloss: 0.418348	valid_1's auc: 0.915461	valid_1's binary_logloss: 0.430959
[45]	training's auc: 0.917133	training's binary_logloss: 0.416747	valid_1's auc: 0.916077	valid_1's binary_logloss: 0.429277
[46]	training's auc: 0.917579	training's binary_logloss: 0.41496	valid_1's auc: 0.915903	valid_1's binary_logloss: 0.427374
[47]	training's auc: 0.917663	training's binary_logloss: 0.413373	valid_1's auc: 0.917212	valid_1's binary_logloss: 0.425744
[48]	training's auc: 0.917637	training's binary_logloss: 0.411845	valid_1's auc: 0.917173	valid_1's binary_logloss: 0.424019
[49]	training's auc: 0.918267	training's binary_logloss: 0.410215	valid_1's auc: 0.917654	valid_1's binary_logloss: 0.422058
[50]	training's auc: 0.918728	training's binary_logloss: 0.408541	valid_1's auc: 0.918039	valid_1's binary_logloss: 0.420353
[51]	training's auc: 0.918892	training's binary_logloss: 0.407042	valid_1's auc: 0.918078	valid_1's binary_logloss: 0.418755
[52]	training's auc: 0.919018	training's binary_logloss: 0.405557	valid_1's auc: 0.918847	valid_1's binary_logloss: 0.416968
[53]	training's auc: 0.919169	training's binary_logloss: 0.404073	valid_1's auc: 0.919232	valid_1's binary_logloss: 0.415413
[54]	training's auc: 0.919177	training's binary_logloss: 0.402599	valid_1's auc: 0.918539	valid_1's binary_logloss: 0.413894
[55]	training's auc: 0.919192	training's binary_logloss: 0.401216	valid_1's auc: 0.918809	valid_1's binary_logloss: 0.412528
[56]	training's auc: 0.919197	training's binary_logloss: 0.399848	valid_1's auc: 0.918847	valid_1's binary_logloss: 0.410988
[57]	training's auc: 0.919339	training's binary_logloss: 0.398468	valid_1's auc: 0.918693	valid_1's binary_logloss: 0.409457
[58]	training's auc: 0.919488	training's binary_logloss: 0.396942	valid_1's auc: 0.919424	valid_1's binary_logloss: 0.407845
[59]	training's auc: 0.919809	training's binary_logloss: 0.39555	valid_1's auc: 0.919424	valid_1's binary_logloss: 0.406461
[60]	training's auc: 0.919814	training's binary_logloss: 0.394258	valid_1's auc: 0.919578	valid_1's binary_logloss: 0.405007
[61]	training's auc: 0.92059	training's binary_logloss: 0.39287	valid_1's auc: 0.920213	valid_1's binary_logloss: 0.403518
[62]	training's auc: 0.920446	training's binary_logloss: 0.391554	valid_1's auc: 0.919905	valid_1's binary_logloss: 0.402139
[63]	training's auc: 0.920901	training's binary_logloss: 0.390239	valid_1's auc: 0.920559	valid_1's binary_logloss: 0.400838
[64]	training's auc: 0.920907	training's binary_logloss: 0.389052	valid_1's auc: 0.920329	valid_1's binary_logloss: 0.399628
[65]	training's auc: 0.920621	training's binary_logloss: 0.387817	valid_1's auc: 0.920098	valid_1's binary_logloss: 0.398409
[66]	training's auc: 0.920672	training's binary_logloss: 0.386474	valid_1's auc: 0.921252	valid_1's binary_logloss: 0.396809
[67]	training's auc: 0.921017	training's binary_logloss: 0.385219	valid_1's auc: 0.92106	valid_1's binary_logloss: 0.395543
[68]	training's auc: 0.921335	training's binary_logloss: 0.383953	valid_1's auc: 0.920675	valid_1's binary_logloss: 0.394257
[69]	training's auc: 0.921343	training's binary_logloss: 0.382811	valid_1's auc: 0.921329	valid_1's binary_logloss: 0.392968
[70]	training's auc: 0.921095	training's binary_logloss: 0.381692	valid_1's auc: 0.92156	valid_1's binary_logloss: 0.391867
[71]	training's auc: 0.921411	training's binary_logloss: 0.38047	valid_1's auc: 0.921521	valid_1's binary_logloss: 0.390565
[72]	training's auc: 0.92199	training's binary_logloss: 0.379263	valid_1's auc: 0.921906	valid_1's binary_logloss: 0.389249
[73]	training's auc: 0.922063	training's binary_logloss: 0.378132	valid_1's auc: 0.92206	valid_1's binary_logloss: 0.38812
[74]	training's auc: 0.922015	training's binary_logloss: 0.376984	valid_1's auc: 0.922176	valid_1's binary_logloss: 0.38697
[75]	training's auc: 0.922147	training's binary_logloss: 0.375934	valid_1's auc: 0.922445	valid_1's binary_logloss: 0.385987
[76]	training's auc: 0.922425	training's binary_logloss: 0.374819	valid_1's auc: 0.922637	valid_1's binary_logloss: 0.384863
[77]	training's auc: 0.922928	training's binary_logloss: 0.373722	valid_1's auc: 0.922137	valid_1's binary_logloss: 0.383758
[78]	training's auc: 0.923148	training's binary_logloss: 0.372635	valid_1's auc: 0.922599	valid_1's binary_logloss: 0.382604
[79]	training's auc: 0.923297	training's binary_logloss: 0.371487	valid_1's auc: 0.923984	valid_1's binary_logloss: 0.381235
[80]	training's auc: 0.923254	training's binary_logloss: 0.370463	valid_1's auc: 0.924061	valid_1's binary_logloss: 0.380241
[81]	training's auc: 0.923502	training's binary_logloss: 0.369424	valid_1's auc: 0.923715	valid_1's binary_logloss: 0.379196
[82]	training's auc: 0.923684	training's binary_logloss: 0.368316	valid_1's auc: 0.924196	valid_1's binary_logloss: 0.377939
[83]	training's auc: 0.923873	training's binary_logloss: 0.367322	valid_1's auc: 0.924465	valid_1's binary_logloss: 0.376974
[84]	training's auc: 0.92395	training's binary_logloss: 0.366384	valid_1's auc: 0.923792	valid_1's binary_logloss: 0.376135
[85]	training's auc: 0.924297	training's binary_logloss: 0.36533	valid_1's auc: 0.923984	valid_1's binary_logloss: 0.375041
[86]	training's auc: 0.924115	training's binary_logloss: 0.364386	valid_1's auc: 0.923946	valid_1's binary_logloss: 0.374127
[87]	training's auc: 0.92474	training's binary_logloss: 0.363206	valid_1's auc: 0.924696	valid_1's binary_logloss: 0.372857
[88]	training's auc: 0.924927	training's binary_logloss: 0.362257	valid_1's auc: 0.924619	valid_1's binary_logloss: 0.371899
[89]	training's auc: 0.925112	training's binary_logloss: 0.361239	valid_1's auc: 0.924581	valid_1's binary_logloss: 0.370738
[90]	training's auc: 0.925044	training's binary_logloss: 0.360311	valid_1's auc: 0.924773	valid_1's binary_logloss: 0.369853
[91]	training's auc: 0.925432	training's binary_logloss: 0.359344	valid_1's auc: 0.925389	valid_1's binary_logloss: 0.368767
[92]	training's auc: 0.925372	training's binary_logloss: 0.358509	valid_1's auc: 0.924927	valid_1's binary_logloss: 0.368029
[93]	training's auc: 0.925401	training's binary_logloss: 0.357617	valid_1's auc: 0.925004	valid_1's binary_logloss: 0.367145
[94]	training's auc: 0.925417	training's binary_logloss: 0.356725	valid_1's auc: 0.924658	valid_1's binary_logloss: 0.366337
[95]	training's auc: 0.925623	training's binary_logloss: 0.355721	valid_1's auc: 0.925562	valid_1's binary_logloss: 0.365217
[96]	training's auc: 0.926042	training's binary_logloss: 0.354772	valid_1's auc: 0.92587	valid_1's binary_logloss: 0.36414
[97]	training's auc: 0.926775	training's binary_logloss: 0.353705	valid_1's auc: 0.927062	valid_1's binary_logloss: 0.362931
[98]	training's auc: 0.926687	training's binary_logloss: 0.35281	valid_1's auc: 0.92764	valid_1's binary_logloss: 0.36192
[99]	training's auc: 0.926866	training's binary_logloss: 0.351966	valid_1's auc: 0.927178	valid_1's binary_logloss: 0.361022
[100]	training's auc: 0.927357	training's binary_logloss: 0.350887	valid_1's auc: 0.928294	valid_1's binary_logloss: 0.359799
[101]	training's auc: 0.927266	training's binary_logloss: 0.350041	valid_1's auc: 0.92814	valid_1's binary_logloss: 0.358979
[102]	training's auc: 0.927687	training's binary_logloss: 0.348978	valid_1's auc: 0.928871	valid_1's binary_logloss: 0.357916
[103]	training's auc: 0.928018	training's binary_logloss: 0.347933	valid_1's auc: 0.928986	valid_1's binary_logloss: 0.356871
[104]	training's auc: 0.928233	training's binary_logloss: 0.34692	valid_1's auc: 0.929256	valid_1's binary_logloss: 0.355807
[105]	training's auc: 0.928488	training's binary_logloss: 0.345933	valid_1's auc: 0.929448	valid_1's binary_logloss: 0.354916
[106]	training's auc: 0.92869	training's binary_logloss: 0.344982	valid_1's auc: 0.930025	valid_1's binary_logloss: 0.353871
[107]	training's auc: 0.928728	training's binary_logloss: 0.343922	valid_1's auc: 0.930372	valid_1's binary_logloss: 0.352749
[108]	training's auc: 0.928988	training's binary_logloss: 0.342971	valid_1's auc: 0.93041	valid_1's binary_logloss: 0.351734
[109]	training's auc: 0.929231	training's binary_logloss: 0.341989	valid_1's auc: 0.930256	valid_1's binary_logloss: 0.350703
[110]	training's auc: 0.929481	training's binary_logloss: 0.341062	valid_1's auc: 0.930795	valid_1's binary_logloss: 0.349714
[111]	training's auc: 0.929592	training's binary_logloss: 0.34005	valid_1's auc: 0.931218	valid_1's binary_logloss: 0.348643
[112]	training's auc: 0.929974	training's binary_logloss: 0.339091	valid_1's auc: 0.931603	valid_1's binary_logloss: 0.347647
[113]	training's auc: 0.930452	training's binary_logloss: 0.338196	valid_1's auc: 0.931795	valid_1's binary_logloss: 0.346692
[114]	training's auc: 0.930705	training's binary_logloss: 0.337371	valid_1's auc: 0.932642	valid_1's binary_logloss: 0.345732
[115]	training's auc: 0.930743	training's binary_logloss: 0.336499	valid_1's auc: 0.932911	valid_1's binary_logloss: 0.344803
[116]	training's auc: 0.931089	training's binary_logloss: 0.335641	valid_1's auc: 0.933027	valid_1's binary_logloss: 0.34389
[117]	training's auc: 0.931385	training's binary_logloss: 0.334871	valid_1's auc: 0.933142	valid_1's binary_logloss: 0.34304
[118]	training's auc: 0.931621	training's binary_logloss: 0.33405	valid_1's auc: 0.933681	valid_1's binary_logloss: 0.342166
[119]	training's auc: 0.931813	training's binary_logloss: 0.333257	valid_1's auc: 0.934489	valid_1's binary_logloss: 0.341273
[120]	training's auc: 0.931859	training's binary_logloss: 0.332516	valid_1's auc: 0.934643	valid_1's binary_logloss: 0.340453
[121]	training's auc: 0.932028	training's binary_logloss: 0.331839	valid_1's auc: 0.934835	valid_1's binary_logloss: 0.339609
[122]	training's auc: 0.932134	training's binary_logloss: 0.330939	valid_1's auc: 0.935336	valid_1's binary_logloss: 0.338656
[123]	training's auc: 0.932309	training's binary_logloss: 0.330185	valid_1's auc: 0.935951	valid_1's binary_logloss: 0.337821
[124]	training's auc: 0.932887	training's binary_logloss: 0.329288	valid_1's auc: 0.936336	valid_1's binary_logloss: 0.336798
[125]	training's auc: 0.9329	training's binary_logloss: 0.328554	valid_1's auc: 0.936259	valid_1's binary_logloss: 0.336054
[126]	training's auc: 0.93333	training's binary_logloss: 0.327631	valid_1's auc: 0.937029	valid_1's binary_logloss: 0.335061
[127]	training's auc: 0.933565	training's binary_logloss: 0.326914	valid_1's auc: 0.937221	valid_1's binary_logloss: 0.334335
[128]	training's auc: 0.933557	training's binary_logloss: 0.326137	valid_1's auc: 0.937644	valid_1's binary_logloss: 0.33347
[129]	training's auc: 0.934017	training's binary_logloss: 0.32533	valid_1's auc: 0.93776	valid_1's binary_logloss: 0.332509
[130]	training's auc: 0.934202	training's binary_logloss: 0.32468	valid_1's auc: 0.937991	valid_1's binary_logloss: 0.331804
[131]	training's auc: 0.934207	training's binary_logloss: 0.323934	valid_1's auc: 0.938183	valid_1's binary_logloss: 0.331001
[132]	training's auc: 0.934316	training's binary_logloss: 0.323322	valid_1's auc: 0.938183	valid_1's binary_logloss: 0.330346
[133]	training's auc: 0.93459	training's binary_logloss: 0.322664	valid_1's auc: 0.938318	valid_1's binary_logloss: 0.329625
[134]	training's auc: 0.934696	training's binary_logloss: 0.321999	valid_1's auc: 0.93901	valid_1's binary_logloss: 0.328957
[135]	training's auc: 0.934853	training's binary_logloss: 0.321298	valid_1's auc: 0.938818	valid_1's binary_logloss: 0.3282
[136]	training's auc: 0.93484	training's binary_logloss: 0.320686	valid_1's auc: 0.939087	valid_1's binary_logloss: 0.327443
[137]	training's auc: 0.935095	training's binary_logloss: 0.31997	valid_1's auc: 0.93928	valid_1's binary_logloss: 0.326648
[138]	training's auc: 0.935234	training's binary_logloss: 0.319305	valid_1's auc: 0.939588	valid_1's binary_logloss: 0.325863
[139]	training's auc: 0.935497	training's binary_logloss: 0.318737	valid_1's auc: 0.939741	valid_1's binary_logloss: 0.32526
[140]	training's auc: 0.935591	training's binary_logloss: 0.318113	valid_1's auc: 0.939972	valid_1's binary_logloss: 0.324636
[141]	training's auc: 0.935836	training's binary_logloss: 0.317393	valid_1's auc: 0.940453	valid_1's binary_logloss: 0.323982
[142]	training's auc: 0.935879	training's binary_logloss: 0.316805	valid_1's auc: 0.940415	valid_1's binary_logloss: 0.32325
[143]	training's auc: 0.936109	training's binary_logloss: 0.316136	valid_1's auc: 0.940992	valid_1's binary_logloss: 0.322528
[144]	training's auc: 0.93623	training's binary_logloss: 0.315599	valid_1's auc: 0.940684	valid_1's binary_logloss: 0.32196
[145]	training's auc: 0.936476	training's binary_logloss: 0.314899	valid_1's auc: 0.94103	valid_1's binary_logloss: 0.321276
[146]	training's auc: 0.936637	training's binary_logloss: 0.314396	valid_1's auc: 0.941223	valid_1's binary_logloss: 0.3206
[147]	training's auc: 0.93665	training's binary_logloss: 0.313703	valid_1's auc: 0.941492	valid_1's binary_logloss: 0.319892
[148]	training's auc: 0.936693	training's binary_logloss: 0.31319	valid_1's auc: 0.941184	valid_1's binary_logloss: 0.319352
[149]	training's auc: 0.93686	training's binary_logloss: 0.312571	valid_1's auc: 0.941454	valid_1's binary_logloss: 0.318723
[150]	training's auc: 0.937022	training's binary_logloss: 0.311939	valid_1's auc: 0.941877	valid_1's binary_logloss: 0.318137
[151]	training's auc: 0.937507	training's binary_logloss: 0.311168	valid_1's auc: 0.942416	valid_1's binary_logloss: 0.31739
[152]	training's auc: 0.937484	training's binary_logloss: 0.310485	valid_1's auc: 0.942531	valid_1's binary_logloss: 0.316688
[153]	training's auc: 0.937666	training's binary_logloss: 0.309872	valid_1's auc: 0.942339	valid_1's binary_logloss: 0.316181
[154]	training's auc: 0.937858	training's binary_logloss: 0.309281	valid_1's auc: 0.943031	valid_1's binary_logloss: 0.315512
[155]	training's auc: 0.938025	training's binary_logloss: 0.308534	valid_1's auc: 0.943378	valid_1's binary_logloss: 0.314763
[156]	training's auc: 0.938177	training's binary_logloss: 0.307966	valid_1's auc: 0.943801	valid_1's binary_logloss: 0.314092
[157]	training's auc: 0.938172	training's binary_logloss: 0.307457	valid_1's auc: 0.943839	valid_1's binary_logloss: 0.313452
[158]	training's auc: 0.938498	training's binary_logloss: 0.306729	valid_1's auc: 0.944571	valid_1's binary_logloss: 0.312723
[159]	training's auc: 0.938842	training's binary_logloss: 0.306013	valid_1's auc: 0.944763	valid_1's binary_logloss: 0.312032
[160]	training's auc: 0.939006	training's binary_logloss: 0.305466	valid_1's auc: 0.944917	valid_1's binary_logloss: 0.311384
[161]	training's auc: 0.938973	training's binary_logloss: 0.305034	valid_1's auc: 0.944917	valid_1's binary_logloss: 0.310795
[162]	training's auc: 0.939304	training's binary_logloss: 0.304335	valid_1's auc: 0.945032	valid_1's binary_logloss: 0.310097
[163]	training's auc: 0.939618	training's binary_logloss: 0.303647	valid_1's auc: 0.945263	valid_1's binary_logloss: 0.309435
[164]	training's auc: 0.939597	training's binary_logloss: 0.303171	valid_1's auc: 0.945494	valid_1's binary_logloss: 0.308834
[165]	training's auc: 0.940106	training's binary_logloss: 0.302497	valid_1's auc: 0.945571	valid_1's binary_logloss: 0.308186
[166]	training's auc: 0.940222	training's binary_logloss: 0.301978	valid_1's auc: 0.945802	valid_1's binary_logloss: 0.307569
[167]	training's auc: 0.940386	training's binary_logloss: 0.301429	valid_1's auc: 0.945879	valid_1's binary_logloss: 0.30694
[168]	training's auc: 0.940518	training's binary_logloss: 0.300772	valid_1's auc: 0.945994	valid_1's binary_logloss: 0.306309
[169]	training's auc: 0.940684	training's binary_logloss: 0.300232	valid_1's auc: 0.945879	valid_1's binary_logloss: 0.305875
[170]	training's auc: 0.941033	training's binary_logloss: 0.2996	valid_1's auc: 0.94611	valid_1's binary_logloss: 0.305166
[171]	training's auc: 0.941251	training's binary_logloss: 0.298961	valid_1's auc: 0.946341	valid_1's binary_logloss: 0.304555
[172]	training's auc: 0.941406	training's binary_logloss: 0.298534	valid_1's auc: 0.946495	valid_1's binary_logloss: 0.304131
[173]	training's auc: 0.943761	training's binary_logloss: 0.297951	valid_1's auc: 0.94586	valid_1's binary_logloss: 0.30363
[174]	training's auc: 0.944008	training's binary_logloss: 0.297323	valid_1's auc: 0.946437	valid_1's binary_logloss: 0.30296
[175]	training's auc: 0.944231	training's binary_logloss: 0.296641	valid_1's auc: 0.946591	valid_1's binary_logloss: 0.302286
[176]	training's auc: 0.944405	training's binary_logloss: 0.296147	valid_1's auc: 0.946398	valid_1's binary_logloss: 0.301715
[177]	training's auc: 0.944697	training's binary_logloss: 0.295723	valid_1's auc: 0.94636	valid_1's binary_logloss: 0.301294
[178]	training's auc: 0.944841	training's binary_logloss: 0.295111	valid_1's auc: 0.946706	valid_1's binary_logloss: 0.300684
[179]	training's auc: 0.945041	training's binary_logloss: 0.294519	valid_1's auc: 0.947091	valid_1's binary_logloss: 0.300128
[180]	training's auc: 0.944998	training's binary_logloss: 0.294167	valid_1's auc: 0.947129	valid_1's binary_logloss: 0.299667
[181]	training's auc: 0.945092	training's binary_logloss: 0.293703	valid_1's auc: 0.947591	valid_1's binary_logloss: 0.299121
[182]	training's auc: 0.94522	training's binary_logloss: 0.293119	valid_1's auc: 0.947591	valid_1's binary_logloss: 0.29863
[183]	training's auc: 0.945428	training's binary_logloss: 0.292666	valid_1's auc: 0.947861	valid_1's binary_logloss: 0.298096
[184]	training's auc: 0.945516	training's binary_logloss: 0.292031	valid_1's auc: 0.947976	valid_1's binary_logloss: 0.297465
[185]	training's auc: 0.945648	training's binary_logloss: 0.291642	valid_1's auc: 0.94813	valid_1's binary_logloss: 0.297061
[186]	training's auc: 0.945734	training's binary_logloss: 0.291083	valid_1's auc: 0.948322	valid_1's binary_logloss: 0.296495
[187]	training's auc: 0.946004	training's binary_logloss: 0.290468	valid_1's auc: 0.948245	valid_1's binary_logloss: 0.295884
[188]	training's auc: 0.946029	training's binary_logloss: 0.29002	valid_1's auc: 0.948399	valid_1's binary_logloss: 0.295366
[189]	training's auc: 0.946163	training's binary_logloss: 0.289465	valid_1's auc: 0.948361	valid_1's binary_logloss: 0.294814
[190]	training's auc: 0.946282	training's binary_logloss: 0.289038	valid_1's auc: 0.948553	valid_1's binary_logloss: 0.294309
[191]	training's auc: 0.94634	training's binary_logloss: 0.288556	valid_1's auc: 0.948438	valid_1's binary_logloss: 0.293929
[192]	training's auc: 0.946634	training's binary_logloss: 0.288016	valid_1's auc: 0.948707	valid_1's binary_logloss: 0.293391
[193]	training's auc: 0.94658	training's binary_logloss: 0.287588	valid_1's auc: 0.948669	valid_1's binary_logloss: 0.292896
[194]	training's auc: 0.946719	training's binary_logloss: 0.287175	valid_1's auc: 0.948592	valid_1's binary_logloss: 0.292489
[195]	training's auc: 0.946853	training's binary_logloss: 0.286694	valid_1's auc: 0.948707	valid_1's binary_logloss: 0.292101
[196]	training's auc: 0.947038	training's binary_logloss: 0.286215	valid_1's auc: 0.948938	valid_1's binary_logloss: 0.291682
[197]	training's auc: 0.947196	training's binary_logloss: 0.285726	valid_1's auc: 0.949169	valid_1's binary_logloss: 0.291287
[198]	training's auc: 0.947259	training's binary_logloss: 0.285269	valid_1's auc: 0.949092	valid_1's binary_logloss: 0.290916
[199]	training's auc: 0.94726	training's binary_logloss: 0.284944	valid_1's auc: 0.949092	valid_1's binary_logloss: 0.290519
[200]	training's auc: 0.947291	training's binary_logloss: 0.284495	valid_1's auc: 0.94913	valid_1's binary_logloss: 0.290169
[201]	training's auc: 0.947541	training's binary_logloss: 0.283987	valid_1's auc: 0.949477	valid_1's binary_logloss: 0.289707
[202]	training's auc: 0.947597	training's binary_logloss: 0.283671	valid_1's auc: 0.949284	valid_1's binary_logloss: 0.28932
[203]	training's auc: 0.947776	training's binary_logloss: 0.283217	valid_1's auc: 0.949592	valid_1's binary_logloss: 0.288959
[204]	training's auc: 0.94771	training's binary_logloss: 0.282895	valid_1's auc: 0.949361	valid_1's binary_logloss: 0.288577
[205]	training's auc: 0.94792	training's binary_logloss: 0.28246	valid_1's auc: 0.949554	valid_1's binary_logloss: 0.288226
[206]	training's auc: 0.948047	training's binary_logloss: 0.282063	valid_1's auc: 0.949438	valid_1's binary_logloss: 0.2878
[207]	training's auc: 0.948165	training's binary_logloss: 0.281577	valid_1's auc: 0.949631	valid_1's binary_logloss: 0.287316
[208]	training's auc: 0.948196	training's binary_logloss: 0.281281	valid_1's auc: 0.949746	valid_1's binary_logloss: 0.286903
[209]	training's auc: 0.948261	training's binary_logloss: 0.280865	valid_1's auc: 0.949592	valid_1's binary_logloss: 0.286432
[210]	training's auc: 0.948441	training's binary_logloss: 0.280397	valid_1's auc: 0.949785	valid_1's binary_logloss: 0.285968
[211]	training's auc: 0.948542	training's binary_logloss: 0.279929	valid_1's auc: 0.950131	valid_1's binary_logloss: 0.285539
[212]	training's auc: 0.948706	training's binary_logloss: 0.279499	valid_1's auc: 0.950131	valid_1's binary_logloss: 0.28517
[213]	training's auc: 0.948709	training's binary_logloss: 0.279041	valid_1's auc: 0.950131	valid_1's binary_logloss: 0.284716
[214]	training's auc: 0.948934	training's binary_logloss: 0.27859	valid_1's auc: 0.950439	valid_1's binary_logloss: 0.284302
[215]	training's auc: 0.949045	training's binary_logloss: 0.278156	valid_1's auc: 0.950554	valid_1's binary_logloss: 0.283813
[216]	training's auc: 0.949189	training's binary_logloss: 0.277695	valid_1's auc: 0.950554	valid_1's binary_logloss: 0.2834
[217]	training's auc: 0.949192	training's binary_logloss: 0.277397	valid_1's auc: 0.951054	valid_1's binary_logloss: 0.283048
[218]	training's auc: 0.949262	training's binary_logloss: 0.276949	valid_1's auc: 0.951016	valid_1's binary_logloss: 0.282647
[219]	training's auc: 0.949412	training's binary_logloss: 0.27652	valid_1's auc: 0.95117	valid_1's binary_logloss: 0.282212
[220]	training's auc: 0.949442	training's binary_logloss: 0.276073	valid_1's auc: 0.951285	valid_1's binary_logloss: 0.281812
[221]	training's auc: 0.949417	training's binary_logloss: 0.275783	valid_1's auc: 0.951247	valid_1's binary_logloss: 0.281471
[222]	training's auc: 0.949487	training's binary_logloss: 0.275391	valid_1's auc: 0.951401	valid_1's binary_logloss: 0.281126
[223]	training's auc: 0.949738	training's binary_logloss: 0.274974	valid_1's auc: 0.951285	valid_1's binary_logloss: 0.280702
[224]	training's auc: 0.949846	training's binary_logloss: 0.274544	valid_1's auc: 0.951439	valid_1's binary_logloss: 0.280319
[225]	training's auc: 0.949904	training's binary_logloss: 0.274134	valid_1's auc: 0.951632	valid_1's binary_logloss: 0.279902
[226]	training's auc: 0.950501	training's binary_logloss: 0.27371	valid_1's auc: 0.951978	valid_1's binary_logloss: 0.279523
[227]	training's auc: 0.950554	training's binary_logloss: 0.273431	valid_1's auc: 0.952401	valid_1's binary_logloss: 0.279194
[228]	training's auc: 0.950736	training's binary_logloss: 0.273013	valid_1's auc: 0.952363	valid_1's binary_logloss: 0.27878
[229]	training's auc: 0.950938	training's binary_logloss: 0.272576	valid_1's auc: 0.952363	valid_1's binary_logloss: 0.278302
[230]	training's auc: 0.951009	training's binary_logloss: 0.272176	valid_1's auc: 0.952555	valid_1's binary_logloss: 0.27794
[231]	training's auc: 0.951181	training's binary_logloss: 0.271781	valid_1's auc: 0.95267	valid_1's binary_logloss: 0.277518
[232]	training's auc: 0.951277	training's binary_logloss: 0.271472	valid_1's auc: 0.952632	valid_1's binary_logloss: 0.277283
[233]	training's auc: 0.95129	training's binary_logloss: 0.271222	valid_1's auc: 0.952594	valid_1's binary_logloss: 0.27697
[234]	training's auc: 0.951391	training's binary_logloss: 0.27082	valid_1's auc: 0.95294	valid_1's binary_logloss: 0.276567
[235]	training's auc: 0.951472	training's binary_logloss: 0.27057	valid_1's auc: 0.952978	valid_1's binary_logloss: 0.276397
[236]	training's auc: 0.951555	training's binary_logloss: 0.270116	valid_1's auc: 0.953209	valid_1's binary_logloss: 0.275928
[237]	training's auc: 0.951661	training's binary_logloss: 0.269789	valid_1's auc: 0.95294	valid_1's binary_logloss: 0.275697
[238]	training's auc: 0.951709	training's binary_logloss: 0.269467	valid_1's auc: 0.952786	valid_1's binary_logloss: 0.275471
[239]	training's auc: 0.951709	training's binary_logloss: 0.269218	valid_1's auc: 0.953094	valid_1's binary_logloss: 0.275156
[240]	training's auc: 0.951694	training's binary_logloss: 0.268901	valid_1's auc: 0.953171	valid_1's binary_logloss: 0.274934
[241]	training's auc: 0.95177	training's binary_logloss: 0.268646	valid_1's auc: 0.953132	valid_1's binary_logloss: 0.274628
[242]	training's auc: 0.951827	training's binary_logloss: 0.26833	valid_1's auc: 0.953074	valid_1's binary_logloss: 0.274362
[243]	training's auc: 0.951875	training's binary_logloss: 0.268079	valid_1's auc: 0.953228	valid_1's binary_logloss: 0.274062
[244]	training's auc: 0.951981	training's binary_logloss: 0.267739	valid_1's auc: 0.953228	valid_1's binary_logloss: 0.273793
[245]	training's auc: 0.95216	training's binary_logloss: 0.267488	valid_1's auc: 0.953613	valid_1's binary_logloss: 0.273405
[246]	training's auc: 0.9523	training's binary_logloss: 0.267104	valid_1's auc: 0.953382	valid_1's binary_logloss: 0.27302
[247]	training's auc: 0.95235	training's binary_logloss: 0.266802	valid_1's auc: 0.953459	valid_1's binary_logloss: 0.272811
[248]	training's auc: 0.952484	training's binary_logloss: 0.2665	valid_1's auc: 0.953421	valid_1's binary_logloss: 0.272557
[249]	training's auc: 0.952461	training's binary_logloss: 0.266258	valid_1's auc: 0.953652	valid_1's binary_logloss: 0.272267
[250]	training's auc: 0.952658	training's binary_logloss: 0.265882	valid_1's auc: 0.953767	valid_1's binary_logloss: 0.271895
[251]	training's auc: 0.952742	training's binary_logloss: 0.265626	valid_1's auc: 0.953844	valid_1's binary_logloss: 0.27161
[252]	training's auc: 0.952871	training's binary_logloss: 0.265263	valid_1's auc: 0.953844	valid_1's binary_logloss: 0.271277
[253]	training's auc: 0.95301	training's binary_logloss: 0.264901	valid_1's auc: 0.95396	valid_1's binary_logloss: 0.270945
[254]	training's auc: 0.953177	training's binary_logloss: 0.264471	valid_1's auc: 0.954036	valid_1's binary_logloss: 0.270524
[255]	training's auc: 0.953374	training's binary_logloss: 0.264119	valid_1's auc: 0.953902	valid_1's binary_logloss: 0.270196
[256]	training's auc: 0.953399	training's binary_logloss: 0.263837	valid_1's auc: 0.953979	valid_1's binary_logloss: 0.26996
[257]	training's auc: 0.953447	training's binary_logloss: 0.263481	valid_1's auc: 0.954171	valid_1's binary_logloss: 0.269692
[258]	training's auc: 0.953424	training's binary_logloss: 0.26324	valid_1's auc: 0.954287	valid_1's binary_logloss: 0.269441
[259]	training's auc: 0.95352	training's binary_logloss: 0.262878	valid_1's auc: 0.954325	valid_1's binary_logloss: 0.269078
[260]	training's auc: 0.953503	training's binary_logloss: 0.262642	valid_1's auc: 0.95421	valid_1's binary_logloss: 0.268831
[261]	training's auc: 0.953548	training's binary_logloss: 0.262413	valid_1's auc: 0.954171	valid_1's binary_logloss: 0.268593
[262]	training's auc: 0.953528	training's binary_logloss: 0.262186	valid_1's auc: 0.954287	valid_1's binary_logloss: 0.268372
[263]	training's auc: 0.95367	training's binary_logloss: 0.261816	valid_1's auc: 0.954364	valid_1's binary_logloss: 0.26797
[264]	training's auc: 0.953672	training's binary_logloss: 0.261546	valid_1's auc: 0.954325	valid_1's binary_logloss: 0.267787
[265]	training's auc: 0.953948	training's binary_logloss: 0.261322	valid_1's auc: 0.954594	valid_1's binary_logloss: 0.267492
[266]	training's auc: 0.953932	training's binary_logloss: 0.261056	valid_1's auc: 0.954633	valid_1's binary_logloss: 0.267314
[267]	training's auc: 0.954031	training's binary_logloss: 0.260839	valid_1's auc: 0.954633	valid_1's binary_logloss: 0.267091
[268]	training's auc: 0.954114	training's binary_logloss: 0.260507	valid_1's auc: 0.954748	valid_1's binary_logloss: 0.266767
[269]	training's auc: 0.954236	training's binary_logloss: 0.260303	valid_1's auc: 0.954748	valid_1's binary_logloss: 0.266497
[270]	training's auc: 0.954291	training's binary_logloss: 0.259971	valid_1's auc: 0.954787	valid_1's binary_logloss: 0.266198
[271]	training's auc: 0.954405	training's binary_logloss: 0.259669	valid_1's auc: 0.954787	valid_1's binary_logloss: 0.265979
[272]	training's auc: 0.954319	training's binary_logloss: 0.25945	valid_1's auc: 0.954825	valid_1's binary_logloss: 0.265749
[273]	training's auc: 0.954342	training's binary_logloss: 0.259234	valid_1's auc: 0.954864	valid_1's binary_logloss: 0.265523
[274]	training's auc: 0.954322	training's binary_logloss: 0.259029	valid_1's auc: 0.954864	valid_1's binary_logloss: 0.265326
[275]	training's auc: 0.954524	training's binary_logloss: 0.258706	valid_1's auc: 0.954979	valid_1's binary_logloss: 0.265008
[276]	training's auc: 0.954564	training's binary_logloss: 0.258461	valid_1's auc: 0.955018	valid_1's binary_logloss: 0.264838
[277]	training's auc: 0.954681	training's binary_logloss: 0.25826	valid_1's auc: 0.955037	valid_1's binary_logloss: 0.264586
[278]	training's auc: 0.954648	training's binary_logloss: 0.257937	valid_1's auc: 0.954922	valid_1's binary_logloss: 0.264295
[279]	training's auc: 0.954678	training's binary_logloss: 0.257728	valid_1's auc: 0.95496	valid_1's binary_logloss: 0.264047
[280]	training's auc: 0.954835	training's binary_logloss: 0.257479	valid_1's auc: 0.95496	valid_1's binary_logloss: 0.263881
[281]	training's auc: 0.954926	training's binary_logloss: 0.257276	valid_1's auc: 0.955075	valid_1's binary_logloss: 0.263675
[282]	training's auc: 0.955275	training's binary_logloss: 0.256958	valid_1's auc: 0.955268	valid_1's binary_logloss: 0.263362
[283]	training's auc: 0.955368	training's binary_logloss: 0.256762	valid_1's auc: 0.95546	valid_1's binary_logloss: 0.263172
[284]	training's auc: 0.955318	training's binary_logloss: 0.256519	valid_1's auc: 0.955422	valid_1's binary_logloss: 0.263011
[285]	training's auc: 0.955416	training's binary_logloss: 0.256317	valid_1's auc: 0.955229	valid_1's binary_logloss: 0.262772
[286]	training's auc: 0.955368	training's binary_logloss: 0.256083	valid_1's auc: 0.955537	valid_1's binary_logloss: 0.262607
[287]	training's auc: 0.955663	training's binary_logloss: 0.255611	valid_1's auc: 0.956057	valid_1's binary_logloss: 0.262116
[288]	training's auc: 0.9556	training's binary_logloss: 0.255426	valid_1's auc: 0.956249	valid_1's binary_logloss: 0.261883
[289]	training's auc: 0.955685	training's binary_logloss: 0.255189	valid_1's auc: 0.956211	valid_1's binary_logloss: 0.261669
[290]	training's auc: 0.955961	training's binary_logloss: 0.254718	valid_1's auc: 0.956557	valid_1's binary_logloss: 0.261171
[291]	training's auc: 0.955956	training's binary_logloss: 0.254506	valid_1's auc: 0.956788	valid_1's binary_logloss: 0.261026
[292]	training's auc: 0.956085	training's binary_logloss: 0.254214	valid_1's auc: 0.956865	valid_1's binary_logloss: 0.260691
[293]	training's auc: 0.956201	training's binary_logloss: 0.253985	valid_1's auc: 0.956865	valid_1's binary_logloss: 0.260485
[294]	training's auc: 0.956229	training's binary_logloss: 0.253766	valid_1's auc: 0.957057	valid_1's binary_logloss: 0.260332
[295]	training's auc: 0.956324	training's binary_logloss: 0.253387	valid_1's auc: 0.957134	valid_1's binary_logloss: 0.260008
[296]	training's auc: 0.95642	training's binary_logloss: 0.253168	valid_1's auc: 0.957326	valid_1's binary_logloss: 0.259755
[297]	training's auc: 0.956465	training's binary_logloss: 0.252946	valid_1's auc: 0.957442	valid_1's binary_logloss: 0.259642
[298]	training's auc: 0.95689	training's binary_logloss: 0.252651	valid_1's auc: 0.957365	valid_1's binary_logloss: 0.259371
[299]	training's auc: 0.957029	training's binary_logloss: 0.252264	valid_1's auc: 0.957365	valid_1's binary_logloss: 0.259044
[300]	training's auc: 0.957092	training's binary_logloss: 0.252042	valid_1's auc: 0.957403	valid_1's binary_logloss: 0.258791
[301]	training's auc: 0.957176	training's binary_logloss: 0.251765	valid_1's auc: 0.957557	valid_1's binary_logloss: 0.258525
[302]	training's auc: 0.957234	training's binary_logloss: 0.251556	valid_1's auc: 0.957634	valid_1's binary_logloss: 0.258374
[303]	training's auc: 0.957234	training's binary_logloss: 0.25129	valid_1's auc: 0.95775	valid_1's binary_logloss: 0.258133
[304]	training's auc: 0.957241	training's binary_logloss: 0.251072	valid_1's auc: 0.957788	valid_1's binary_logloss: 0.257886
[305]	training's auc: 0.95733	training's binary_logloss: 0.250821	valid_1's auc: 0.95775	valid_1's binary_logloss: 0.25769
[306]	training's auc: 0.957547	training's binary_logloss: 0.250494	valid_1's auc: 0.957634	valid_1's binary_logloss: 0.257433
[307]	training's auc: 0.957547	training's binary_logloss: 0.250238	valid_1's auc: 0.958096	valid_1's binary_logloss: 0.257202
[308]	training's auc: 0.95754	training's binary_logloss: 0.250063	valid_1's auc: 0.958173	valid_1's binary_logloss: 0.257032
[309]	training's auc: 0.957722	training's binary_logloss: 0.249626	valid_1's auc: 0.958327	valid_1's binary_logloss: 0.25658
[310]	training's auc: 0.957823	training's binary_logloss: 0.249451	valid_1's auc: 0.958211	valid_1's binary_logloss: 0.256458
[311]	training's auc: 0.958098	training's binary_logloss: 0.249027	valid_1's auc: 0.958173	valid_1's binary_logloss: 0.255988
[312]	training's auc: 0.958172	training's binary_logloss: 0.248756	valid_1's auc: 0.958211	valid_1's binary_logloss: 0.25574
[313]	training's auc: 0.958131	training's binary_logloss: 0.248477	valid_1's auc: 0.958288	valid_1's binary_logloss: 0.255488
[314]	training's auc: 0.958194	training's binary_logloss: 0.248211	valid_1's auc: 0.958327	valid_1's binary_logloss: 0.255245
[315]	training's auc: 0.958404	training's binary_logloss: 0.24779	valid_1's auc: 0.958596	valid_1's binary_logloss: 0.254811
[316]	training's auc: 0.958437	training's binary_logloss: 0.24759	valid_1's auc: 0.958596	valid_1's binary_logloss: 0.254661
[317]	training's auc: 0.958594	training's binary_logloss: 0.247358	valid_1's auc: 0.958519	valid_1's binary_logloss: 0.254396
[318]	training's auc: 0.958691	training's binary_logloss: 0.247022	valid_1's auc: 0.958904	valid_1's binary_logloss: 0.254123
[319]	training's auc: 0.958807	training's binary_logloss: 0.246834	valid_1's auc: 0.95902	valid_1's binary_logloss: 0.253995
[320]	training's auc: 0.95887	training's binary_logloss: 0.246597	valid_1's auc: 0.959173	valid_1's binary_logloss: 0.253781
[321]	training's auc: 0.958916	training's binary_logloss: 0.246353	valid_1's auc: 0.959327	valid_1's binary_logloss: 0.253548
[322]	training's auc: 0.958979	training's binary_logloss: 0.24614	valid_1's auc: 0.959289	valid_1's binary_logloss: 0.25331
[323]	training's auc: 0.959159	training's binary_logloss: 0.245878	valid_1's auc: 0.959212	valid_1's binary_logloss: 0.253011
[324]	training's auc: 0.959212	training's binary_logloss: 0.245532	valid_1's auc: 0.959289	valid_1's binary_logloss: 0.25272
[325]	training's auc: 0.959245	training's binary_logloss: 0.245357	valid_1's auc: 0.959481	valid_1's binary_logloss: 0.252564
[326]	training's auc: 0.959346	training's binary_logloss: 0.245032	valid_1's auc: 0.959327	valid_1's binary_logloss: 0.252279
[327]	training's auc: 0.959513	training's binary_logloss: 0.24468	valid_1's auc: 0.959751	valid_1's binary_logloss: 0.251982
[328]	training's auc: 0.95949	training's binary_logloss: 0.244508	valid_1's auc: 0.959828	valid_1's binary_logloss: 0.251831
[329]	training's auc: 0.959662	training's binary_logloss: 0.244106	valid_1's auc: 0.959789	valid_1's binary_logloss: 0.251419
[330]	training's auc: 0.959882	training's binary_logloss: 0.24367	valid_1's auc: 0.959982	valid_1's binary_logloss: 0.251003
[331]	training's auc: 0.960043	training's binary_logloss: 0.24334	valid_1's auc: 0.959866	valid_1's binary_logloss: 0.250725
[332]	training's auc: 0.960081	training's binary_logloss: 0.243053	valid_1's auc: 0.96002	valid_1's binary_logloss: 0.250443
[333]	training's auc: 0.960225	training's binary_logloss: 0.242697	valid_1's auc: 0.960405	valid_1's binary_logloss: 0.250156
[334]	training's auc: 0.960266	training's binary_logloss: 0.242414	valid_1's auc: 0.960443	valid_1's binary_logloss: 0.249928
[335]	training's auc: 0.960344	training's binary_logloss: 0.242186	valid_1's auc: 0.96052	valid_1's binary_logloss: 0.249711
[336]	training's auc: 0.960511	training's binary_logloss: 0.241906	valid_1's auc: 0.960482	valid_1's binary_logloss: 0.249436
[337]	training's auc: 0.960514	training's binary_logloss: 0.241749	valid_1's auc: 0.96052	valid_1's binary_logloss: 0.249294
[338]	training's auc: 0.960577	training's binary_logloss: 0.24155	valid_1's auc: 0.960597	valid_1's binary_logloss: 0.249167
[339]	training's auc: 0.960736	training's binary_logloss: 0.24128	valid_1's auc: 0.96079	valid_1's binary_logloss: 0.248943
[340]	training's auc: 0.96087	training's binary_logloss: 0.240996	valid_1's auc: 0.960828	valid_1's binary_logloss: 0.248651
[341]	training's auc: 0.960807	training's binary_logloss: 0.240801	valid_1's auc: 0.960982	valid_1's binary_logloss: 0.248434
[342]	training's auc: 0.961049	training's binary_logloss: 0.240453	valid_1's auc: 0.961213	valid_1's binary_logloss: 0.248155
[343]	training's auc: 0.961052	training's binary_logloss: 0.240293	valid_1's auc: 0.96129	valid_1's binary_logloss: 0.24796
[344]	training's auc: 0.961183	training's binary_logloss: 0.240033	valid_1's auc: 0.961367	valid_1's binary_logloss: 0.247736
[345]	training's auc: 0.96132	training's binary_logloss: 0.239633	valid_1's auc: 0.961405	valid_1's binary_logloss: 0.247401
[346]	training's auc: 0.961391	training's binary_logloss: 0.239462	valid_1's auc: 0.961521	valid_1's binary_logloss: 0.247224
[347]	training's auc: 0.961512	training's binary_logloss: 0.239171	valid_1's auc: 0.961521	valid_1's binary_logloss: 0.246979
[348]	training's auc: 0.961525	training's binary_logloss: 0.239021	valid_1's auc: 0.961713	valid_1's binary_logloss: 0.246852
[349]	training's auc: 0.961636	training's binary_logloss: 0.238714	valid_1's auc: 0.961829	valid_1's binary_logloss: 0.246596
[350]	training's auc: 0.96174	training's binary_logloss: 0.238435	valid_1's auc: 0.962059	valid_1's binary_logloss: 0.246327
[351]	training's auc: 0.961825	training's binary_logloss: 0.238101	valid_1's auc: 0.962329	valid_1's binary_logloss: 0.24606
[352]	training's auc: 0.961831	training's binary_logloss: 0.237955	valid_1's auc: 0.962367	valid_1's binary_logloss: 0.245908
[353]	training's auc: 0.961876	training's binary_logloss: 0.237681	valid_1's auc: 0.962521	valid_1's binary_logloss: 0.245637
[354]	training's auc: 0.961927	training's binary_logloss: 0.237442	valid_1's auc: 0.96256	valid_1's binary_logloss: 0.245446
[355]	training's auc: 0.962005	training's binary_logloss: 0.237264	valid_1's auc: 0.962521	valid_1's binary_logloss: 0.245336
[356]	training's auc: 0.962184	training's binary_logloss: 0.236936	valid_1's auc: 0.962483	valid_1's binary_logloss: 0.245075
[357]	training's auc: 0.962291	training's binary_logloss: 0.236793	valid_1's auc: 0.962521	valid_1's binary_logloss: 0.244926
[358]	training's auc: 0.962452	training's binary_logloss: 0.236618	valid_1's auc: 0.962521	valid_1's binary_logloss: 0.244817
[359]	training's auc: 0.962432	training's binary_logloss: 0.236365	valid_1's auc: 0.962752	valid_1's binary_logloss: 0.244569
[360]	training's auc: 0.962495	training's binary_logloss: 0.23612	valid_1's auc: 0.962983	valid_1's binary_logloss: 0.244294
[361]	training's auc: 0.962569	training's binary_logloss: 0.23585	valid_1's auc: 0.962983	valid_1's binary_logloss: 0.244083
[362]	training's auc: 0.962698	training's binary_logloss: 0.235518	valid_1's auc: 0.963252	valid_1's binary_logloss: 0.243873
[363]	training's auc: 0.962776	training's binary_logloss: 0.235288	valid_1's auc: 0.963291	valid_1's binary_logloss: 0.243691
[364]	training's auc: 0.962766	training's binary_logloss: 0.235021	valid_1's auc: 0.963329	valid_1's binary_logloss: 0.243455
[365]	training's auc: 0.962783	training's binary_logloss: 0.234851	valid_1's auc: 0.963483	valid_1's binary_logloss: 0.243277
[366]	training's auc: 0.962837	training's binary_logloss: 0.234614	valid_1's auc: 0.963522	valid_1's binary_logloss: 0.243012
[367]	training's auc: 0.962907	training's binary_logloss: 0.23439	valid_1's auc: 0.963753	valid_1's binary_logloss: 0.242769
[368]	training's auc: 0.963087	training's binary_logloss: 0.234072	valid_1's auc: 0.963945	valid_1's binary_logloss: 0.242516
[369]	training's auc: 0.963142	training's binary_logloss: 0.233903	valid_1's auc: 0.96406	valid_1's binary_logloss: 0.242412
[370]	training's auc: 0.963233	training's binary_logloss: 0.23359	valid_1's auc: 0.964137	valid_1's binary_logloss: 0.242164
[371]	training's auc: 0.963251	training's binary_logloss: 0.233454	valid_1's auc: 0.964137	valid_1's binary_logloss: 0.242022
[372]	training's auc: 0.963378	training's binary_logloss: 0.233215	valid_1's auc: 0.96433	valid_1's binary_logloss: 0.241792
[373]	training's auc: 0.96342	training's binary_logloss: 0.23305	valid_1's auc: 0.964253	valid_1's binary_logloss: 0.241691
[374]	training's auc: 0.96356	training's binary_logloss: 0.232743	valid_1's auc: 0.96433	valid_1's binary_logloss: 0.241447
[375]	training's auc: 0.963643	training's binary_logloss: 0.232485	valid_1's auc: 0.964484	valid_1's binary_logloss: 0.241247
[376]	training's auc: 0.963724	training's binary_logloss: 0.232323	valid_1's auc: 0.964522	valid_1's binary_logloss: 0.241148
[377]	training's auc: 0.96383	training's binary_logloss: 0.232043	valid_1's auc: 0.964561	valid_1's binary_logloss: 0.240885
[378]	training's auc: 0.963926	training's binary_logloss: 0.231674	valid_1's auc: 0.964714	valid_1's binary_logloss: 0.240529
[379]	training's auc: 0.963977	training's binary_logloss: 0.231456	valid_1's auc: 0.964868	valid_1's binary_logloss: 0.240291
[380]	training's auc: 0.964022	training's binary_logloss: 0.231252	valid_1's auc: 0.964984	valid_1's binary_logloss: 0.24011
[381]	training's auc: 0.964025	training's binary_logloss: 0.231051	valid_1's auc: 0.964945	valid_1's binary_logloss: 0.239965
[382]	training's auc: 0.96429	training's binary_logloss: 0.230804	valid_1's auc: 0.965099	valid_1's binary_logloss: 0.239759
[383]	training's auc: 0.964447	training's binary_logloss: 0.230621	valid_1's auc: 0.965061	valid_1's binary_logloss: 0.239551
[384]	training's auc: 0.964472	training's binary_logloss: 0.230463	valid_1's auc: 0.965022	valid_1's binary_logloss: 0.239455
[385]	training's auc: 0.96451	training's binary_logloss: 0.230167	valid_1's auc: 0.965061	valid_1's binary_logloss: 0.239213
[386]	training's auc: 0.964573	training's binary_logloss: 0.229987	valid_1's auc: 0.965099	valid_1's binary_logloss: 0.239009
[387]	training's auc: 0.964859	training's binary_logloss: 0.229617	valid_1's auc: 0.965369	valid_1's binary_logloss: 0.238657
[388]	training's auc: 0.964879	training's binary_logloss: 0.229462	valid_1's auc: 0.965292	valid_1's binary_logloss: 0.238563
[389]	training's auc: 0.964945	training's binary_logloss: 0.229179	valid_1's auc: 0.965407	valid_1's binary_logloss: 0.238343
[390]	training's auc: 0.965023	training's binary_logloss: 0.228972	valid_1's auc: 0.965446	valid_1's binary_logloss: 0.238182
[391]	training's auc: 0.965147	training's binary_logloss: 0.228734	valid_1's auc: 0.965446	valid_1's binary_logloss: 0.237963
[392]	training's auc: 0.965177	training's binary_logloss: 0.228581	valid_1's auc: 0.9656	valid_1's binary_logloss: 0.237872
[393]	training's auc: 0.9652	training's binary_logloss: 0.228355	valid_1's auc: 0.965369	valid_1's binary_logloss: 0.237722
[394]	training's auc: 0.965261	training's binary_logloss: 0.228181	valid_1's auc: 0.965369	valid_1's binary_logloss: 0.237509
[395]	training's auc: 0.965304	training's binary_logloss: 0.227979	valid_1's auc: 0.965369	valid_1's binary_logloss: 0.237353
[396]	training's auc: 0.965473	training's binary_logloss: 0.227608	valid_1's auc: 0.965446	valid_1's binary_logloss: 0.236994
[397]	training's auc: 0.965551	training's binary_logloss: 0.22738	valid_1's auc: 0.965484	valid_1's binary_logloss: 0.236843
[398]	training's auc: 0.965592	training's binary_logloss: 0.227165	valid_1's auc: 0.965523	valid_1's binary_logloss: 0.236703
[399]	training's auc: 0.965738	training's binary_logloss: 0.226801	valid_1's auc: 0.965753	valid_1's binary_logloss: 0.236351
[400]	training's auc: 0.965893	training's binary_logloss: 0.226647	valid_1's auc: 0.965869	valid_1's binary_logloss: 0.236193
[401]	training's auc: 0.96592	training's binary_logloss: 0.226457	valid_1's auc: 0.965792	valid_1's binary_logloss: 0.236025
[402]	training's auc: 0.965974	training's binary_logloss: 0.22619	valid_1's auc: 0.965753	valid_1's binary_logloss: 0.235807
[403]	training's auc: 0.966054	training's binary_logloss: 0.22596	valid_1's auc: 0.965753	valid_1's binary_logloss: 0.235622
[404]	training's auc: 0.966067	training's binary_logloss: 0.225673	valid_1's auc: 0.965792	valid_1's binary_logloss: 0.23537
[405]	training's auc: 0.966095	training's binary_logloss: 0.225533	valid_1's auc: 0.965753	valid_1's binary_logloss: 0.235291
[406]	training's auc: 0.966095	training's binary_logloss: 0.225338	valid_1's auc: 0.96583	valid_1's binary_logloss: 0.235142
[407]	training's auc: 0.966234	training's binary_logloss: 0.225109	valid_1's auc: 0.965869	valid_1's binary_logloss: 0.234958
[408]	training's auc: 0.966315	training's binary_logloss: 0.224874	valid_1's auc: 0.96583	valid_1's binary_logloss: 0.234748
[409]	training's auc: 0.966434	training's binary_logloss: 0.224591	valid_1's auc: 0.965869	valid_1's binary_logloss: 0.234508
[410]	training's auc: 0.966689	training's binary_logloss: 0.224386	valid_1's auc: 0.96583	valid_1's binary_logloss: 0.234272
[411]	training's auc: 0.966886	training's binary_logloss: 0.224155	valid_1's auc: 0.965946	valid_1's binary_logloss: 0.23407
[412]	training's auc: 0.966957	training's binary_logloss: 0.223895	valid_1's auc: 0.966177	valid_1's binary_logloss: 0.233834
[413]	training's auc: 0.966957	training's binary_logloss: 0.223746	valid_1's auc: 0.966138	valid_1's binary_logloss: 0.233681
[414]	training's auc: 0.967081	training's binary_logloss: 0.223537	valid_1's auc: 0.966177	valid_1's binary_logloss: 0.233443
[415]	training's auc: 0.967237	training's binary_logloss: 0.223174	valid_1's auc: 0.966138	valid_1's binary_logloss: 0.233083
[416]	training's auc: 0.967109	training's binary_logloss: 0.223028	valid_1's auc: 0.966292	valid_1's binary_logloss: 0.232933
[417]	training's auc: 0.967328	training's binary_logloss: 0.222753	valid_1's auc: 0.966215	valid_1's binary_logloss: 0.2327
[418]	training's auc: 0.967384	training's binary_logloss: 0.222609	valid_1's auc: 0.966254	valid_1's binary_logloss: 0.232616
[419]	training's auc: 0.967419	training's binary_logloss: 0.222404	valid_1's auc: 0.966331	valid_1's binary_logloss: 0.232382
[420]	training's auc: 0.96749	training's binary_logloss: 0.222181	valid_1's auc: 0.9666	valid_1's binary_logloss: 0.232187
[421]	training's auc: 0.967475	training's binary_logloss: 0.222006	valid_1's auc: 0.9666	valid_1's binary_logloss: 0.232027
[422]	training's auc: 0.967493	training's binary_logloss: 0.221861	valid_1's auc: 0.966715	valid_1's binary_logloss: 0.231855
[423]	training's auc: 0.967601	training's binary_logloss: 0.221651	valid_1's auc: 0.966715	valid_1's binary_logloss: 0.231721
[424]	training's auc: 0.967748	training's binary_logloss: 0.221307	valid_1's auc: 0.966869	valid_1's binary_logloss: 0.231391
[425]	training's auc: 0.967816	training's binary_logloss: 0.22115	valid_1's auc: 0.966946	valid_1's binary_logloss: 0.231226
[426]	training's auc: 0.967892	training's binary_logloss: 0.220882	valid_1's auc: 0.966908	valid_1's binary_logloss: 0.230999
[427]	training's auc: 0.967928	training's binary_logloss: 0.220663	valid_1's auc: 0.967062	valid_1's binary_logloss: 0.230809
[428]	training's auc: 0.968039	training's binary_logloss: 0.220413	valid_1's auc: 0.967062	valid_1's binary_logloss: 0.230648
[429]	training's auc: 0.968233	training's binary_logloss: 0.22019	valid_1's auc: 0.967062	valid_1's binary_logloss: 0.230436
[430]	training's auc: 0.968213	training's binary_logloss: 0.220002	valid_1's auc: 0.967177	valid_1's binary_logloss: 0.230221
[431]	training's auc: 0.968319	training's binary_logloss: 0.219824	valid_1's auc: 0.967139	valid_1's binary_logloss: 0.230071
[432]	training's auc: 0.968327	training's binary_logloss: 0.219684	valid_1's auc: 0.967177	valid_1's binary_logloss: 0.229991
[433]	training's auc: 0.9684	training's binary_logloss: 0.219546	valid_1's auc: 0.967177	valid_1's binary_logloss: 0.22985
[434]	training's auc: 0.968413	training's binary_logloss: 0.21937	valid_1's auc: 0.967216	valid_1's binary_logloss: 0.229678
[435]	training's auc: 0.968511	training's binary_logloss: 0.219109	valid_1's auc: 0.967216	valid_1's binary_logloss: 0.229457
[436]	training's auc: 0.968613	training's binary_logloss: 0.218869	valid_1's auc: 0.967254	valid_1's binary_logloss: 0.229211
[437]	training's auc: 0.968673	training's binary_logloss: 0.218612	valid_1's auc: 0.967293	valid_1's binary_logloss: 0.228994
[438]	training's auc: 0.968683	training's binary_logloss: 0.218475	valid_1's auc: 0.96737	valid_1's binary_logloss: 0.228916
[439]	training's auc: 0.968736	training's binary_logloss: 0.21834	valid_1's auc: 0.967485	valid_1's binary_logloss: 0.228774
[440]	training's auc: 0.968782	training's binary_logloss: 0.218102	valid_1's auc: 0.967639	valid_1's binary_logloss: 0.228558
[441]	training's auc: 0.96882	training's binary_logloss: 0.217967	valid_1's auc: 0.967677	valid_1's binary_logloss: 0.228482
[442]	training's auc: 0.968827	training's binary_logloss: 0.217813	valid_1's auc: 0.967754	valid_1's binary_logloss: 0.228299
[443]	training's auc: 0.968906	training's binary_logloss: 0.217522	valid_1's auc: 0.967831	valid_1's binary_logloss: 0.228142
[444]	training's auc: 0.968944	training's binary_logloss: 0.217334	valid_1's auc: 0.967908	valid_1's binary_logloss: 0.227971
[445]	training's auc: 0.969037	training's binary_logloss: 0.217164	valid_1's auc: 0.967908	valid_1's binary_logloss: 0.227847
[446]	training's auc: 0.969118	training's binary_logloss: 0.21696	valid_1's auc: 0.967908	valid_1's binary_logloss: 0.227666
[447]	training's auc: 0.969143	training's binary_logloss: 0.216755	valid_1's auc: 0.967908	valid_1's binary_logloss: 0.227477
[448]	training's auc: 0.969227	training's binary_logloss: 0.21657	valid_1's auc: 0.968024	valid_1's binary_logloss: 0.227311
[449]	training's auc: 0.969275	training's binary_logloss: 0.216369	valid_1's auc: 0.967947	valid_1's binary_logloss: 0.227133
[450]	training's auc: 0.969363	training's binary_logloss: 0.216186	valid_1's auc: 0.968024	valid_1's binary_logloss: 0.226925
[451]	training's auc: 0.969401	training's binary_logloss: 0.215993	valid_1's auc: 0.968101	valid_1's binary_logloss: 0.226757
[452]	training's auc: 0.969442	training's binary_logloss: 0.215799	valid_1's auc: 0.968101	valid_1's binary_logloss: 0.226635
[453]	training's auc: 0.969477	training's binary_logloss: 0.215609	valid_1's auc: 0.968216	valid_1's binary_logloss: 0.22647
[454]	training's auc: 0.969588	training's binary_logloss: 0.21537	valid_1's auc: 0.968332	valid_1's binary_logloss: 0.226279
[455]	training's auc: 0.969626	training's binary_logloss: 0.215241	valid_1's auc: 0.968332	valid_1's binary_logloss: 0.226206
[456]	training's auc: 0.969672	training's binary_logloss: 0.215075	valid_1's auc: 0.968408	valid_1's binary_logloss: 0.226044
[457]	training's auc: 0.96977	training's binary_logloss: 0.214884	valid_1's auc: 0.968562	valid_1's binary_logloss: 0.225917
[458]	training's auc: 0.969801	training's binary_logloss: 0.21457	valid_1's auc: 0.968601	valid_1's binary_logloss: 0.225685
[459]	training's auc: 0.969864	training's binary_logloss: 0.214429	valid_1's auc: 0.968601	valid_1's binary_logloss: 0.225557
[460]	training's auc: 0.969894	training's binary_logloss: 0.214253	valid_1's auc: 0.968639	valid_1's binary_logloss: 0.225403
[461]	training's auc: 0.969904	training's binary_logloss: 0.214074	valid_1's auc: 0.968639	valid_1's binary_logloss: 0.225236
[462]	training's auc: 0.970046	training's binary_logloss: 0.213775	valid_1's auc: 0.968755	valid_1's binary_logloss: 0.225027
[463]	training's auc: 0.970114	training's binary_logloss: 0.213477	valid_1's auc: 0.968678	valid_1's binary_logloss: 0.224797
[464]	training's auc: 0.970165	training's binary_logloss: 0.213356	valid_1's auc: 0.968947	valid_1's binary_logloss: 0.224691
[465]	training's auc: 0.970263	training's binary_logloss: 0.213182	valid_1's auc: 0.969024	valid_1's binary_logloss: 0.224496
[466]	training's auc: 0.970238	training's binary_logloss: 0.213038	valid_1's auc: 0.969255	valid_1's binary_logloss: 0.224324
[467]	training's auc: 0.970339	training's binary_logloss: 0.212911	valid_1's auc: 0.969294	valid_1's binary_logloss: 0.224253
[468]	training's auc: 0.970463	training's binary_logloss: 0.212686	valid_1's auc: 0.969255	valid_1's binary_logloss: 0.224066
[469]	training's auc: 0.97045	training's binary_logloss: 0.212584	valid_1's auc: 0.96937	valid_1's binary_logloss: 0.223934
[470]	training's auc: 0.970518	training's binary_logloss: 0.21246	valid_1's auc: 0.969486	valid_1's binary_logloss: 0.223865
[471]	training's auc: 0.970564	training's binary_logloss: 0.212336	valid_1's auc: 0.969332	valid_1's binary_logloss: 0.223783
[472]	training's auc: 0.970604	training's binary_logloss: 0.212163	valid_1's auc: 0.969447	valid_1's binary_logloss: 0.223596
[473]	training's auc: 0.970736	training's binary_logloss: 0.211961	valid_1's auc: 0.969563	valid_1's binary_logloss: 0.223457
[474]	training's auc: 0.970819	training's binary_logloss: 0.211785	valid_1's auc: 0.969563	valid_1's binary_logloss: 0.223338
[475]	training's auc: 0.97086	training's binary_logloss: 0.211551	valid_1's auc: 0.96964	valid_1's binary_logloss: 0.223157
[476]	training's auc: 0.970923	training's binary_logloss: 0.211342	valid_1's auc: 0.969717	valid_1's binary_logloss: 0.223003
[477]	training's auc: 0.970979	training's binary_logloss: 0.21118	valid_1's auc: 0.969717	valid_1's binary_logloss: 0.22284
[478]	training's auc: 0.970981	training's binary_logloss: 0.21104	valid_1's auc: 0.969678	valid_1's binary_logloss: 0.222674
[479]	training's auc: 0.971067	training's binary_logloss: 0.210821	valid_1's auc: 0.969717	valid_1's binary_logloss: 0.222493
[480]	training's auc: 0.971107	training's binary_logloss: 0.2107	valid_1's auc: 0.969755	valid_1's binary_logloss: 0.222425
[481]	training's auc: 0.97113	training's binary_logloss: 0.210562	valid_1's auc: 0.969871	valid_1's binary_logloss: 0.222261
[482]	training's auc: 0.971161	training's binary_logloss: 0.210447	valid_1's auc: 0.969909	valid_1's binary_logloss: 0.222144
[483]	training's auc: 0.971241	training's binary_logloss: 0.210297	valid_1's auc: 0.969871	valid_1's binary_logloss: 0.222022
[484]	training's auc: 0.971416	training's binary_logloss: 0.210001	valid_1's auc: 0.970179	valid_1's binary_logloss: 0.221812
[485]	training's auc: 0.971492	training's binary_logloss: 0.209822	valid_1's auc: 0.970217	valid_1's binary_logloss: 0.221696
[486]	training's auc: 0.97156	training's binary_logloss: 0.209593	valid_1's auc: 0.97014	valid_1's binary_logloss: 0.221543
[487]	training's auc: 0.971555	training's binary_logloss: 0.209418	valid_1's auc: 0.970063	valid_1's binary_logloss: 0.221384
[488]	training's auc: 0.971732	training's binary_logloss: 0.209116	valid_1's auc: 0.970256	valid_1's binary_logloss: 0.22118
[489]	training's auc: 0.971737	training's binary_logloss: 0.208944	valid_1's auc: 0.970217	valid_1's binary_logloss: 0.221023
[490]	training's auc: 0.97179	training's binary_logloss: 0.208716	valid_1's auc: 0.970102	valid_1's binary_logloss: 0.220856
[491]	training's auc: 0.971868	training's binary_logloss: 0.208553	valid_1's auc: 0.970179	valid_1's binary_logloss: 0.220674
[492]	training's auc: 0.971873	training's binary_logloss: 0.208363	valid_1's auc: 0.970256	valid_1's binary_logloss: 0.220513
[493]	training's auc: 0.972071	training's binary_logloss: 0.208189	valid_1's auc: 0.970294	valid_1's binary_logloss: 0.220401
[494]	training's auc: 0.972141	training's binary_logloss: 0.208024	valid_1's auc: 0.970371	valid_1's binary_logloss: 0.220222
[495]	training's auc: 0.972197	training's binary_logloss: 0.207864	valid_1's auc: 0.970525	valid_1's binary_logloss: 0.220045
[496]	training's auc: 0.972217	training's binary_logloss: 0.207781	valid_1's auc: 0.970409	valid_1's binary_logloss: 0.219975
[497]	training's auc: 0.972333	training's binary_logloss: 0.207498	valid_1's auc: 0.97064	valid_1's binary_logloss: 0.219781
[498]	training's auc: 0.972359	training's binary_logloss: 0.207381	valid_1's auc: 0.97064	valid_1's binary_logloss: 0.219717
[499]	training's auc: 0.972498	training's binary_logloss: 0.207231	valid_1's auc: 0.970756	valid_1's binary_logloss: 0.219567
[500]	training's auc: 0.972495	training's binary_logloss: 0.207045	valid_1's auc: 0.970756	valid_1's binary_logloss: 0.21941
Did not meet early stopping. Best iteration is:
[499]	training's auc: 0.972498	training's binary_logloss: 0.207231	valid_1's auc: 0.970756	valid_1's binary_logloss: 0.219567
Eval ACC: 0.9291338582677166
prediction_test
array([1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0,
       1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0,
       1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1,
       0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1,
       1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0,
       1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0,
       1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1,
       0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1,
       0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1,
       1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0,
       1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0,
       1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1,
       0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1,
       1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1,
       1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0,
       1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1], dtype=uint8)
from lightgbm import LGBMClassifier
clf = LGBMClassifier(n_estimators=10000,
                     learning_rate=0.5,
                     min_child_samples=10,
                     random_state=1,
                     colsample_bytree=0.8,
                     reg_alpha=2,
                     reg_lambda=2)

clf.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=50,
        eval_metric = 'auc', early_stopping_rounds = 100)

eval_score = accuracy_score(y_test, clf.predict(X_test))

print('Eval ACC: {}'.format(eval_score))

preds = clf.predict(test)
Training until validation scores don't improve for 100 rounds.
[50]	valid_0's auc: 0.980722	valid_0's binary_logloss: 0.133523
[100]	valid_0's auc: 0.980722	valid_0's binary_logloss: 0.133523
Early stopping, best iteration is:
[32]	valid_0's auc: 0.981799	valid_0's binary_logloss: 0.133289
Eval ACC: 0.9658792650918635
preds
array([1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1,
       1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1,
       0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1,
       1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0,
       1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0,
       1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1,
       0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1,
       1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0,
       1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0,
       0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1,
       1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1,
       1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1,
       1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0,
       1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=uint8)
import xgboost as xgb

dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_test, label=y_test)
dtest = xgb.DMatrix(test)

## fixed parameters
num_rounds=20 # number of boosting iterations

param = {'silent':1,
         'min_child_weight':1,
         'objective':'binary:logistic',
         'eval_metric':'auc',
         'seed' : 1234}
from collections import OrderedDict

ratio_neg_to_pos = sum(y_train==0)/sum(y_train==1)  ## = 608
print('Ratio of negative to positive instances: {:6.1f}'.format(ratio_neg_to_pos))

## parameters to be tuned
tune_dic = OrderedDict()

tune_dic['max_depth']= [5,10,15,20,25] ## maximum tree depth
tune_dic['subsample']=[0.5,0.6,0.7,0.8,0.9,1.0] ## proportion of training instances used in trees
tune_dic['colsample_bytree']= [0.5,0.6,0.7,0.8,0.9,1.0] ## subsample ratio of columns
tune_dic['eta']= [0.01,0.05,0.10,0.20,0.30,0.40]  ## learning rate
tune_dic['gamma']= [0.00,0.05,0.10,0.15,0.20]  ## minimum loss function reduction required for a split
tune_dic['scale_pos_weight']=[30,40,50,300,400,500,600,700] ## relative weight of positive/negative instances

lengths = [len(lst) for lst in tune_dic.values()]

combs=1
for i in range(len(lengths)):
    combs *= lengths[i]
print('Total number of combinations: {:16d}'.format(combs))  

maxiter=100

columns=[*tune_dic.keys()]+['F-Score','Best F-Score','auc']
results = pd.DataFrame(index=range(maxiter), columns=columns) ## dataframe to hold training results
Ratio of negative to positive instances:    3.6
Total number of combinations:            43200
def perf_measures(preds, labels, print_conf_matrix=False):
    
    act_pos=sum(labels==1) ## actual positive
    act_neg=len(labels) - act_pos ## actual negative
    
    pred_pos=sum(1 for i in range(len(preds)) if (preds[i]>=0.5)) ## predicted positive
    true_pos=sum(1 for i in range(len(preds)) if (preds[i]>=0.5) & (labels[i]==1)) ## predicted negative
    
    false_pos=pred_pos - true_pos ## false positive
    false_neg=act_pos-true_pos ## false negative
    true_neg=act_neg-false_pos ## true negative
      
    precision = true_pos/pred_pos ## tp/(tp+fp) percentage of correctly classified predicted positives
    recall = true_pos/act_pos ## tp/(tp+fn) percentage of positives correctly classified
    
    f_score = 2*precision*recall/(precision+recall) 
    
    if print_conf_matrix:
        print('\nconfusion matrix')
        print('----------------')
        print( 'tn:{:6d} fp:{:6d}'.format(true_neg,false_pos))
        print( 'fn:{:6d} tp:{:6d}'.format(false_neg,true_pos))
    
    return(f_score)


def do_train(cur_choice, param, train,train_s,trainY,valid,valid_s,validY,print_conf_matrix=False):
    ## train with given fixed and variable parameters
    ## and report the F-score on the validation dataset
    
    print('Parameters:')
    for (key,value) in cur_choice.items():
        print(key,': ',value,' ',end='')
        param[key]=value
    print('\n')    
    
##    the commented-out segment below uses a watchlist to monitor the progress of the boosting iterations 
##    evallist  = [(train,train_s), (valid,valid_s)]
##    model = xgb.train( param, train, num_boost_round=num_rounds,
##                      evals=evallist,verbose_eval=False)  
    
    model = xgb.train( param, train, num_boost_round=num_rounds)  
    
    preds = model.predict(valid)
    labels = valid.get_label()

    
      
    f_score = perf_measures(preds, labels,print_conf_matrix)
    
    return(f_score, model)
    eval_score = accuracy_score(labels, model.predict(valid))

print('Eval ACC: {}'.format(eval_score))
Eval ACC: 0.9658792650918635
def next_choice(cur_params=None):
    ## returns a random combination of the variable parameters (if cur_params=None)
    ## or a random neighboring combination from cur_params
    if cur_params:
        ## chose parameter to change
        ## parameter name and current value
        choose_param_name, cur_value = random.choice(list(cur_choice.items())) ## parameter name 
       
        all_values =  list(tune_dic[choose_param_name]) ## all values of selected parameter
        cur_index = all_values.index(cur_value) ## current index of selected parameter
        
        if cur_index==0: ## if it is the first in the range select the second one
            next_index=1
        elif cur_index==len(all_values)-1: ## if it is the last in the range select the previous one
            next_index=len(all_values)-2
        else: ## otherwise select the left or right value randomly
            direction=np.random.choice([-1,1])
            next_index=cur_index + direction

        next_params = dict((k,v) for k,v in cur_params.items())
        next_params[choose_param_name] = all_values[next_index] ## change the value of the selected parameter
        print('selected move: {:10s}: from {:6.2f} to {:6.2f}'.
              format(choose_param_name, cur_value, all_values[next_index] ))
    else: ## generate a random combination of parameters
        next_params=dict()
        for i in range(len(tune_dic)):
            key = [*tune_dic.keys()][i] 
            values = [*tune_dic.values()][i]
            next_params[key] = np.random.choice(values)
    return(next_params)
import random
random.seed(1234)
import time

t0 = time.clock()

T=0.40
best_params = dict() ## initialize dictionary to hold the best parameters

best_f_score = -1. ## initialize best f-score
prev_f_score = -1. ## initialize previous f-score
prev_choice = None ## initialize previous selection of parameters
weights = list(map(lambda x: 10**x, [0,1,2,3,4])) ## weights for the hash function
hash_values=set()

for iter in range(maxiter):
    print('\nIteration = {:5d}  T = {:12.6f}'.format(iter,T))

    ## find next selection of parameters not visited before
    while True:
        cur_choice=next_choice(prev_choice) ## first selection or selection-neighbor of prev_choice
         
        ## indices of the selections in alphabetical order of the parameters    
        indices=[tune_dic[name].index(cur_choice[name]) for name in sorted([*tune_dic.keys()])]
        
        ## check if selection has already been visited
        hash_val = sum([i*j for (i, j) in zip(weights, indices)])
        if hash_val in hash_values:
            print('\nCombination revisited - searching again')

#        tmp=abs(results.loc[:,[*cur_choice.keys()]] - list(cur_choice.values()))
#        tmp=tmp.sum(axis=1)
#        if any(tmp==0): ## selection has already been visited
#            print('\nCombination revisited - searching again')
        else:
            hash_values.add(hash_val)
            break ## break out of the while-loop
    
    
    ## train the model and obtain f-score on the validation dataset
    f_score,model=do_train(cur_choice, param, dtrain,'train',y_train,dvalid,'valid',y_test)
    
    ## store the parameters
    results.loc[iter,[*cur_choice.keys()]]=list(cur_choice.values())
    
    print('    F-Score: {:6.2f}  previous: {:6.2f}  best so far: {:6.2f}'.format(f_score, prev_f_score, best_f_score))
 
    if f_score > prev_f_score:
        print('    Local improvement')
        
        ## accept this combination as the new starting point
        prev_f_score = f_score
        prev_choice = cur_choice
        
        ## update best parameters if the f-score is globally better
        if f_score > best_f_score:
            best_f_score = f_score
            print('    Global improvement - best f-score updated')
            for (key,value) in prev_choice.items():
                best_params[key]=value

    else: ## f-score is smaller than the previous one
        
        ## accept this combination as the new starting point with probability exp(-(1.6 x f-score decline)/temperature) 
        rnd = random.random()
        diff = f_score-prev_f_score
        thres=np.exp(1.3*diff/T)
        if rnd <= thres:
            print('    Worse result. F-Score change: {:8.4f}  threshold: {:6.4f}  random number: {:6.4f} -> accepted'.
                  format(diff, thres, rnd))
            prev_f_score = f_score
            prev_choice = cur_choice
 
        else:
            ## do not update previous f-score and previous choice
            print('    Worse result. F-Score change: {:8.4f}  threshold: {:6.4f}  random number: {:6.4f} -> rejected'.
                 format(diff, thres, rnd))
    ## store results
    results.loc[iter,'F-Score']=f_score
    results.loc[iter,'Best F-Score']=best_f_score
    if iter % 5 == 0: T=0.85*T  ## reduce temperature every 5 iterations and continue 
        
print('\n{:6.1f} minutes process time\n'.format((time.clock() - t0)/60))    

print('Best variable parameters found:\n')
print(best_params)
Iteration =     0  T =     0.400000
Parameters:
max_depth :  10  subsample :  0.6  colsample_bytree :  0.9  eta :  0.1  gamma :  0.05  scale_pos_weight :  50  

    F-Score:   0.76  previous:  -1.00  best so far:  -1.00
    Local improvement
    Global improvement - best f-score updated

Iteration =     1  T =     0.340000
selected move: eta       : from   0.10 to   0.20
Parameters:
max_depth :  10  subsample :  0.6  colsample_bytree :  0.9  eta :  0.2  gamma :  0.05  scale_pos_weight :  50  

    F-Score:   0.79  previous:   0.76  best so far:   0.76
    Local improvement
    Global improvement - best f-score updated

Iteration =     2  T =     0.340000
selected move: max_depth : from  10.00 to  15.00
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.9  eta :  0.2  gamma :  0.05  scale_pos_weight :  50  

    F-Score:   0.88  previous:   0.79  best so far:   0.79
    Local improvement
    Global improvement - best f-score updated

Iteration =     3  T =     0.340000
selected move: max_depth : from  15.00 to  10.00

Combination revisited - searching again
selected move: max_depth : from  10.00 to   5.00
Parameters:
max_depth :  5  subsample :  0.6  colsample_bytree :  0.9  eta :  0.2  gamma :  0.05  scale_pos_weight :  50  

    F-Score:   0.70  previous:   0.88  best so far:   0.88
    Worse result. F-Score change:  -0.1828  threshold: 0.4971  random number: 0.9110 -> rejected

Iteration =     4  T =     0.340000
selected move: gamma     : from   0.05 to   0.00
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.9  eta :  0.2  gamma :  0.0  scale_pos_weight :  50  

    F-Score:   0.88  previous:   0.88  best so far:   0.88
    Worse result. F-Score change:   0.0000  threshold: 1.0000  random number: 0.0349 -> accepted

Iteration =     5  T =     0.340000
selected move: scale_pos_weight: from  50.00 to  40.00
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.9  eta :  0.2  gamma :  0.0  scale_pos_weight :  40  

    F-Score:   0.89  previous:   0.88  best so far:   0.88
    Local improvement
    Global improvement - best f-score updated

Iteration =     6  T =     0.289000
selected move: max_depth : from  15.00 to  20.00
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.9  eta :  0.2  gamma :  0.0  scale_pos_weight :  40  

    F-Score:   0.89  previous:   0.89  best so far:   0.89
    Local improvement
    Global improvement - best f-score updated

Iteration =     7  T =     0.289000
selected move: max_depth : from  20.00 to  15.00

Combination revisited - searching again
selected move: colsample_bytree: from   0.90 to   1.00
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  1.0  eta :  0.2  gamma :  0.0  scale_pos_weight :  40  

    F-Score:   0.88  previous:   0.89  best so far:   0.89
    Worse result. F-Score change:  -0.0113  threshold: 0.9503  random number: 0.2368 -> accepted

Iteration =     8  T =     0.289000
selected move: max_depth : from  20.00 to  15.00
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  1.0  eta :  0.2  gamma :  0.0  scale_pos_weight :  40  

    F-Score:   0.87  previous:   0.88  best so far:   0.89
    Worse result. F-Score change:  -0.0076  threshold: 0.9666  random number: 0.9867 -> rejected

Iteration =     9  T =     0.289000
selected move: max_depth : from  15.00 to  20.00

Combination revisited - searching again
selected move: colsample_bytree: from   1.00 to   0.90

Combination revisited - searching again
selected move: scale_pos_weight: from  40.00 to  50.00
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  1.0  eta :  0.2  gamma :  0.0  scale_pos_weight :  50  

    F-Score:   0.87  previous:   0.88  best so far:   0.89
    Worse result. F-Score change:  -0.0045  threshold: 0.9802  random number: 0.6233 -> accepted

Iteration =    10  T =     0.289000
selected move: gamma     : from   0.00 to   0.05
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  1.0  eta :  0.2  gamma :  0.05  scale_pos_weight :  50  

    F-Score:   0.87  previous:   0.87  best so far:   0.89
    Worse result. F-Score change:   0.0000  threshold: 1.0000  random number: 0.4635 -> accepted

Iteration =    11  T =     0.245650
selected move: max_depth : from  20.00 to  15.00
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  1.0  eta :  0.2  gamma :  0.05  scale_pos_weight :  50  

    F-Score:   0.84  previous:   0.87  best so far:   0.89
    Worse result. F-Score change:  -0.0325  threshold: 0.8419  random number: 0.1831 -> accepted

Iteration =    12  T =     0.245650
selected move: max_depth : from  15.00 to  10.00
Parameters:
max_depth :  10  subsample :  0.6  colsample_bytree :  1.0  eta :  0.2  gamma :  0.05  scale_pos_weight :  50  

    F-Score:   0.83  previous:   0.84  best so far:   0.89
    Worse result. F-Score change:  -0.0137  threshold: 0.9303  random number: 0.8453 -> accepted

Iteration =    13  T =     0.245650
selected move: gamma     : from   0.05 to   0.00
Parameters:
max_depth :  10  subsample :  0.6  colsample_bytree :  1.0  eta :  0.2  gamma :  0.0  scale_pos_weight :  50  

    F-Score:   0.83  previous:   0.83  best so far:   0.89
    Worse result. F-Score change:   0.0000  threshold: 1.0000  random number: 0.4868 -> accepted

Iteration =    14  T =     0.245650
selected move: subsample : from   0.60 to   0.50

Combination revisited - searching again
selected move: max_depth : from  10.00 to   5.00
Parameters:
max_depth :  5  subsample :  0.6  colsample_bytree :  1.0  eta :  0.2  gamma :  0.0  scale_pos_weight :  50  

    F-Score:   0.68  previous:   0.83  best so far:   0.89
    Worse result. F-Score change:  -0.1421  threshold: 0.4714  random number: 0.6673 -> rejected

Iteration =    15  T =     0.245650
selected move: eta       : from   0.20 to   0.30
Parameters:
max_depth :  10  subsample :  0.6  colsample_bytree :  1.0  eta :  0.3  gamma :  0.0  scale_pos_weight :  50  

    F-Score:   0.88  previous:   0.83  best so far:   0.89
    Local improvement

Iteration =    16  T =     0.208803
selected move: max_depth : from  10.00 to   5.00
Parameters:
max_depth :  5  subsample :  0.6  colsample_bytree :  1.0  eta :  0.3  gamma :  0.0  scale_pos_weight :  50  

    F-Score:   0.73  previous:   0.88  best so far:   0.89
    Worse result. F-Score change:  -0.1442  threshold: 0.4074  random number: 0.6015 -> rejected

Iteration =    17  T =     0.208803
selected move: max_depth : from   5.00 to  10.00

Combination revisited - searching again
selected move: gamma     : from   0.00 to   0.05
Parameters:
max_depth :  10  subsample :  0.6  colsample_bytree :  1.0  eta :  0.3  gamma :  0.05  scale_pos_weight :  50  

    F-Score:   0.88  previous:   0.88  best so far:   0.89
    Worse result. F-Score change:   0.0000  threshold: 1.0000  random number: 0.5790 -> accepted

Iteration =    18  T =     0.208803
selected move: colsample_bytree: from   1.00 to   0.90
Parameters:
max_depth :  10  subsample :  0.6  colsample_bytree :  0.9  eta :  0.3  gamma :  0.05  scale_pos_weight :  50  

    F-Score:   0.88  previous:   0.88  best so far:   0.89
    Local improvement

Iteration =    19  T =     0.208803
selected move: gamma     : from   0.05 to   0.00
Parameters:
max_depth :  10  subsample :  0.6  colsample_bytree :  0.9  eta :  0.3  gamma :  0.0  scale_pos_weight :  50  

    F-Score:   0.87  previous:   0.88  best so far:   0.89
    Worse result. F-Score change:  -0.0087  threshold: 0.9472  random number: 0.0631 -> accepted

Iteration =    20  T =     0.208803
selected move: colsample_bytree: from   0.90 to   0.80
Parameters:
max_depth :  10  subsample :  0.6  colsample_bytree :  0.8  eta :  0.3  gamma :  0.0  scale_pos_weight :  50  

    F-Score:   0.84  previous:   0.87  best so far:   0.89
    Worse result. F-Score change:  -0.0281  threshold: 0.8392  random number: 0.4810 -> accepted

Iteration =    21  T =     0.177482
selected move: colsample_bytree: from   0.80 to   0.90

Combination revisited - searching again
selected move: colsample_bytree: from   0.90 to   1.00

Combination revisited - searching again
selected move: subsample : from   0.60 to   0.70

Combination revisited - searching again
selected move: scale_pos_weight: from  50.00 to  40.00
Parameters:
max_depth :  10  subsample :  0.6  colsample_bytree :  0.8  eta :  0.3  gamma :  0.0  scale_pos_weight :  40  

    F-Score:   0.88  previous:   0.84  best so far:   0.89
    Local improvement

Iteration =    22  T =     0.177482
selected move: subsample : from   0.60 to   0.70

Combination revisited - searching again
selected move: eta       : from   0.30 to   0.20
Parameters:
max_depth :  10  subsample :  0.6  colsample_bytree :  0.8  eta :  0.2  gamma :  0.0  scale_pos_weight :  40  

    F-Score:   0.83  previous:   0.88  best so far:   0.89
    Worse result. F-Score change:  -0.0479  threshold: 0.7039  random number: 0.8766 -> rejected

Iteration =    23  T =     0.177482
selected move: scale_pos_weight: from  40.00 to  30.00
Parameters:
max_depth :  10  subsample :  0.6  colsample_bytree :  0.8  eta :  0.3  gamma :  0.0  scale_pos_weight :  30  

    F-Score:   0.89  previous:   0.88  best so far:   0.89
    Local improvement
    Global improvement - best f-score updated

Iteration =    24  T =     0.177482
selected move: max_depth : from  10.00 to  15.00
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.8  eta :  0.3  gamma :  0.0  scale_pos_weight :  30  

    F-Score:   0.92  previous:   0.89  best so far:   0.89
    Local improvement
    Global improvement - best f-score updated

Iteration =    25  T =     0.177482
selected move: gamma     : from   0.00 to   0.05
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.8  eta :  0.3  gamma :  0.05  scale_pos_weight :  30  

    F-Score:   0.90  previous:   0.92  best so far:   0.92
    Worse result. F-Score change:  -0.0153  threshold: 0.8942  random number: 0.0706 -> accepted

Iteration =    26  T =     0.150860
selected move: colsample_bytree: from   0.80 to   0.70
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.7  eta :  0.3  gamma :  0.05  scale_pos_weight :  30  

    F-Score:   0.92  previous:   0.90  best so far:   0.92
    Local improvement
    Global improvement - best f-score updated

Iteration =    27  T =     0.150860
selected move: subsample : from   0.60 to   0.50

Combination revisited - searching again
selected move: scale_pos_weight: from  30.00 to  40.00
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.7  eta :  0.3  gamma :  0.05  scale_pos_weight :  40  

    F-Score:   0.90  previous:   0.92  best so far:   0.92
    Worse result. F-Score change:  -0.0199  threshold: 0.8423  random number: 0.4728 -> accepted

Iteration =    28  T =     0.150860
selected move: eta       : from   0.30 to   0.40
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.7  eta :  0.4  gamma :  0.05  scale_pos_weight :  40  

    F-Score:   0.92  previous:   0.90  best so far:   0.92
    Local improvement

Iteration =    29  T =     0.150860
selected move: subsample : from   0.60 to   0.50

Combination revisited - searching again
selected move: max_depth : from  15.00 to  10.00
Parameters:
max_depth :  10  subsample :  0.6  colsample_bytree :  0.7  eta :  0.4  gamma :  0.05  scale_pos_weight :  40  

    F-Score:   0.89  previous:   0.92  best so far:   0.92
    Worse result. F-Score change:  -0.0303  threshold: 0.7704  random number: 0.0762 -> accepted

Iteration =    30  T =     0.150860
selected move: eta       : from   0.40 to   0.30
Parameters:
max_depth :  10  subsample :  0.6  colsample_bytree :  0.7  eta :  0.3  gamma :  0.05  scale_pos_weight :  40  

    F-Score:   0.87  previous:   0.89  best so far:   0.92
    Worse result. F-Score change:  -0.0212  threshold: 0.8331  random number: 0.1640 -> accepted

Iteration =    31  T =     0.128231
selected move: gamma     : from   0.05 to   0.00
Parameters:
max_depth :  10  subsample :  0.6  colsample_bytree :  0.7  eta :  0.3  gamma :  0.0  scale_pos_weight :  40  

    F-Score:   0.87  previous:   0.87  best so far:   0.92
    Worse result. F-Score change:   0.0000  threshold: 1.0000  random number: 0.2151 -> accepted

Iteration =    32  T =     0.128231
selected move: max_depth : from  10.00 to  15.00
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.7  eta :  0.3  gamma :  0.0  scale_pos_weight :  40  

    F-Score:   0.90  previous:   0.87  best so far:   0.92
    Local improvement

Iteration =    33  T =     0.128231
selected move: colsample_bytree: from   0.70 to   0.80
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.8  eta :  0.3  gamma :  0.0  scale_pos_weight :  40  

    F-Score:   0.91  previous:   0.90  best so far:   0.92
    Local improvement

Iteration =    34  T =     0.128231
selected move: max_depth : from  15.00 to  20.00
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.8  eta :  0.3  gamma :  0.0  scale_pos_weight :  40  

    F-Score:   0.90  previous:   0.91  best so far:   0.92
    Worse result. F-Score change:  -0.0152  threshold: 0.8575  random number: 0.9569 -> rejected

Iteration =    35  T =     0.128231
selected move: subsample : from   0.60 to   0.70

Combination revisited - searching again
selected move: scale_pos_weight: from  40.00 to  30.00

Combination revisited - searching again
selected move: colsample_bytree: from   0.80 to   0.70

Combination revisited - searching again
selected move: eta       : from   0.30 to   0.20
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.8  eta :  0.2  gamma :  0.0  scale_pos_weight :  40  

    F-Score:   0.86  previous:   0.91  best so far:   0.92
    Worse result. F-Score change:  -0.0539  threshold: 0.5793  random number: 0.0396 -> accepted

Iteration =    36  T =     0.108996
selected move: colsample_bytree: from   0.80 to   0.70
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.7  eta :  0.2  gamma :  0.0  scale_pos_weight :  40  

    F-Score:   0.86  previous:   0.86  best so far:   0.92
    Local improvement

Iteration =    37  T =     0.108996
selected move: scale_pos_weight: from  40.00 to  30.00
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.7  eta :  0.2  gamma :  0.0  scale_pos_weight :  30  

    F-Score:   0.89  previous:   0.86  best so far:   0.92
    Local improvement

Iteration =    38  T =     0.108996
selected move: max_depth : from  15.00 to  10.00
Parameters:
max_depth :  10  subsample :  0.6  colsample_bytree :  0.7  eta :  0.2  gamma :  0.0  scale_pos_weight :  30  

    F-Score:   0.82  previous:   0.89  best so far:   0.92
    Worse result. F-Score change:  -0.0631  threshold: 0.4711  random number: 0.1773 -> accepted

Iteration =    39  T =     0.108996
selected move: subsample : from   0.60 to   0.50

Combination revisited - searching again
selected move: colsample_bytree: from   0.70 to   0.80
Parameters:
max_depth :  10  subsample :  0.6  colsample_bytree :  0.8  eta :  0.2  gamma :  0.0  scale_pos_weight :  30  

    F-Score:   0.84  previous:   0.82  best so far:   0.92
    Local improvement

Iteration =    40  T =     0.108996
selected move: subsample : from   0.60 to   0.50

Combination revisited - searching again
selected move: eta       : from   0.20 to   0.30

Combination revisited - searching again
selected move: max_depth : from  10.00 to  15.00
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.8  eta :  0.2  gamma :  0.0  scale_pos_weight :  30  

    F-Score:   0.90  previous:   0.84  best so far:   0.92
    Local improvement

Iteration =    41  T =     0.092647
selected move: gamma     : from   0.00 to   0.05
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.8  eta :  0.2  gamma :  0.05  scale_pos_weight :  30  

    F-Score:   0.89  previous:   0.90  best so far:   0.92
    Worse result. F-Score change:  -0.0127  threshold: 0.8372  random number: 0.3862 -> accepted

Iteration =    42  T =     0.092647
selected move: colsample_bytree: from   0.80 to   0.90
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.9  eta :  0.2  gamma :  0.05  scale_pos_weight :  30  

    F-Score:   0.87  previous:   0.89  best so far:   0.92
    Worse result. F-Score change:  -0.0189  threshold: 0.7672  random number: 0.4959 -> accepted

Iteration =    43  T =     0.092647
selected move: gamma     : from   0.05 to   0.00
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.9  eta :  0.2  gamma :  0.0  scale_pos_weight :  30  

    F-Score:   0.88  previous:   0.87  best so far:   0.92
    Local improvement

Iteration =    44  T =     0.092647
selected move: eta       : from   0.20 to   0.30
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.9  eta :  0.3  gamma :  0.0  scale_pos_weight :  30  

    F-Score:   0.93  previous:   0.88  best so far:   0.92
    Local improvement
    Global improvement - best f-score updated

Iteration =    45  T =     0.092647
selected move: eta       : from   0.30 to   0.40
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.9  eta :  0.4  gamma :  0.0  scale_pos_weight :  30  

    F-Score:   0.94  previous:   0.93  best so far:   0.93
    Local improvement
    Global improvement - best f-score updated

Iteration =    46  T =     0.078750
selected move: max_depth : from  15.00 to  20.00
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.9  eta :  0.4  gamma :  0.0  scale_pos_weight :  30  

    F-Score:   0.94  previous:   0.94  best so far:   0.94
    Local improvement
    Global improvement - best f-score updated

Iteration =    47  T =     0.078750
selected move: eta       : from   0.40 to   0.30
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.9  eta :  0.3  gamma :  0.0  scale_pos_weight :  30  

    F-Score:   0.91  previous:   0.94  best so far:   0.94
    Worse result. F-Score change:  -0.0305  threshold: 0.6048  random number: 0.8702 -> rejected

Iteration =    48  T =     0.078750
selected move: gamma     : from   0.00 to   0.05
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.9  eta :  0.4  gamma :  0.05  scale_pos_weight :  30  

    F-Score:   0.91  previous:   0.94  best so far:   0.94
    Worse result. F-Score change:  -0.0384  threshold: 0.5308  random number: 0.8816 -> rejected

Iteration =    49  T =     0.078750
selected move: eta       : from   0.40 to   0.30

Combination revisited - searching again
selected move: scale_pos_weight: from  30.00 to  40.00
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.9  eta :  0.4  gamma :  0.0  scale_pos_weight :  40  

    F-Score:   0.93  previous:   0.94  best so far:   0.94
    Worse result. F-Score change:  -0.0140  threshold: 0.7941  random number: 0.7811 -> accepted

Iteration =    50  T =     0.078750
selected move: eta       : from   0.40 to   0.30
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.9  eta :  0.3  gamma :  0.0  scale_pos_weight :  40  

    F-Score:   0.89  previous:   0.93  best so far:   0.94
    Worse result. F-Score change:  -0.0416  threshold: 0.5033  random number: 0.2957 -> accepted

Iteration =    51  T =     0.066937
selected move: max_depth : from  20.00 to  25.00
Parameters:
max_depth :  25  subsample :  0.6  colsample_bytree :  0.9  eta :  0.3  gamma :  0.0  scale_pos_weight :  40  

    F-Score:   0.89  previous:   0.89  best so far:   0.94
    Worse result. F-Score change:   0.0000  threshold: 1.0000  random number: 0.7777 -> accepted

Iteration =    52  T =     0.066937
selected move: eta       : from   0.30 to   0.40
Parameters:
max_depth :  25  subsample :  0.6  colsample_bytree :  0.9  eta :  0.4  gamma :  0.0  scale_pos_weight :  40  

    F-Score:   0.93  previous:   0.89  best so far:   0.94
    Local improvement

Iteration =    53  T =     0.066937
selected move: eta       : from   0.40 to   0.30

Combination revisited - searching again
selected move: max_depth : from  25.00 to  20.00

Combination revisited - searching again
selected move: gamma     : from   0.00 to   0.05
Parameters:
max_depth :  25  subsample :  0.6  colsample_bytree :  0.9  eta :  0.4  gamma :  0.05  scale_pos_weight :  40  

    F-Score:   0.93  previous:   0.93  best so far:   0.94
    Local improvement

Iteration =    54  T =     0.066937
selected move: max_depth : from  25.00 to  20.00
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.9  eta :  0.4  gamma :  0.05  scale_pos_weight :  40  

    F-Score:   0.92  previous:   0.93  best so far:   0.94
    Worse result. F-Score change:  -0.0159  threshold: 0.7348  random number: 0.8294 -> rejected

Iteration =    55  T =     0.066937
selected move: eta       : from   0.40 to   0.30
Parameters:
max_depth :  25  subsample :  0.6  colsample_bytree :  0.9  eta :  0.3  gamma :  0.05  scale_pos_weight :  40  

    F-Score:   0.89  previous:   0.93  best so far:   0.94
    Worse result. F-Score change:  -0.0447  threshold: 0.4195  random number: 0.3768 -> accepted

Iteration =    56  T =     0.056897
selected move: scale_pos_weight: from  40.00 to  50.00
Parameters:
max_depth :  25  subsample :  0.6  colsample_bytree :  0.9  eta :  0.3  gamma :  0.05  scale_pos_weight :  50  

    F-Score:   0.90  previous:   0.89  best so far:   0.94
    Local improvement

Iteration =    57  T =     0.056897
selected move: colsample_bytree: from   0.90 to   0.80
Parameters:
max_depth :  25  subsample :  0.6  colsample_bytree :  0.8  eta :  0.3  gamma :  0.05  scale_pos_weight :  50  

    F-Score:   0.90  previous:   0.90  best so far:   0.94
    Local improvement

Iteration =    58  T =     0.056897
selected move: scale_pos_weight: from  50.00 to  40.00
Parameters:
max_depth :  25  subsample :  0.6  colsample_bytree :  0.8  eta :  0.3  gamma :  0.05  scale_pos_weight :  40  

    F-Score:   0.91  previous:   0.90  best so far:   0.94
    Local improvement

Iteration =    59  T =     0.056897
selected move: subsample : from   0.60 to   0.70

Combination revisited - searching again
selected move: colsample_bytree: from   0.80 to   0.90

Combination revisited - searching again
selected move: max_depth : from  25.00 to  20.00
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.8  eta :  0.3  gamma :  0.05  scale_pos_weight :  40  

    F-Score:   0.91  previous:   0.91  best so far:   0.94
    Worse result. F-Score change:   0.0000  threshold: 1.0000  random number: 0.7027 -> accepted

Iteration =    60  T =     0.056897
selected move: subsample : from   0.60 to   0.70

Combination revisited - searching again
selected move: gamma     : from   0.05 to   0.00

Combination revisited - searching again
selected move: colsample_bytree: from   0.80 to   0.70
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.7  eta :  0.3  gamma :  0.05  scale_pos_weight :  40  

    F-Score:   0.91  previous:   0.91  best so far:   0.94
    Local improvement

Iteration =    61  T =     0.048362
selected move: subsample : from   0.60 to   0.50

Combination revisited - searching again
selected move: scale_pos_weight: from  40.00 to  50.00
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.7  eta :  0.3  gamma :  0.05  scale_pos_weight :  50  

    F-Score:   0.90  previous:   0.91  best so far:   0.94
    Worse result. F-Score change:  -0.0130  threshold: 0.7044  random number: 0.3235 -> accepted

Iteration =    62  T =     0.048362
selected move: eta       : from   0.30 to   0.40
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.7  eta :  0.4  gamma :  0.05  scale_pos_weight :  50  

    F-Score:   0.92  previous:   0.90  best so far:   0.94
    Local improvement

Iteration =    63  T =     0.048362
selected move: scale_pos_weight: from  50.00 to 300.00
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.7  eta :  0.4  gamma :  0.05  scale_pos_weight :  300  

    F-Score:   0.89  previous:   0.92  best so far:   0.94
    Worse result. F-Score change:  -0.0269  threshold: 0.4852  random number: 0.0708 -> accepted

Iteration =    64  T =     0.048362
selected move: eta       : from   0.40 to   0.30
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.7  eta :  0.3  gamma :  0.05  scale_pos_weight :  300  

    F-Score:   0.83  previous:   0.89  best so far:   0.94
    Worse result. F-Score change:  -0.0564  threshold: 0.2199  random number: 0.2141 -> accepted

Iteration =    65  T =     0.048362
selected move: colsample_bytree: from   0.70 to   0.60
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.6  eta :  0.3  gamma :  0.05  scale_pos_weight :  300  

    F-Score:   0.83  previous:   0.83  best so far:   0.94
    Worse result. F-Score change:  -0.0062  threshold: 0.8455  random number: 0.8034 -> accepted

Iteration =    66  T =     0.041108
selected move: subsample : from   0.60 to   0.50

Combination revisited - searching again
selected move: gamma     : from   0.05 to   0.10
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.6  eta :  0.3  gamma :  0.1  scale_pos_weight :  300  

    F-Score:   0.84  previous:   0.83  best so far:   0.94
    Local improvement

Iteration =    67  T =     0.041108
selected move: max_depth : from  20.00 to  25.00
Parameters:
max_depth :  25  subsample :  0.6  colsample_bytree :  0.6  eta :  0.3  gamma :  0.1  scale_pos_weight :  300  

    F-Score:   0.84  previous:   0.84  best so far:   0.94
    Worse result. F-Score change:   0.0000  threshold: 1.0000  random number: 0.7928 -> accepted

Iteration =    68  T =     0.041108
selected move: scale_pos_weight: from 300.00 to  50.00
Parameters:
max_depth :  25  subsample :  0.6  colsample_bytree :  0.6  eta :  0.3  gamma :  0.1  scale_pos_weight :  50  

    F-Score:   0.91  previous:   0.84  best so far:   0.94
    Local improvement

Iteration =    69  T =     0.041108
selected move: colsample_bytree: from   0.60 to   0.50
Parameters:
max_depth :  25  subsample :  0.6  colsample_bytree :  0.5  eta :  0.3  gamma :  0.1  scale_pos_weight :  50  

    F-Score:   0.90  previous:   0.91  best so far:   0.94
    Worse result. F-Score change:  -0.0093  threshold: 0.7462  random number: 0.3056 -> accepted

Iteration =    70  T =     0.041108
selected move: eta       : from   0.30 to   0.40
Parameters:
max_depth :  25  subsample :  0.6  colsample_bytree :  0.5  eta :  0.4  gamma :  0.1  scale_pos_weight :  50  

    F-Score:   0.90  previous:   0.90  best so far:   0.94
    Worse result. F-Score change:   0.0000  threshold: 1.0000  random number: 0.0662 -> accepted

Iteration =    71  T =     0.034942
selected move: colsample_bytree: from   0.50 to   0.60
Parameters:
max_depth :  25  subsample :  0.6  colsample_bytree :  0.6  eta :  0.4  gamma :  0.1  scale_pos_weight :  50  

    F-Score:   0.93  previous:   0.90  best so far:   0.94
    Local improvement

Iteration =    72  T =     0.034942
selected move: max_depth : from  25.00 to  20.00
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.6  eta :  0.4  gamma :  0.1  scale_pos_weight :  50  

    F-Score:   0.93  previous:   0.93  best so far:   0.94
    Worse result. F-Score change:   0.0000  threshold: 1.0000  random number: 0.6888 -> accepted

Iteration =    73  T =     0.034942
selected move: subsample : from   0.60 to   0.50

Combination revisited - searching again
selected move: max_depth : from  20.00 to  25.00

Combination revisited - searching again
selected move: eta       : from   0.40 to   0.30
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.6  eta :  0.3  gamma :  0.1  scale_pos_weight :  50  

    F-Score:   0.89  previous:   0.93  best so far:   0.94
    Worse result. F-Score change:  -0.0423  threshold: 0.2070  random number: 0.1946 -> accepted

Iteration =    74  T =     0.034942
selected move: eta       : from   0.30 to   0.20
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.6  eta :  0.2  gamma :  0.1  scale_pos_weight :  50  

    F-Score:   0.86  previous:   0.89  best so far:   0.94
    Worse result. F-Score change:  -0.0261  threshold: 0.3781  random number: 0.5005 -> rejected

Iteration =    75  T =     0.034942
selected move: colsample_bytree: from   0.60 to   0.70
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.7  eta :  0.3  gamma :  0.1  scale_pos_weight :  50  

    F-Score:   0.89  previous:   0.89  best so far:   0.94
    Local improvement

Iteration =    76  T =     0.029700
selected move: subsample : from   0.60 to   0.70

Combination revisited - searching again
selected move: gamma     : from   0.10 to   0.15
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.7  eta :  0.3  gamma :  0.15  scale_pos_weight :  50  

    F-Score:   0.90  previous:   0.89  best so far:   0.94
    Local improvement

Iteration =    77  T =     0.029700
selected move: scale_pos_weight: from  50.00 to 300.00
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.7  eta :  0.3  gamma :  0.15  scale_pos_weight :  300  

    F-Score:   0.83  previous:   0.90  best so far:   0.94
    Worse result. F-Score change:  -0.0700  threshold: 0.0466  random number: 0.4212 -> rejected

Iteration =    78  T =     0.029700
selected move: gamma     : from   0.15 to   0.20
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.7  eta :  0.3  gamma :  0.2  scale_pos_weight :  50  

    F-Score:   0.90  previous:   0.90  best so far:   0.94
    Worse result. F-Score change:  -0.0046  threshold: 0.8175  random number: 0.2033 -> accepted

Iteration =    79  T =     0.029700
selected move: max_depth : from  20.00 to  15.00
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.7  eta :  0.3  gamma :  0.2  scale_pos_weight :  50  

    F-Score:   0.89  previous:   0.90  best so far:   0.94
    Worse result. F-Score change:  -0.0046  threshold: 0.8191  random number: 0.8129 -> accepted

Iteration =    80  T =     0.029700
selected move: gamma     : from   0.20 to   0.15
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.7  eta :  0.3  gamma :  0.15  scale_pos_weight :  50  

    F-Score:   0.87  previous:   0.89  best so far:   0.94
    Worse result. F-Score change:  -0.0234  threshold: 0.3591  random number: 0.1531 -> accepted

Iteration =    81  T =     0.025245
selected move: gamma     : from   0.15 to   0.10
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.7  eta :  0.3  gamma :  0.1  scale_pos_weight :  50  

    F-Score:   0.90  previous:   0.87  best so far:   0.94
    Local improvement

Iteration =    82  T =     0.025245
selected move: eta       : from   0.30 to   0.40
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.7  eta :  0.4  gamma :  0.1  scale_pos_weight :  50  

    F-Score:   0.91  previous:   0.90  best so far:   0.94
    Local improvement

Iteration =    83  T =     0.025245
selected move: colsample_bytree: from   0.70 to   0.60
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.6  eta :  0.4  gamma :  0.1  scale_pos_weight :  50  

    F-Score:   0.94  previous:   0.91  best so far:   0.94
    Local improvement

Iteration =    84  T =     0.025245
selected move: subsample : from   0.60 to   0.50

Combination revisited - searching again
selected move: subsample : from   0.50 to   0.60

Combination revisited - searching again
selected move: max_depth : from  15.00 to  10.00
Parameters:
max_depth :  10  subsample :  0.6  colsample_bytree :  0.6  eta :  0.4  gamma :  0.1  scale_pos_weight :  50  

    F-Score:   0.88  previous:   0.94  best so far:   0.94
    Worse result. F-Score change:  -0.0579  threshold: 0.0506  random number: 0.6245 -> rejected

Iteration =    85  T =     0.025245
selected move: subsample : from   0.60 to   0.70

Combination revisited - searching again
selected move: gamma     : from   0.10 to   0.15
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.6  eta :  0.4  gamma :  0.15  scale_pos_weight :  50  

    F-Score:   0.91  previous:   0.94  best so far:   0.94
    Worse result. F-Score change:  -0.0311  threshold: 0.2014  random number: 0.0697 -> accepted

Iteration =    86  T =     0.021459
selected move: colsample_bytree: from   0.60 to   0.70
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.7  eta :  0.4  gamma :  0.15  scale_pos_weight :  50  

    F-Score:   0.91  previous:   0.91  best so far:   0.94
    Worse result. F-Score change:  -0.0038  threshold: 0.7942  random number: 0.1710 -> accepted

Iteration =    87  T =     0.021459
selected move: scale_pos_weight: from  50.00 to  40.00
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.7  eta :  0.4  gamma :  0.15  scale_pos_weight :  40  

    F-Score:   0.92  previous:   0.91  best so far:   0.94
    Local improvement

Iteration =    88  T =     0.021459
selected move: scale_pos_weight: from  40.00 to  30.00
Parameters:
max_depth :  15  subsample :  0.6  colsample_bytree :  0.7  eta :  0.4  gamma :  0.15  scale_pos_weight :  30  

    F-Score:   0.96  previous:   0.92  best so far:   0.94
    Local improvement
    Global improvement - best f-score updated

Iteration =    89  T =     0.021459
selected move: max_depth : from  15.00 to  20.00
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.7  eta :  0.4  gamma :  0.15  scale_pos_weight :  30  

    F-Score:   0.96  previous:   0.96  best so far:   0.96
    Worse result. F-Score change:  -0.0053  threshold: 0.7262  random number: 0.6383 -> accepted

Iteration =    90  T =     0.021459
selected move: subsample : from   0.60 to   0.70

Combination revisited - searching again
selected move: subsample : from   0.70 to   0.60

Combination revisited - searching again
selected move: max_depth : from  20.00 to  15.00

Combination revisited - searching again
selected move: eta       : from   0.40 to   0.30
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.7  eta :  0.3  gamma :  0.15  scale_pos_weight :  30  

    F-Score:   0.91  previous:   0.96  best so far:   0.96
    Worse result. F-Score change:  -0.0498  threshold: 0.0490  random number: 0.3914 -> rejected

Iteration =    91  T =     0.018240
selected move: colsample_bytree: from   0.70 to   0.80
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.8  eta :  0.4  gamma :  0.15  scale_pos_weight :  30  

    F-Score:   0.94  previous:   0.96  best so far:   0.96
    Worse result. F-Score change:  -0.0162  threshold: 0.3162  random number: 0.2071 -> accepted

Iteration =    92  T =     0.018240
selected move: scale_pos_weight: from  30.00 to  40.00
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.8  eta :  0.4  gamma :  0.15  scale_pos_weight :  40  

    F-Score:   0.89  previous:   0.94  best so far:   0.96
    Worse result. F-Score change:  -0.0474  threshold: 0.0341  random number: 0.3812 -> rejected

Iteration =    93  T =     0.018240
selected move: colsample_bytree: from   0.80 to   0.70

Combination revisited - searching again
selected move: gamma     : from   0.15 to   0.10
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.8  eta :  0.4  gamma :  0.1  scale_pos_weight :  30  

    F-Score:   0.93  previous:   0.94  best so far:   0.96
    Worse result. F-Score change:  -0.0058  threshold: 0.6602  random number: 0.6318 -> accepted

Iteration =    94  T =     0.018240
selected move: colsample_bytree: from   0.80 to   0.70
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.7  eta :  0.4  gamma :  0.1  scale_pos_weight :  30  

    F-Score:   0.95  previous:   0.93  best so far:   0.96
    Local improvement

Iteration =    95  T =     0.018240
selected move: eta       : from   0.40 to   0.30
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.7  eta :  0.3  gamma :  0.1  scale_pos_weight :  30  

    F-Score:   0.92  previous:   0.95  best so far:   0.96
    Worse result. F-Score change:  -0.0347  threshold: 0.0844  random number: 0.3190 -> rejected

Iteration =    96  T =     0.015504
selected move: gamma     : from   0.10 to   0.05
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.7  eta :  0.4  gamma :  0.05  scale_pos_weight :  30  

    F-Score:   0.96  previous:   0.95  best so far:   0.96
    Local improvement
    Global improvement - best f-score updated

Iteration =    97  T =     0.015504
selected move: subsample : from   0.60 to   0.50

Combination revisited - searching again
selected move: gamma     : from   0.05 to   0.10

Combination revisited - searching again
selected move: max_depth : from  20.00 to  25.00
Parameters:
max_depth :  25  subsample :  0.6  colsample_bytree :  0.7  eta :  0.4  gamma :  0.05  scale_pos_weight :  30  

    F-Score:   0.95  previous:   0.96  best so far:   0.96
    Worse result. F-Score change:  -0.0104  threshold: 0.4182  random number: 0.7643 -> rejected

Iteration =    98  T =     0.015504
selected move: colsample_bytree: from   0.70 to   0.60
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.6  eta :  0.4  gamma :  0.05  scale_pos_weight :  30  

    F-Score:   0.92  previous:   0.96  best so far:   0.96
    Worse result. F-Score change:  -0.0428  threshold: 0.0276  random number: 0.4103 -> rejected

Iteration =    99  T =     0.015504
selected move: gamma     : from   0.05 to   0.10

Combination revisited - searching again
selected move: colsample_bytree: from   0.70 to   0.60

Combination revisited - searching again
selected move: gamma     : from   0.05 to   0.10

Combination revisited - searching again
selected move: max_depth : from  20.00 to  25.00

Combination revisited - searching again
selected move: scale_pos_weight: from  30.00 to  40.00
Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.7  eta :  0.4  gamma :  0.05  scale_pos_weight :  40  

    F-Score:   0.92  previous:   0.96  best so far:   0.96
    Worse result. F-Score change:  -0.0411  threshold: 0.0318  random number: 0.3753 -> rejected

   0.2 minutes process time

Best variable parameters found:

{'max_depth': 20, 'subsample': 0.6, 'colsample_bytree': 0.7, 'eta': 0.4, 'gamma': 0.05, 'scale_pos_weight': 30}
from pylab import rcParams
rcParams['figure.figsize'] = 15, 10
print('\nBest parameters found:\n')  
print(best_params)

print('\nEvaluation on the test dataset\n')  

best_f_score,best_model=do_train(best_params, param, dtrain,'train',y_train,dvalid,'valid',y_test,print_conf_matrix=True)


print('\nF-score on the test dataset: {:6.2f}'.format(best_f_score))


f, (ax1,ax2) = plt.subplots(nrows=1, ncols=2, sharey=False, figsize=(8,5))
ax1.plot(results['F-Score'])
ax2.plot(results['Best F-Score'])
ax1.set_xlabel('Iterations',fontsize=11)
ax2.set_xlabel('Iterations',fontsize=11)
ax1.set_ylabel('F-Score',fontsize=11)
ax2.set_ylabel('Best F-Score',fontsize=11)
ax1.set_ylim([0.7,0.9])
ax2.set_ylim([0.7,0.9])
plt.tight_layout()
plt.show()


print('\nVariables importance:\n')  

p = xgb.plot_importance(best_model) 
plt.show()
Best parameters found:

{'max_depth': 20, 'subsample': 0.6, 'colsample_bytree': 0.7, 'eta': 0.4, 'gamma': 0.05, 'scale_pos_weight': 30}

Evaluation on the test dataset

Parameters:
max_depth :  20  subsample :  0.6  colsample_bytree :  0.7  eta :  0.4  gamma :  0.05  scale_pos_weight :  30  


confusion matrix
----------------
tn:   286 fp:     6
fn:     1 tp:    88

F-score on the test dataset:   0.96
Variables importance:

Download notebook

(2 downloads)

Post categories:

target

chi square