Lab 5: Wide and Deep Networks

Prince Ndhlovu & Kirby Cravens

1. Data Preparation

import pandas as pd
import numpy as np
import warnings
%matplotlib inline
warnings.filterwarnings("ignore")

data_file = "/Users/princendhlovu/Downloads/dataset-of-10s.csv"

RawData = pd.read_csv(data_file)
RawData.head(5)

# drop the track, artist and uri columns
myData = RawData.drop(columns=['track','artist','uri'])
myData.head(5)

myData.describe()

# create a data description table
data_des = pd.DataFrame()

data_des['Features'] = myData.columns
data_des['Descriptions']= ['How suitable a track is for dancing ',
                           'A perceptual measure of intensity and activity',
                           'The estimated overall key of the track',
                           'The overall loudness of a track in decibels',
                           'The modality (major or minor) of a track',
                           'The presence of spoken words in a track',
                           'Whether the track is acoustic',
                           'Predicts whether a track contains no vocals',
                           'The presence of an audience in the recording',
                           'Musical positiveness conveyed by a track',
                           'Beats per minute',
                           'The duration of the track in milliseconds',
                           'An estimated overall time signature of a track',
                           'Timestamp of the third section of the track',
                           'The number of sections the particular track has',
                           'The target variable for the track']
data_des['Scales']= ['ratio','ratio','ordinal','ratio','nominal','ratio','ratio','ratio','ratio',
                     'ratio','ratio','ratio','ratio','ratio','ratio','nominal']
data_des['Discrete/Continuous'] = ['Continuous','Continuous','Discrete','Continuous','Discrete',
                                   'Continuous','Continuous','Continuous','Continuous','Continuous',
                                   'Continuous','Discrete','Discrete','Continuous','Discrete',
                                   'Discrete']
data_des['Range'] = ['0.062200-0.981000','0.000251-0.999000','0:C, 1:C#, 2:D, 3:Eb, 4:E, 5:F etc','-46.655000--0.149000','0 (Minor) and 1 (Major)',
                     '0.022500-0.956000','0-0.996000','0-0.995000','0.016700-0.982000','0-0.976000',
                     '39.369000-210.977000','29853-1734201','0-5','0-213.154990','2-88',
                     '0:flop, 1:hit']
data_des

# find data type
print(myData.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6398 entries, 0 to 6397
Data columns (total 16 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   danceability      6398 non-null   float64
 1   energy            6398 non-null   float64
 2   key               6398 non-null   int64  
 3   loudness          6398 non-null   float64
 4   mode              6398 non-null   int64  
 5   speechiness       6398 non-null   float64
 6   acousticness      6398 non-null   float64
 7   instrumentalness  6398 non-null   float64
 8   liveness          6398 non-null   float64
 9   valence           6398 non-null   float64
 10  tempo             6398 non-null   float64
 11  duration_ms       6398 non-null   int64  
 12  time_signature    6398 non-null   int64  
 13  chorus_hit        6398 non-null   float64
 14  sections          6398 non-null   int64  
 15  target            6398 non-null   int64  
dtypes: float64(10), int64(6)
memory usage: 799.9 KB
None

There are no missing values, so we are going to check for duplicates.

#Find the duplicate instances 
index = myData.duplicated()

# find the number of duplicates
len(myData[index])

139

Since there are 139 duplicate rows, we drop them to improve data quality; they may have been introduced through human error.

myData = myData.drop_duplicates()
idx = myData.duplicated()
len(myData[idx])

0

We want to make the columns listed in to_bin categorical so that we can group songs by their values in these columns. We bin each column into integer codes from 1 to 10, which makes it easier to cross them.

to_bin = ['danceability','energy','speechiness','acousticness','instrumentalness','liveness','valence']
for idx,col in enumerate(to_bin):
    myData[col] = np.digitize(myData[col],bins=[0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1])

myData.describe()
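To make the binning step concrete, here is a minimal sketch of how np.digitize assigns codes with the same bin edges. The sample values are hypothetical and chosen to sit on or near bin boundaries.

```python
import numpy as np

# Same bin edges as the notebook cell above.
bins = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]

# Hypothetical feature values (not taken from the dataset).
values = np.array([0.0, 0.05, 0.1, 0.55, 0.999, 1.0])

# With the default right=False, code i means bins[i-1] <= value < bins[i].
codes = np.digitize(values, bins=bins)
print(codes)
```

A value of exactly 1.0 would receive code 11 rather than 10. The ranges in the description table all stay below 1, so the binned columns in this dataset only take the codes 1 to 10.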

Normalizing the features in the array to_norm: each one is min-max scaled and then shifted so that it lies in the range [-0.5, 0.5].

from sklearn.preprocessing import StandardScaler

to_norm = ['loudness','tempo','duration_ms','chorus_hit','sections']

def normalize(df):
    result = df.copy()
    for feature in df.columns:
        max_val = df[feature].max()
        min_val = df[feature].min()
        result[feature] = (df[feature] - min_val)/(max_val - min_val) - 0.5
    return result

X = myData.copy()
X = X.drop(columns='target')
X[to_norm] = normalize(myData[to_norm]).astype(np.float32)

y = myData.target.astype(int)  # np.int was removed in NumPy 1.24; the builtin int is equivalent

X.head(10)
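As a quick check of what the normalize function does, the sketch below applies the same formula to a small hypothetical table. The column values are made up for illustration; only the function body mirrors the notebook.

```python
import pandas as pd

def normalize(df):
    """Min-max scale each column to [0, 1], then shift it to [-0.5, 0.5]."""
    result = df.copy()
    for feature in df.columns:
        max_val = df[feature].max()
        min_val = df[feature].min()
        result[feature] = (df[feature] - min_val) / (max_val - min_val) - 0.5
    return result

# Hypothetical tempo and loudness values, for illustration only.
toy = pd.DataFrame({"tempo": [60.0, 120.0, 180.0],
                    "loudness": [-40.0, -10.0, 0.0]})

scaled = normalize(toy)
print(scaled)
```

Each column's minimum maps to -0.5 and its maximum to 0.5. Note that in the notebook the minimum and maximum are computed on the full dataset before it is split, so the test rows contribute to the scaling statistics.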

X['danceability'].unique()

array([ 8,  5,  6,  9,  2,  4,  7,  3, 10,  1])

X['energy'].unique()

array([ 7,  3,  5,  9, 10,  6,  8,  4,  1,  2])

X['acousticness'].unique()

array([ 1,  9,  2,  5,  4,  8,  7, 10,  6,  3])

X['instrumentalness'].unique()

array([ 1,  9, 10,  3,  7,  6,  8,  2,  4,  5])

X['valence'].unique()

array([ 8,  3,  5,  4,  2,  1,  6, 10,  7,  9])

X['speechiness'].unique()

array([ 1,  3,  2,  4,  7,  5,  8,  6, 10,  9])

X['liveness'].unique()

array([ 1,  2,  3,  4,  5, 10,  6,  8,  9,  7])

y.unique()

array([1, 0])

1.2 Cross Product Features

key, energy, valence: Key is the estimated overall key of the track and often relates to how upbeat, or how energetic, the track is. Valence describes musical positiveness, and positive songs often correlate with key as well.

danceability, liveness: Liveness detects the presence of an audience in a track. Having a crowd increases the chances of making someone want to dance.

speechiness, acousticness, instrumentalness: Speechiness detects the presence of spoken words, acousticness indicates whether the track is acoustic, and instrumentalness predicts whether the track contains no vocals. These three features all describe the vocals and overall sound of the track, and thus should be crossed.

time signature, energy: Time signature specifies how many beats are in each bar of the track, and we expect this to correlate with how much energy the track has.
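A crossed feature should give every combination of values its own category. When the columns being crossed hold integers, as the binned features here do, combining them with + adds the values, so different combinations can collapse into one category. The sketch below uses hypothetical rows to contrast adding the values with joining them as strings; the string join is shown as one possible alternative, not as the method used in this notebook.

```python
from functools import reduce

# Hypothetical binned rows: (key, time_signature, valence).
rows = [(2, 4, 3), (3, 4, 2), (5, 3, 1)]

# Adding integers maps all three distinct combinations to the same value.
summed = [reduce(lambda x, y: x + y, row) for row in rows]

# Joining the values as strings keeps each combination distinct.
joined = ["_".join(str(v) for v in row) for row in rows]

print(summed)
print(joined)
```

The joined strings can then be passed to LabelEncoder in the same way as any other categorical column.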

1.3 Evaluation Criteria

For this data set, we are trying to predict whether a song will be a hit or a flop. It is in the artist's best interest to have this prediction so that they know how to allocate resources for marketing their songs. If a song is going to flop, they may discard it or spend less money marketing it, whereas if it is going to be a hit, more financial resources should be on hand to market the song so that it generates more revenue. In our model we want to minimize the number of False Positives, in which we predict a song to be a hit when it is going to flop, causing the artist to lose a lot of money marketing a song which won't top the charts. We can afford False Negatives because the track can still find its way to the top of the Billboard charts, and by then we would have noticed its potential and mobilised marketing resources to increase its reach. Our evaluation criterion is therefore precision, since we cannot live with False Positives. It is given by: $ Precision(p) = \frac{True Positives}{True Positives + False Positives} $
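Precision has the false positives in its denominator, which is why it fits the goal of avoiding flops that are predicted as hits. A minimal sketch on hypothetical labels, with recall computed alongside for contrast:

```python
# Hypothetical labels: 1 = hit, 0 = flop.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 1, 0, 0, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # hits predicted as hits
fp = sum(t == 0 and p == 1 for t, p in pairs)  # flops predicted as hits
fn = sum(t == 1 and p == 0 for t, p in pairs)  # hits predicted as flops

precision = tp / (tp + fp)  # lowered by false positives
recall = tp / (tp + fn)     # lowered by false negatives
print(precision, recall)
```

Here the two false positives pull precision down to 0.6, while the single false negative only affects recall.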

1.4 Splitting Data

#count the frequencies of classes
y.value_counts()

1    3184
0    3075
Name: target, dtype: int64

From the target counts we see an almost even balance of hits and flops (1s and 0s). We therefore use scikit-learn's train_test_split with stratification to divide the dataset into 80% training and 20% testing. Stratifying on the target keeps the class proportions the same in the training and test sets, so that no class is favoured over the other in our model.

from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold, StratifiedShuffleSplit


X_train, X_test, y_train, y_test = train_test_split(X,y,stratify=y,test_size = 0.2)
X_train = pd.DataFrame(X_train)
X_train.columns = X.columns
X_test = pd.DataFrame(X_test)
X_test.columns = X.columns
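To see what the stratify argument does, the sketch below splits a hypothetical, deliberately imbalanced label vector and checks the class proportions on each side. The data and the random_state value are assumptions for illustration; the notebook's own split does not fix a seed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: 70 of class 0 and 30 of class 1.
y_toy = np.array([0] * 70 + [1] * 30)
X_toy = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, stratify=y_toy, test_size=0.2, random_state=0)

# Both sides keep the 30% share of class 1.
print(y_tr.mean(), y_te.mean())
```

Passing a fixed random_state to the notebook's split would also make the reported precision values reproducible across runs.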

2. Modelling

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Activation, Input
from tensorflow.keras.layers import Embedding, Flatten, Concatenate, concatenate
from tensorflow.keras.models import Model
from sklearn.preprocessing import LabelEncoder
from functools import reduce



# possible crossing options:
#   'key','time_signature','danceability',
#   'energy','speechiness','acousticness',
#   'instrumentalness','liveness','valence'

cross_columns = [['key','time_signature','valence'],
                 ['danceability', 'energy','instrumentalness'],
                 ['speechiness','acousticness','liveness'],
#                  ['time_signature','energy']
                ]

# save categorical features
categorical_headers = ['key','time_signature']+to_bin

# cross each set of columns in the list above
cross_col_df_names = []
for cols_list in cross_columns:
    # encode as ints for the embedding
    enc = LabelEncoder()
    
    X_crossed_train = []
    X_crossed_test = []
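    # NOTE: these columns hold integers, so x+y adds the values in each row;
    # different combinations with the same sum end up in the same category
    for row in X_train[cols_list].values:
        X_crossed_train.append(reduce((lambda x,y: x+y),row))
    for row in X_test[cols_list].values:
        X_crossed_test.append(reduce((lambda x,y: x+y),row))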
    
    # get a nice name for this new crossed column
    cross_col_name = '_'.join(cols_list)
    
    # 2. encode as integers
#     enc.fit(np.hstack((X_crossed_train.to_numpy(),  X_crossed_test.to_numpy())))
    enc.fit(np.hstack((np.array(X_crossed_train),np.array(X_crossed_test))))
    
    # 3. Save into dataframe with new name
    X_train[cross_col_name] = enc.transform(X_crossed_train)
    X_test[cross_col_name] = enc.transform(X_crossed_test)
    
    # keep track of the new names of the crossed columns
    cross_col_df_names.append(cross_col_name)

# get crossed columns
X_train_crossed = X_train[cross_col_df_names].to_numpy()
X_test_crossed = X_test[cross_col_df_names].to_numpy()

# save categorical features
X_train_cat = X_train[categorical_headers].to_numpy() 
X_test_cat = X_test[categorical_headers].to_numpy() 

# and save off the numeric features
X_train_num = X_train.drop(columns=categorical_headers).to_numpy()
X_test_num = X_test.drop(columns=categorical_headers).to_numpy()


# we need to create separate lists for each branch
crossed_outputs = []

# CROSSED DATA INPUT
input_crossed = Input(shape=(X_train_crossed.shape[1],), dtype='int64', name='wide_inputs')
for idx,col in enumerate(cross_col_df_names):
    
    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(X_train[col].max(),X_test[col].max())+1
    
    
    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_crossed, idx, axis=1)
    
    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(x)
    
    # save these outputs to concatenate later
    crossed_outputs.append(x)
    

# now concatenate the embedded outputs to form the wide branch
wide_branch = concatenate(crossed_outputs, name='wide_concat')

# reset this input branch
all_deep_branch_outputs = []

# CATEGORICAL DATA INPUT
input_cat = Input(shape=(X_train_cat.shape[1],), dtype='int64', name='categorical_input')
for idx,col in enumerate(categorical_headers):
    
    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(X_train[col].max(),X_test[col].max())+1
    
    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_cat, idx, axis=1)
    
    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(x)
    
    # save these outputs to concatenate later
    all_deep_branch_outputs.append(x)
    
# NUMERIC DATA INPUT
# create dense input branch for numeric
input_num = Input(shape=(X_train_num.shape[1],), name='numeric')
x_dense = Dense(units=15, activation='relu',name='num_1')(input_num)
    
all_deep_branch_outputs.append(x_dense)


# merge the deep branches together
deep_branch = concatenate(all_deep_branch_outputs,name='concat_embeds')
deep_branch = Dense(units=50,activation='relu', name='deep1')(deep_branch)
deep_branch = Dense(units=25,activation='relu', name='deep2')(deep_branch)
deep_branch = Dense(units=10,activation='relu', name='deep3')(deep_branch)
    
# merge the deep and wide branch
final_branch = concatenate([wide_branch, deep_branch],
                           name='concat_deep_wide')
final_branch = Dense(units=1,activation='sigmoid',
                     name='combined')(final_branch)

model = Model(inputs=[input_crossed,input_cat,input_num], 
              outputs=final_branch)

# model.summary()
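The comments above describe the embedding as a way to handle integers "as if they were one hot encoded". The NumPy sketch below illustrates that equivalence: looking up a row of a weight table gives the same result as multiplying a one-hot vector by that table. The table here is random and stands in for learned weights; N and the codes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 9                         # number of categories (hypothetical)
d = int(np.sqrt(N))           # embedding width, same sqrt(N) rule as above
W = rng.normal(size=(N, d))   # stand-in for a learned embedding table

codes = np.array([0, 4, 8])   # integer-encoded categories
one_hot = np.eye(N)[codes]    # shape (3, N), one row per code

lookup = W[codes]             # row lookup, as an embedding layer performs
matmul = one_hot @ W          # one-hot input through a linear layer without bias

print(np.allclose(lookup, matmul))
```

The lookup avoids materialising the one-hot matrix, which matters when a crossed column has many categories.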

%%time

model.compile(optimizer='sgd',
              loss='mean_squared_error',
              metrics=['Precision'])

# let's also add the history variable to see how we are doing
# and let's add a validation set to keep track of our progress
history = model.fit([X_train_crossed,X_train_cat,X_train_num],
                    y_train, 
                    epochs=15, 
                    batch_size=32, 
                    verbose=1, 
                    validation_data = ([X_test_crossed,X_test_cat,X_test_num],y_test))

Epoch 1/15
157/157 [==============================] - 1s 6ms/step - loss: 0.2448 - precision: 0.5787 - val_loss: 0.2310 - val_precision: 0.6390
Epoch 2/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2354 - precision: 0.6086 - val_loss: 0.2262 - val_precision: 0.6692
Epoch 3/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2326 - precision: 0.6191 - val_loss: 0.2242 - val_precision: 0.6259
Epoch 4/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2308 - precision: 0.6273 - val_loss: 0.2244 - val_precision: 0.6178
Epoch 5/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2299 - precision: 0.6244 - val_loss: 0.2205 - val_precision: 0.6590
Epoch 6/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2286 - precision: 0.6279 - val_loss: 0.2198 - val_precision: 0.6444
Epoch 7/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2272 - precision: 0.6261 - val_loss: 0.2190 - val_precision: 0.6684
Epoch 8/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2259 - precision: 0.6293 - val_loss: 0.2182 - val_precision: 0.6396
Epoch 9/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2249 - precision: 0.6332 - val_loss: 0.2162 - val_precision: 0.6595
Epoch 10/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2235 - precision: 0.6373 - val_loss: 0.2160 - val_precision: 0.6424
Epoch 11/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2225 - precision: 0.6378 - val_loss: 0.2139 - val_precision: 0.6667
Epoch 12/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2215 - precision: 0.6426 - val_loss: 0.2137 - val_precision: 0.6487
Epoch 13/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2204 - precision: 0.6478 - val_loss: 0.2135 - val_precision: 0.6396
Epoch 14/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2190 - precision: 0.6410 - val_loss: 0.2113 - val_precision: 0.6567
Epoch 15/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2177 - precision: 0.6438 - val_loss: 0.2103 - val_precision: 0.6523
CPU times: user 9.06 s, sys: 974 ms, total: 10 s
Wall time: 8 s

from sklearn import metrics as mt
yhat = np.round(model.predict([X_test_crossed,X_test_cat,X_test_num]))
print(mt.confusion_matrix(y_test,yhat))
print(mt.precision_score(y_test,yhat))


y_pred_0 = model.predict([X_test_crossed,X_test_cat,X_test_num]).ravel()

# false positive and true positive rates using roc
fpr_0, tpr_0, thresholds_0 = mt.roc_curve(y_test, y_pred_0)

#area under the curve
auc_0 = mt.auc(fpr_0, tpr_0)

[[365 250]
 [168 469]]
0.6522948539638387

from matplotlib import pyplot as plt

%matplotlib inline

plt.figure(figsize=(10,4))
plt.subplot(2,2,1)
plt.plot(history.history['precision'])

plt.ylabel('Precision %')
plt.title('Training')
plt.subplot(2,2,2)
plt.plot(history.history['val_precision'])
plt.title('Validation')

plt.subplot(2,2,3)
plt.plot(history.history['loss'])
plt.ylabel('Training Loss')
plt.xlabel('epochs')

plt.subplot(2,2,4)
plt.plot(history.history['val_loss'])
plt.xlabel('epochs')

Text(0.5, 0, 'epochs')

# possible crossing options:
#   'key','time_signature','danceability',
#   'energy','speechiness','acousticness',
#   'instrumentalness','liveness','valence'

cross_columns = [['danceability','energy','valence'],
                 ['key', 'danceability','liveness'],
                 ['speechiness','acousticness','instrumentalness'],
                 ['time_signature','energy']
                ]

# cross each set of columns in the list above
cross_col_df_names = []
for cols_list in cross_columns:
    # encode as ints for the embedding
    enc = LabelEncoder()
    
    X_crossed_train = []
    X_crossed_test = []
    for row in X_train[cols_list].values:
        X_crossed_train.append(reduce((lambda x,y: x+y),row))
    for row in X_test[cols_list].values:
        X_crossed_test.append(reduce((lambda x,y: x+y),row))
    
    # get a nice name for this new crossed column
    cross_col_name = '_'.join(cols_list)
    
    # 2. encode as integers
#     enc.fit(np.hstack((X_crossed_train.to_numpy(),  X_crossed_test.to_numpy())))
    enc.fit(np.hstack((np.array(X_crossed_train),np.array(X_crossed_test))))
    
    # 3. Save into dataframe with new name
    X_train[cross_col_name] = enc.transform(X_crossed_train)
    X_test[cross_col_name] = enc.transform(X_crossed_test)
    
    # keep track of the new names of the crossed columns
    cross_col_df_names.append(cross_col_name)

# get crossed columns
X_train_crossed = X_train[cross_col_df_names].to_numpy()
X_test_crossed = X_test[cross_col_df_names].to_numpy()

# save categorical features
X_train_cat = X_train[categorical_headers].to_numpy() 
X_test_cat = X_test[categorical_headers].to_numpy() 

# and save off the numeric features
X_train_num = X_train.drop(columns=categorical_headers).to_numpy()
X_test_num = X_test.drop(columns=categorical_headers).to_numpy()


# we need to create separate lists for each branch
crossed_outputs = []

# CROSSED DATA INPUT
input_crossed = Input(shape=(X_train_crossed.shape[1],), dtype='int64', name='wide_inputs')
for idx,col in enumerate(cross_col_df_names):
    
    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(X_train[col].max(),X_test[col].max())+1
    
    
    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_crossed, idx, axis=1)
    
    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(x)
    
    # save these outputs to concatenate later
    crossed_outputs.append(x)
    

# now concatenate the embedded outputs to form the wide branch
wide_branch = concatenate(crossed_outputs, name='wide_concat')

# reset this input branch
all_deep_branch_outputs = []

# CATEGORICAL DATA INPUT
input_cat = Input(shape=(X_train_cat.shape[1],), dtype='int64', name='categorical_input')
for idx,col in enumerate(categorical_headers):
    
    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(X_train[col].max(),X_test[col].max())+1
    
    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_cat, idx, axis=1)
    
    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(x)
    
    # save these outputs to concatenate later
    all_deep_branch_outputs.append(x)
    
# NUMERIC DATA INPUT
# create dense input branch for numeric
input_num = Input(shape=(X_train_num.shape[1],), name='numeric')
x_dense = Dense(units=15, activation='relu',name='num_1')(input_num)
    
all_deep_branch_outputs.append(x_dense)


# merge the deep branches together
deep_branch = concatenate(all_deep_branch_outputs,name='concat_embeds')
deep_branch = Dense(units=50,activation='relu', name='deep1')(deep_branch)
deep_branch = Dense(units=25,activation='relu', name='deep2')(deep_branch)
deep_branch = Dense(units=10,activation='relu', name='deep3')(deep_branch)
    
# merge the deep and wide branch
final_branch = concatenate([wide_branch, deep_branch],
                           name='concat_deep_wide')
final_branch = Dense(units=1,activation='sigmoid',
                     name='combined')(final_branch)

model = Model(inputs=[input_crossed,input_cat,input_num], 
              outputs=final_branch)

# model.summary()

%%time

model.compile(optimizer='sgd',
              loss='mean_squared_error',
              metrics=['Precision'])

# let's also add the history variable to see how we are doing
# and let's add a validation set to keep track of our progress
history = model.fit([X_train_crossed,X_train_cat,X_train_num],
                    y_train, 
                    epochs=15, 
                    batch_size=32, 
                    verbose=1, 
                    validation_data = ([X_test_crossed,X_test_cat,X_test_num],y_test))

Epoch 1/15
157/157 [==============================] - 1s 5ms/step - loss: 0.2462 - precision: 0.5645 - val_loss: 0.2224 - val_precision: 0.6439
Epoch 2/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2145 - precision: 0.6393 - val_loss: 0.2015 - val_precision: 0.6453
Epoch 3/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2028 - precision: 0.6459 - val_loss: 0.1944 - val_precision: 0.7080
Epoch 4/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1950 - precision: 0.6544 - val_loss: 0.1832 - val_precision: 0.6918
Epoch 5/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1911 - precision: 0.6639 - val_loss: 0.2135 - val_precision: 0.7403
Epoch 6/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1871 - precision: 0.6710 - val_loss: 0.1763 - val_precision: 0.6717
Epoch 7/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1824 - precision: 0.6849 - val_loss: 0.1760 - val_precision: 0.7523
Epoch 8/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1806 - precision: 0.6908 - val_loss: 0.1958 - val_precision: 0.7973
Epoch 9/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1785 - precision: 0.6968 - val_loss: 0.1646 - val_precision: 0.7182
Epoch 10/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1781 - precision: 0.6998 - val_loss: 0.1717 - val_precision: 0.7618
Epoch 11/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1744 - precision: 0.7023 - val_loss: 0.1617 - val_precision: 0.7178
Epoch 12/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1759 - precision: 0.6981 - val_loss: 0.1631 - val_precision: 0.7143
Epoch 13/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1727 - precision: 0.7040 - val_loss: 0.1609 - val_precision: 0.7064
Epoch 14/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1728 - precision: 0.7040 - val_loss: 0.1587 - val_precision: 0.7445
Epoch 15/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1696 - precision: 0.7154 - val_loss: 0.1570 - val_precision: 0.7446
CPU times: user 8.55 s, sys: 1.08 s, total: 9.62 s
Wall time: 6.42 s

yhat = np.round(model.predict([X_test_crossed,X_test_cat,X_test_num]))

yhat_best = yhat 
print(mt.confusion_matrix(y_test,yhat))
print(mt.precision_score(y_test,yhat))

y_pred_1 = model.predict([X_test_crossed,X_test_cat,X_test_num]).ravel()

# false positive and true positive rates using roc
fpr_1, tpr_1, thresholds_1 = mt.roc_curve(y_test, y_pred_1)

#area under the curve
auc_1 = mt.auc(fpr_1, tpr_1)

[[425 190]
 [ 83 554]]
0.7446236559139785

from matplotlib import pyplot as plt

%matplotlib inline


plt.figure(figsize=(10,4))
plt.subplot(2,2,1)
plt.plot(history.history['precision'])

plt.ylabel('Precision %')
plt.title('Training')
plt.subplot(2,2,2)
plt.plot(history.history['val_precision'])
plt.title('Validation')

plt.subplot(2,2,3)
plt.plot(history.history['loss'])
plt.ylabel('Training Loss')
plt.xlabel('epochs')

plt.subplot(2,2,4)
plt.plot(history.history['val_loss'])
plt.xlabel('epochs')




model10_hist_accur = history.history['precision']
model10_val_accur = history.history['val_precision']
model10_hist_loss = history.history['loss']
model10_val_loss = history.history['val_loss']

# get crossed columns
X_train_crossed = X_train[cross_col_df_names].to_numpy()
X_test_crossed = X_test[cross_col_df_names].to_numpy()

# save categorical features
X_train_cat = X_train[categorical_headers].to_numpy() 
X_test_cat = X_test[categorical_headers].to_numpy() 

# and save off the numeric features
X_train_num = X_train.drop(columns=categorical_headers).to_numpy()
X_test_num = X_test.drop(columns=categorical_headers).to_numpy()


# we need to create separate lists for each branch
crossed_outputs = []

# CROSSED DATA INPUT
input_crossed = Input(shape=(X_train_crossed.shape[1],), dtype='int64', name='wide_inputs')
for idx,col in enumerate(cross_col_df_names):
    
    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(X_train[col].max(),X_test[col].max())+1
    
    
    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_crossed, idx, axis=1)
    
    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(x)
    
    # save these outputs to concatenate later
    crossed_outputs.append(x)
    

# now concatenate the embedded outputs to form the wide branch
wide_branch = concatenate(crossed_outputs, name='wide_concat')

# reset this input branch
all_deep_branch_outputs = []

# CATEGORICAL DATA INPUT
input_cat = Input(shape=(X_train_cat.shape[1],), dtype='int64', name='categorical_input')
for idx,col in enumerate(categorical_headers):
    
    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(X_train[col].max(),X_test[col].max())+1
    
    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_cat, idx, axis=1)
    
    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(x)
    
    # save these outputs to concatenate later
    all_deep_branch_outputs.append(x)
    
# NUMERIC DATA INPUT
# create dense input branch for numeric
input_num = Input(shape=(X_train_num.shape[1],), name='numeric')
x_dense = Dense(units=15, activation='relu',name='num_1')(input_num)
    
all_deep_branch_outputs.append(x_dense)


# merge the deep branches together
deep_branch = concatenate(all_deep_branch_outputs,name='concat_embeds')
deep_branch = Dense(units=50,activation='relu', name='deep1')(deep_branch)
deep_branch = Dense(units=25,activation='relu', name='deep2')(deep_branch)
deep_branch = Dense(units=10,activation='relu', name='deep3')(deep_branch)
    
# merge the deep and wide branch
final_branch = concatenate([wide_branch, deep_branch],
                           name='concat_deep_wide')
final_branch = Dense(units=1,activation='sigmoid',
                     name='combined')(final_branch)

model = Model(inputs=[input_crossed,input_cat,input_num], 
              outputs=final_branch)

# model.summary()

%%time

model.compile(optimizer='adagrad',
              loss='mean_squared_error',
              metrics=['Precision'])

# let's also add the history variable to see how we are doing
# and let's add a validation set to keep track of our progress
history = model.fit([X_train_crossed,X_train_cat,X_train_num],
                    y_train, 
                    epochs=15, 
                    batch_size=32, 
                    verbose=1, 
                    validation_data = ([X_test_crossed,X_test_cat,X_test_num],y_test))

Epoch 1/15
157/157 [==============================] - 1s 4ms/step - loss: 0.2628 - precision: 0.5120 - val_loss: 0.2535 - val_precision: 0.5365
Epoch 2/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2466 - precision: 0.5618 - val_loss: 0.2399 - val_precision: 0.5988
Epoch 3/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2369 - precision: 0.5964 - val_loss: 0.2307 - val_precision: 0.6211
Epoch 4/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2298 - precision: 0.6113 - val_loss: 0.2231 - val_precision: 0.6283
Epoch 5/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2238 - precision: 0.6212 - val_loss: 0.2169 - val_precision: 0.6368
Epoch 6/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2189 - precision: 0.6258 - val_loss: 0.2118 - val_precision: 0.6458
Epoch 7/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2151 - precision: 0.6301 - val_loss: 0.2079 - val_precision: 0.6523
Epoch 8/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2121 - precision: 0.6325 - val_loss: 0.2049 - val_precision: 0.6568
Epoch 9/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2097 - precision: 0.6368 - val_loss: 0.2025 - val_precision: 0.6573
Epoch 10/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2076 - precision: 0.6361 - val_loss: 0.2003 - val_precision: 0.6600
Epoch 11/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2058 - precision: 0.6398 - val_loss: 0.1984 - val_precision: 0.6616
Epoch 12/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2042 - precision: 0.6393 - val_loss: 0.1968 - val_precision: 0.6631
Epoch 13/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2028 - precision: 0.6412 - val_loss: 0.1952 - val_precision: 0.6627
Epoch 14/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2015 - precision: 0.6419 - val_loss: 0.1939 - val_precision: 0.6635
Epoch 15/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2003 - precision: 0.6434 - val_loss: 0.1927 - val_precision: 0.6636
CPU times: user 11.9 s, sys: 1.35 s, total: 13.3 s
Wall time: 8.4 s

yhat = np.round(model.predict([X_test_crossed,X_test_cat,X_test_num]))
print(mt.confusion_matrix(y_test,yhat))
print(mt.precision_score(y_test,yhat))

y_pred_2 = model.predict([X_test_crossed,X_test_cat,X_test_num]).ravel()

# false positive and true positive rates using roc
fpr_2, tpr_2, thresholds_2 = mt.roc_curve(y_test, y_pred_2)

#area under the curve
auc_2 = mt.auc(fpr_2, tpr_2)

[[327 288]
 [ 69 568]]
0.6635514018691588

from matplotlib import pyplot as plt

%matplotlib inline

plt.figure(figsize=(10,4))
plt.subplot(2,2,1)
plt.plot(history.history['precision'])

plt.ylabel('Precision %')
plt.title('Training')
plt.subplot(2,2,2)
plt.plot(history.history['val_precision'])
plt.title('Validation')

plt.subplot(2,2,3)
plt.plot(history.history['loss'])
plt.ylabel('Training Loss')
plt.xlabel('epochs')

plt.subplot(2,2,4)
plt.plot(history.history['val_loss'])
plt.xlabel('epochs')

Text(0.5, 0, 'epochs')

# get crossed columns
X_train_crossed = X_train[cross_col_df_names].to_numpy()
X_test_crossed = X_test[cross_col_df_names].to_numpy()

# save categorical features
X_train_cat = X_train[categorical_headers].to_numpy() 
X_test_cat = X_test[categorical_headers].to_numpy() 

# and save off the numeric features
X_train_num = X_train.drop(columns=categorical_headers).to_numpy()
X_test_num = X_test.drop(columns=categorical_headers).to_numpy()


# we need to create separate lists for each branch
crossed_outputs = []

# CROSSED DATA INPUT
input_crossed = Input(shape=(X_train_crossed.shape[1],), dtype='int64', name='wide_inputs')
for idx,col in enumerate(cross_col_df_names):
    
    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(X_train[col].max(),X_test[col].max())+1
    
    
    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_crossed, idx, axis=1)
    
    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(x)
    
    # save these outputs to concatenate later
    crossed_outputs.append(x)
    

# now concatenate the embedding outputs to form the wide branch
wide_branch = concatenate(crossed_outputs, name='wide_concat')

# reset this input branch
all_deep_branch_outputs = []

# CATEGORICAL DATA INPUT
input_cat = Input(shape=(X_train_cat.shape[1],), dtype='int64', name='categorical_input')
for idx,col in enumerate(categorical_headers):
    
    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(X_train[col].max(),X_test[col].max())+1
    
    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_cat, idx, axis=1)
    
    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(x)
    
    # save these outputs to concatenate later
    all_deep_branch_outputs.append(x)
    
# NUMERIC DATA INPUT
# create dense input branch for numeric
input_num = Input(shape=(X_train_num.shape[1],), name='numeric')
x_dense = Dense(units=15, activation='relu',name='num_1')(input_num)
    
all_deep_branch_outputs.append(x_dense)

# merge the deep branches together
deep_branch = concatenate(all_deep_branch_outputs,name='concat_embeds')
deep_branch = Dense(units=50,activation='relu', name='deep1')(deep_branch)
deep_branch = Dense(units=25,activation='relu', name='deep2')(deep_branch)
deep_branch = Dense(units=10,activation='relu', name='deep3')(deep_branch)
deep_branch = Dense(units=5,activation='relu', name='deep4')(deep_branch)
    
# merge the deep and wide branch
final_branch = concatenate([wide_branch, deep_branch],
                           name='concat_deep_wide')
final_branch = Dense(units=1,activation='sigmoid',
                     name='combined')(final_branch)

model = Model(inputs=[input_crossed,input_cat,input_num], 
              outputs=final_branch)

# model.summary()
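The comments above describe the embedding as a way to treat integer codes "as if they were one hot encoded". A minimal NumPy sketch of that equivalence follows. The weight matrix `W`, the 12-category column, and the sample codes are illustrative assumptions, not values taken from this lab; only the `int(np.sqrt(N))` sizing rule comes from the code above.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 12                         # assumed number of categories (integer codes 0..11)
dim = int(np.sqrt(N))          # same sizing rule as the Embedding layers above
W = rng.normal(size=(N, dim))  # stand-in for the trainable embedding matrix

codes = np.array([1, 5, 9, 0])   # hypothetical integer-encoded categories
one_hot = np.eye(N)[codes]       # explicit one-hot encoding of the same codes

via_lookup = W[codes]            # what an embedding lookup returns
via_one_hot = one_hot @ W        # one-hot vectors multiplied by the same weights

print(via_lookup.shape)
print(np.allclose(via_lookup, via_one_hot))
```

The lookup avoids materializing the one-hot matrix, which matters most for crossed columns where N can be large.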

%%time

model.compile(optimizer='sgd',
              loss='mean_squared_error',
              metrics=['Precision'])

# let's also keep the history object to see how we are doing
# and let's add a validation set to keep track of our progress
history = model.fit([X_train_crossed,X_train_cat,X_train_num],
                    y_train, 
                    epochs=15, 
                    batch_size=32, 
                    verbose=1, 
                    validation_data = ([X_test_crossed,X_test_cat,X_test_num],y_test))

Epoch 1/15
157/157 [==============================] - 1s 4ms/step - loss: 0.2288 - precision: 0.6251 - val_loss: 0.2157 - val_precision: 0.6565
Epoch 2/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2153 - precision: 0.6360 - val_loss: 0.2037 - val_precision: 0.6588
Epoch 3/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2065 - precision: 0.6371 - val_loss: 0.1959 - val_precision: 0.6583
Epoch 4/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2009 - precision: 0.6409 - val_loss: 0.1913 - val_precision: 0.6734
Epoch 5/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1969 - precision: 0.6428 - val_loss: 0.1891 - val_precision: 0.6487
Epoch 6/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1936 - precision: 0.6456 - val_loss: 0.1846 - val_precision: 0.6616
Epoch 7/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1912 - precision: 0.6501 - val_loss: 0.1858 - val_precision: 0.6469
Epoch 8/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1890 - precision: 0.6550 - val_loss: 0.1848 - val_precision: 0.6495
Epoch 9/15
157/157 [==============================] - 0s 3ms/step - loss: 0.1863 - precision: 0.6636 - val_loss: 0.1785 - val_precision: 0.6644
Epoch 10/15
157/157 [==============================] - 0s 3ms/step - loss: 0.1849 - precision: 0.6643 - val_loss: 0.1763 - val_precision: 0.6895
Epoch 11/15
157/157 [==============================] - 0s 3ms/step - loss: 0.1833 - precision: 0.6696 - val_loss: 0.1780 - val_precision: 0.7151
Epoch 12/15
157/157 [==============================] - 0s 3ms/step - loss: 0.1814 - precision: 0.6785 - val_loss: 0.1730 - val_precision: 0.7022
Epoch 13/15
157/157 [==============================] - 0s 3ms/step - loss: 0.1798 - precision: 0.6810 - val_loss: 0.1749 - val_precision: 0.7227
Epoch 14/15
157/157 [==============================] - 0s 3ms/step - loss: 0.1783 - precision: 0.6837 - val_loss: 0.1698 - val_precision: 0.6923
Epoch 15/15
157/157 [==============================] - 1s 3ms/step - loss: 0.1770 - precision: 0.6838 - val_loss: 0.1767 - val_precision: 0.6718
CPU times: user 9.13 s, sys: 1.02 s, total: 10.1 s
Wall time: 7.24 s

yhat = np.round(model.predict([X_test_crossed,X_test_cat,X_test_num]))
print(mt.confusion_matrix(y_test,yhat))
print(mt.precision_score(y_test,yhat))

[[316 299]
 [ 25 612]]
0.6717892425905598

y_pred_3 = model.predict([X_test_crossed,X_test_cat,X_test_num]).ravel()

# false positive and true positive rates from the ROC curve
fpr_3, tpr_3, thresholds_3 = mt.roc_curve(y_test, y_pred_3)

#area under the curve
auc_3 = mt.auc(fpr_3, tpr_3)

from matplotlib import pyplot as plt

%matplotlib inline

plt.figure(figsize=(10,4))
plt.subplot(2,2,1)
plt.plot(history.history['precision'])

plt.ylabel('Precision %')
plt.title('Training')
plt.subplot(2,2,2)
plt.plot(history.history['val_precision'])
plt.title('Validation')

plt.subplot(2,2,3)
plt.plot(history.history['loss'])
plt.ylabel('Training Loss')
plt.xlabel('epochs')

plt.subplot(2,2,4)
plt.plot(history.history['val_loss'])
plt.xlabel('epochs')

Text(0.5, 0, 'epochs')

# get crossed columns
X_train_crossed = X_train[cross_col_df_names].to_numpy()
X_test_crossed = X_test[cross_col_df_names].to_numpy()

# save categorical features
X_train_cat = X_train[categorical_headers].to_numpy() 
X_test_cat = X_test[categorical_headers].to_numpy() 

# and save off the numeric features
X_train_num = X_train.drop(columns=categorical_headers).to_numpy()
X_test_num = X_test.drop(columns=categorical_headers).to_numpy()


# we need to create separate lists for each branch
crossed_outputs = []

# CROSSED DATA INPUT
input_crossed = Input(shape=(X_train_crossed.shape[1],), dtype='int64', name='wide_inputs')
for idx,col in enumerate(cross_col_df_names):
    
    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(X_train[col].max(),X_test[col].max())+1
    
    
    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_crossed, idx, axis=1)
    
    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(x)
    
    # save these outputs to concatenate later
    crossed_outputs.append(x)
    

# now concatenate the embedding outputs to form the wide branch
wide_branch = concatenate(crossed_outputs, name='wide_concat')

# reset this input branch
all_deep_branch_outputs = []

# CATEGORICAL DATA INPUT
input_cat = Input(shape=(X_train_cat.shape[1],), dtype='int64', name='categorical_input')
for idx,col in enumerate(categorical_headers):
    
    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(X_train[col].max(),X_test[col].max())+1
    
    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_cat, idx, axis=1)
    
    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(x)
    
    # save these outputs to concatenate later
    all_deep_branch_outputs.append(x)
    
# NUMERIC DATA INPUT
# create dense input branch for numeric
input_num = Input(shape=(X_train_num.shape[1],), name='numeric')
x_dense = Dense(units=15, activation='relu',name='num_1')(input_num)
    
all_deep_branch_outputs.append(x_dense)

# merge the deep branches together
deep_branch = concatenate(all_deep_branch_outputs,name='concat_embeds')
deep_branch = Dense(units=75,activation='relu', name='deep0')(deep_branch)
deep_branch = Dense(units=50,activation='relu', name='deep1')(deep_branch)
deep_branch = Dense(units=25,activation='relu', name='deep2')(deep_branch)
deep_branch = Dense(units=10,activation='relu', name='deep3')(deep_branch)
deep_branch = Dense(units=5,activation='relu', name='deep4')(deep_branch)
    
# merge the deep and wide branch
final_branch = concatenate([wide_branch, deep_branch],
                           name='concat_deep_wide')
final_branch = Dense(units=1,activation='sigmoid',
                     name='combined')(final_branch)

model = Model(inputs=[input_crossed,input_cat,input_num], 
              outputs=final_branch)

# model.summary()

%%time

model.compile(optimizer='sgd',
              loss='mean_squared_error',
              metrics=['Precision'])

# let's also keep the history object to see how we are doing
# and let's add a validation set to keep track of our progress
history = model.fit([X_train_crossed,X_train_cat,X_train_num],
                    y_train, 
                    epochs=15, 
                    batch_size=32, 
                    verbose=1, 
                    validation_data = ([X_test_crossed,X_test_cat,X_test_num],y_test))

Epoch 1/15
157/157 [==============================] - 1s 5ms/step - loss: 0.2247 - precision: 0.6354 - val_loss: 0.1994 - val_precision: 0.6587
Epoch 2/15
157/157 [==============================] - 0s 3ms/step - loss: 0.1984 - precision: 0.6373 - val_loss: 0.1884 - val_precision: 0.6624
Epoch 3/15
157/157 [==============================] - 0s 3ms/step - loss: 0.1920 - precision: 0.6435 - val_loss: 0.1862 - val_precision: 0.6801
Epoch 4/15
157/157 [==============================] - 0s 3ms/step - loss: 0.1888 - precision: 0.6486 - val_loss: 0.1815 - val_precision: 0.6694
Epoch 5/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1862 - precision: 0.6517 - val_loss: 0.1786 - val_precision: 0.6744
Epoch 6/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1846 - precision: 0.6521 - val_loss: 0.1769 - val_precision: 0.6721
Epoch 7/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1832 - precision: 0.6553 - val_loss: 0.1768 - val_precision: 0.6591
Epoch 8/15
157/157 [==============================] - 1s 3ms/step - loss: 0.1815 - precision: 0.6585 - val_loss: 0.1766 - val_precision: 0.6602
Epoch 9/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1805 - precision: 0.6629 - val_loss: 0.1724 - val_precision: 0.6727
Epoch 10/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1790 - precision: 0.6660 - val_loss: 0.1706 - val_precision: 0.6837
Epoch 11/15
157/157 [==============================] - 0s 3ms/step - loss: 0.1770 - precision: 0.6729 - val_loss: 0.1714 - val_precision: 0.6734
Epoch 12/15
157/157 [==============================] - 1s 3ms/step - loss: 0.1757 - precision: 0.6760 - val_loss: 0.1678 - val_precision: 0.6767
Epoch 13/15
157/157 [==============================] - 0s 3ms/step - loss: 0.1744 - precision: 0.6767 - val_loss: 0.1665 - val_precision: 0.6846
Epoch 14/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1734 - precision: 0.6815 - val_loss: 0.1905 - val_precision: 0.7485
Epoch 15/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1724 - precision: 0.6890 - val_loss: 0.1658 - val_precision: 0.6824
CPU times: user 9.61 s, sys: 1 s, total: 10.6 s
Wall time: 7.95 s

yhat = np.round(model.predict([X_test_crossed,X_test_cat,X_test_num]))
print(mt.confusion_matrix(y_test,yhat))
print(mt.precision_score(y_test,yhat))

[[339 276]
 [ 44 593]]
0.6823935558112774

from matplotlib import pyplot as plt

%matplotlib inline

plt.figure(figsize=(10,4))
plt.subplot(2,2,1)
plt.plot(history.history['precision'])

plt.ylabel('Precision %')
plt.title('Training')
plt.subplot(2,2,2)
plt.plot(history.history['val_precision'])
plt.title('Validation')

plt.subplot(2,2,3)
plt.plot(history.history['loss'])
plt.ylabel('Training Loss')
plt.xlabel('epochs')

plt.subplot(2,2,4)
plt.plot(history.history['val_loss'])
plt.xlabel('epochs')

Text(0.5, 0, 'epochs')

Comparing wide and deep models

from sklearn import metrics
y_pred_4 = model.predict([X_test_crossed,X_test_cat,X_test_num]).ravel()

# false positive and true positive rates from the ROC curve
fpr_4, tpr_4, thresholds_4 = metrics.roc_curve(y_test, y_pred_4)

#area under the curve
auc_4 = metrics.auc(fpr_4, tpr_4)

plt.figure(figsize=(12,12))

#plot halfway line
plt.plot([0,1], [0,1], 'k--')

#plot for model 0 ROC
plt.plot(fpr_0, tpr_0, label='Model 0 (area = {:.3f})'.format(auc_0))

#plot for model 1 ROC
plt.plot(fpr_1, tpr_1, label='Model 1 (area = {:.3f})'.format(auc_1))

#plot for model 2 ROC
plt.plot(fpr_2, tpr_2, label='Model 2 (area = {:.3f})'.format(auc_2))

#plot for model 3 ROC
plt.plot(fpr_3, tpr_3, label='Model 3 (area = {:.3f})'.format(auc_3))

#plot for model 4 ROC
plt.plot(fpr_4, tpr_4, label='Model 4 (area = {:.3f})'.format(auc_4))

plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('All Wide and Deep ROC curves')
plt.legend(loc='best')
plt.show()

From the ROC curves above, we note that model 1 performed better than the other models, so we compare it with the standard multilayer perceptron (MLP) from scikit-learn.

Comparing our best Wide and Deep Model to the standard Multi Layer Perceptron

from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, precision_score
from sklearn import metrics

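# note: 'time_signature' and 'energy' are listed twice, so those two columns
# are passed to the MLP twice
data_features = ['key','time_signature','valence',
                 'danceability', 'energy','instrumentalness',
                 'speechiness','acousticness','liveness',
                  'time_signature','energy'
                ]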

mlp = MLPClassifier(hidden_layer_sizes=(50,),
                    learning_rate_init=0.01,
                    random_state=1,
                    activation='relu')

mlp.fit(X_train[data_features], y_train)
yhat_mlp = mlp.predict(X_test[data_features])

print("MLP Accuracy Score: ", accuracy_score(y_test, yhat_mlp))
print("MLP Precision Score: ",precision_score(y_test,yhat_mlp))

# false positive and true positive rates from the ROC curve
fpr_sk, tpr_sk, thresholds_sk = metrics.roc_curve(y_test, yhat_mlp)

#area under the curve
auc_sk = metrics.auc(fpr_sk, tpr_sk)

MLP Accuracy Score:  0.8099041533546326
MLP Precision Score:  0.7692307692307693

We note that the MLP has a higher precision score (0.7692307692307693) than our best-performing wide and deep network, which has a precision score of 0.6823935558112774.
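As a quick check, the wide and deep precision quoted above can be recomputed from the confusion matrix printed earlier. This sketch assumes scikit-learn's layout, where rows are true labels and columns are predicted labels.

```python
# confusion matrix printed for the last wide and deep model
# layout assumed: [[TN, FP], [FN, TP]]
cm = [[339, 276],
      [44, 593]]

tn, fp = cm[0]
fn, tp = cm[1]

precision = tp / (tp + fp)   # share of predicted hits that really are hits
recall = tp / (tp + fn)      # share of real hits that the model finds

print('precision:', precision)
print('recall:', recall)
```

The model finds most of the real hits but also labels many non-hits as hits, which is what pulls precision down.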

plt.figure(figsize=(10,12))

#plot halfway line
plt.plot([0,1], [0,1], 'k--')

#plot for Wide and Deep ROC
plt.plot(fpr_4, tpr_4, label='Wide and Deep (area = {:.3f})'.format(auc_4))

#plot for MLP ROC
plt.plot(fpr_sk, tpr_sk, label='MLP (area = {:.3f})'.format(auc_sk))

plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Wide and Deep vs MLP ROC curve')
plt.legend(loc='best')
plt.show()

Judging by AUC, our wide and deep neural network performed slightly better than the multilayer perceptron (MLP) from scikit-learn. The ROC curve of the wide and deep network lies closer to the top-left corner, with an AUC of 0.815 compared to 0.808 for the MLP. Next, we carry out a McNemar test to compare the two models.
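One caveat when reading these two AUC values: the MLP curve is built from the hard 0/1 labels returned by `mlp.predict`, while the wide and deep curve is built from sigmoid scores. The sketch below uses a small hand-rolled ROC helper (`roc_points` and `auc` are hypothetical stand-ins for `metrics.roc_curve` and `metrics.auc`) on toy data, not the lab's data, to show how thresholded labels reduce the curve to a single interior point.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists with one point per distinct score threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        tp += label
        fp += 1 - label
        # emit a point only after the last sample sharing this score
        if i == len(pairs) - 1 or pairs[i + 1][0] != score:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the curve by the trapezoidal rule."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))

# toy labels and scores, chosen only for illustration
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
hard = [1 if s >= 0.5 else 0 for s in scores]

fpr_soft, tpr_soft = roc_points(y_true, scores)
fpr_hard, tpr_hard = roc_points(y_true, hard)

print('points from scores:', len(fpr_soft), 'AUC:', auc(fpr_soft, tpr_soft))
print('points from labels:', len(fpr_hard), 'AUC:', auc(fpr_hard, tpr_hard))
```

For the scikit-learn MLP, passing `mlp.predict_proba(X_test[data_features])[:, 1]` to `roc_curve` instead of `yhat_mlp` would likely give a more comparable curve.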

from statsmodels.stats.contingency_tables import mcnemar

# run McNemar's test on each model's confusion matrix
# (true labels vs. predicted labels)
result = mcnemar(mt.confusion_matrix(y_test,yhat_mlp), exact=False, correction=True)
result2 = mcnemar(mt.confusion_matrix(y_test,yhat_best), exact=False, correction=True)

# summarize the finding
print('statistic=%.3f, p-value=%.25f' % (result.statistic, result.pvalue))
print('statistic=%.3f, p-value=%.25f' % (result2.statistic, result2.pvalue))

statistic=44.576, p-value=0.0000000000244718815686789
statistic=41.158, p-value=0.0000000001404426639141641

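Both p-values are far below 0.05, so we reject the null hypothesis in each test. Note that each test is run on a single model's confusion matrix, so the null hypothesis being rejected is that the model produces false positives and false negatives at the same rate; it is not a direct test of whether the two models differ from each other. The wide and deep network has the higher p-value (about 1.4e-10 versus 2.4e-11 for the MLP), but a p-value measures evidence against the null hypothesis rather than model quality, so on its own it does not show that one model performs better than the other.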
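To compare two classifiers directly, McNemar's test is usually applied to a paired 2x2 table that counts, on the same test samples, where each model is right or wrong. The sketch below builds such a table and computes the continuity-corrected statistic by hand. The helper names `paired_table` and `mcnemar_chi2` and the toy labels are assumptions for illustration, not part of this lab.

```python
import math

def paired_table(y_true, pred_a, pred_b):
    """2x2 agreement table for two classifiers on the same test set.

    Rows: model A correct / wrong. Columns: model B correct / wrong.
    """
    table = [[0, 0], [0, 0]]
    for y, a, b in zip(y_true, pred_a, pred_b):
        row = 0 if a == y else 1
        col = 0 if b == y else 1
        table[row][col] += 1
    return table

def mcnemar_chi2(table):
    """Continuity-corrected McNemar statistic and chi-square (1 dof) p-value."""
    b, c = table[0][1], table[1][0]   # the two discordant cells
    if b + c == 0:
        return 0.0, 1.0               # the models never disagree
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square survival function, 1 dof
    return stat, p_value

# toy labels, not the lab's data
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
pred_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
pred_b = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]

table = paired_table(y_true, pred_a, pred_b)
stat, p_value = mcnemar_chi2(table)
print(table)
print('statistic=%.3f, p-value=%.3f' % (stat, p_value))
```

A table built this way from `y_test`, `yhat_mlp`, and `yhat_best` could be passed to `mcnemar(table, exact=False, correction=True)`, giving one test that compares the two models with each other.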

3. Exceptional Work

Here we examine the effect of dropout on the ROC curve by comparing against our best-performing wide and deep model. We also check whether the training and validation loss and precision curves differ.

from keras.layers import Dropout


# get crossed columns
X_train_crossed = X_train[cross_col_df_names].to_numpy()
X_test_crossed = X_test[cross_col_df_names].to_numpy()

# save categorical features
X_train_cat = X_train[categorical_headers].to_numpy() 
X_test_cat = X_test[categorical_headers].to_numpy() 

# and save off the numeric features
X_train_num = X_train.drop(columns=categorical_headers).to_numpy()
X_test_num = X_test.drop(columns=categorical_headers).to_numpy()


# we need to create separate lists for each branch
crossed_outputs = []

# CROSSED DATA INPUT
input_crossed = Input(shape=(X_train_crossed.shape[1],), dtype='int64', name='wide_inputs')
for idx,col in enumerate(cross_col_df_names):
    
    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(X_train[col].max(),X_test[col].max())+1
    
    
    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_crossed, idx, axis=1)
    
    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(x)
    
    # save these outputs to concatenate later
    crossed_outputs.append(x)
    

# merging the branches together 
wide_branch = concatenate(crossed_outputs, name='wide_concat')
wide_branch = Dense(units=1,activation='relu',name='num_0')(wide_branch)
wide_branch = Dropout(0.1)(wide_branch)

# reset this input branch
all_deep_branch_outputs = []

# CATEGORICAL DATA INPUT
input_cat = Input(shape=(X_train_cat.shape[1],), dtype='int64', name='categorical_input')
for idx,col in enumerate(categorical_headers):
    
    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(X_train[col].max(),X_test[col].max())+1
    
    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_cat, idx, axis=1)
    
    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(x)
    
    # save these outputs to concatenate later
    all_deep_branch_outputs.append(x)
    
# NUMERIC DATA INPUT
# create dense input branch for numeric
input_num = Input(shape=(X_train_num.shape[1],), name='numeric')
x_dense = Dense(units=15, activation='relu',name='num_1')(input_num)
x_dense = Dropout(0.1)(x_dense)
    
all_deep_branch_outputs.append(x_dense)

# merge the deep branches together
deep_branch = concatenate(all_deep_branch_outputs,name='concat_embeds')
deep_branch = Dense(units=75,activation='relu', name='deep0')(deep_branch)
deep_branch = Dropout(0.3)(deep_branch)
print('Deep 0 created')
deep_branch = Dense(units=50,activation='relu', name='deep1')(deep_branch)
deep_branch = Dropout(0.3)(deep_branch)
print('Deep 1 created')
deep_branch = Dense(units=25,activation='relu', name='deep2')(deep_branch)
deep_branch = Dropout(0.3)(deep_branch)
print('Deep 2 created')
deep_branch = Dense(units=10,activation='relu', name='deep3')(deep_branch)
deep_branch = Dropout(0.3)(deep_branch)
print('Deep 3 created')
deep_branch = Dense(units=5,activation='relu', name='deep4')(deep_branch)
deep_branch = Dropout(0.3)(deep_branch)
print('Deep 4 created')
    
# merge the deep and wide branch
final_branch = concatenate([wide_branch, deep_branch],
                           name='concat_deep_wide')
final_branch = Dense(units=1,activation='sigmoid',
                     name='combined')(final_branch)
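# note: this Dropout is applied after final_branch has been built, so it is not
# part of the model and has no effect on training or prediction
deep_branch = Dropout(0.1)(deep_branch)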

model = Model(inputs=[input_crossed,input_cat,input_num], 
              outputs=final_branch)

# model.summary()

Deep 0 created
Deep 1 created
Deep 2 created
Deep 3 created
Deep 4 created

%%time

model.compile(optimizer='sgd',
              loss='mean_squared_error',
              metrics=['Precision'])

# let's also keep the history object to see how we are doing
# and let's add a validation set to keep track of our progress
history = model.fit([X_train_crossed,X_train_cat,X_train_num],
                    y_train, 
                    epochs=15, 
                    batch_size=32, 
                    verbose=1, 
                    validation_data = ([X_test_crossed,X_test_cat,X_test_num],y_test))

Epoch 1/15
157/157 [==============================] - 1s 4ms/step - loss: 0.2697 - precision: 0.5155 - val_loss: 0.2495 - val_precision: 0.5266
Epoch 2/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2543 - precision: 0.5207 - val_loss: 0.2495 - val_precision: 0.5419
Epoch 3/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2532 - precision: 0.5221 - val_loss: 0.2492 - val_precision: 0.5374
Epoch 4/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2506 - precision: 0.5312 - val_loss: 0.2480 - val_precision: 0.5561
Epoch 5/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2517 - precision: 0.5237 - val_loss: 0.2475 - val_precision: 0.5560
Epoch 6/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2492 - precision: 0.5309 - val_loss: 0.2466 - val_precision: 0.5585
Epoch 7/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2484 - precision: 0.5305 - val_loss: 0.2445 - val_precision: 0.5676
Epoch 8/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2488 - precision: 0.5332 - val_loss: 0.2433 - val_precision: 0.5789
Epoch 9/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2479 - precision: 0.5347 - val_loss: 0.2406 - val_precision: 0.5915
Epoch 10/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2473 - precision: 0.5378 - val_loss: 0.2390 - val_precision: 0.5976
Epoch 11/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2462 - precision: 0.5425 - val_loss: 0.2366 - val_precision: 0.6064
Epoch 12/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2432 - precision: 0.5548 - val_loss: 0.2310 - val_precision: 0.6228
Epoch 13/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2413 - precision: 0.5569 - val_loss: 0.2278 - val_precision: 0.6266
Epoch 14/15
157/157 [==============================] - 1s 3ms/step - loss: 0.2427 - precision: 0.5533 - val_loss: 0.2248 - val_precision: 0.6318
Epoch 15/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2395 - precision: 0.5550 - val_loss: 0.2216 - val_precision: 0.6285
CPU times: user 10.1 s, sys: 1.03 s, total: 11.1 s
Wall time: 7.63 s

yhat = np.round(model.predict([X_test_crossed,X_test_cat,X_test_num]))
yhat_drop = yhat
print(mt.confusion_matrix(y_test,yhat))
print(mt.precision_score(y_test,yhat))

[[271 344]
 [ 55 582]]
0.6285097192224622

y_pred_dropout = model.predict([X_test_crossed,X_test_cat,X_test_num]).ravel()

# false positive and true positive rates from the ROC curve
fpr_dropout, tpr_dropout, thresholds_2 = mt.roc_curve(y_test, y_pred_dropout)

#area under the curve
auc_dropout = mt.auc(fpr_dropout, tpr_dropout)

plt.figure(figsize=(10,12))

#plot halfway line
plt.plot([0,1], [0,1], 'k--')

#plot for Wide and Deep ROC
plt.plot(fpr_4, tpr_4, label='Wide and Deep (area = {:.3f})'.format(auc_4))

#plot for Wide and Deep with Dropout ROC
plt.plot(fpr_dropout, tpr_dropout, label='Wide and Deep with Dropout (area = {:.3f})'.format(auc_dropout))

plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Wide and Deep vs Wide and Deep with Dropout ROC curve')
plt.legend(loc='best')
plt.show()

Here we note that our wide and deep model had a large AUC even without dropout, which suggests that the model was not overfitting.

from matplotlib import pyplot as plt

%matplotlib inline

plt.figure(figsize=(10,4))
plt.subplot(2,2,1)
plt.plot(history.history['precision'])

plt.ylabel('Precision %')
plt.title('Training with DropOut')
plt.subplot(2,2,2)
plt.plot(history.history['val_precision'])
plt.title('Validation with DropOut')

plt.subplot(2,2,3)
plt.plot(history.history['loss'])
plt.ylabel('Training Loss')
plt.xlabel('epochs')

plt.subplot(2,2,4)
plt.plot(history.history['val_loss'])
plt.xlabel('epochs')


plt.figure(figsize=(10,4))
plt.subplot(2,2,1)
plt.plot(model10_hist_accur)

plt.ylabel('Precision %')
plt.title('Training without DropOut')
plt.subplot(2,2,2)
plt.plot(model10_val_accur)
plt.title('Validation without DropOut')

plt.subplot(2,2,3)
plt.plot(model10_hist_loss)
plt.ylabel('Training Loss')
plt.xlabel('epochs')

plt.subplot(2,2,4)
plt.plot(model10_val_loss)
plt.xlabel('epochs')

Text(0.5, 0, 'epochs')

The validation precision without dropout is slightly higher than with dropout, but the validation curves for both precision and loss are more consistent with dropout than without it. This might be because our dataset is small; with more data samples, dropout might improve generalization and so reduce overfitting. If there were no hardware constraints, we would increase the number of epochs to 30 and observe whether the results change.
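For context on the curves above, dropout is only active during training. A minimal NumPy sketch of inverted dropout follows, which is the scheme Keras' `Dropout` layer uses. The `dropout` helper and the toy activations are assumptions for illustration, not the layer's actual implementation.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return x                           # at inference the layer is the identity
    keep = rng.random(x.shape) >= rate     # True for units that survive
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
activations = np.ones((1000, 50))          # toy activations, not the lab's data

train_out = dropout(activations, rate=0.3, training=True, rng=rng)
test_out = dropout(activations, rate=0.3, training=False, rng=rng)

print('fraction zeroed in training:', np.mean(train_out == 0))
print('mean activation in training:', train_out.mean())
print('unchanged at inference:', np.array_equal(test_out, activations))
```

Because units are dropped only while training, the training metrics are computed on a handicapped network. This may partly explain why the training precision with dropout (0.5550 at epoch 15) sits below the validation precision (0.6285).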

	track	artist	uri	danceability	energy	key	loudness	mode	speechiness	acousticness	instrumentalness	liveness	valence	tempo	duration_ms	time_signature	chorus_hit	sections	target
0	Wild Things	Alessia Cara	spotify:track:2ZyuwVvV6Z3XJaXIFbspeE	0.741	0.626	1	-4.826	0	0.0886	0.02000	0.000	0.0828	0.706	108.029	188493	4	41.18681	10	1
1	Surfboard	Esquivel!	spotify:track:61APOtq25SCMuK0V5w2Kgp	0.447	0.247	5	-14.661	0	0.0346	0.87100	0.814	0.0946	0.250	155.489	176880	3	33.18083	9	0
2	Love Someone	Lukas Graham	spotify:track:2JqnpexlO9dmvjUMCaLCLJ	0.550	0.415	9	-6.557	0	0.0520	0.16100	0.000	0.1080	0.274	172.065	205463	4	44.89147	9	1
3	Music To My Ears (feat. Tory Lanez)	Keys N Krates	spotify:track:0cjfLhk8WJ3etPTCseKXtk	0.502	0.648	0	-5.698	0	0.0527	0.00513	0.000	0.2040	0.291	91.837	193043	4	29.52521	7	0
4	Juju On That Beat (TZ Anthem)	Zay Hilfigerrr & Zayion McCall	spotify:track:1lItf5ZXJc1by9SbPeljFd	0.807	0.887	1	-3.892	1	0.2750	0.00381	0.000	0.3910	0.780	160.517	144244	4	24.99199	8	1

	danceability	energy	key	loudness	mode	speechiness	acousticness	instrumentalness	liveness	valence	tempo	duration_ms	time_signature	chorus_hit	sections	target
0	0.741	0.626	1	-4.826	0	0.0886	0.02000	0.000	0.0828	0.706	108.029	188493	4	41.18681	10	1
1	0.447	0.247	5	-14.661	0	0.0346	0.87100	0.814	0.0946	0.250	155.489	176880	3	33.18083	9	0
2	0.550	0.415	9	-6.557	0	0.0520	0.16100	0.000	0.1080	0.274	172.065	205463	4	44.89147	9	1
3	0.502	0.648	0	-5.698	0	0.0527	0.00513	0.000	0.2040	0.291	91.837	193043	4	29.52521	7	0
4	0.807	0.887	1	-3.892	1	0.2750	0.00381	0.000	0.3910	0.780	160.517	144244	4	24.99199	8	1

	danceability	energy	key	loudness	mode	speechiness	acousticness	instrumentalness	liveness	valence	tempo	duration_ms	time_signature	chorus_hit	sections	target
count	6398.000000	6398.000000	6398.000000	6398.000000	6398.000000	6398.000000	6398.000000	6398.000000	6398.000000	6398.000000	6398.000000	6.398000e+03	6398.000000	6398.000000	6398.000000	6398.000000
mean	0.568163	0.667756	5.283526	-7.589796	0.645514	0.098018	0.216928	0.165293	0.196700	0.443734	122.353871	2.367042e+05	3.930916	41.028399	10.316505	0.500000
std	0.191103	0.240721	3.606216	5.234592	0.478395	0.097224	0.296835	0.318736	0.166148	0.245776	29.847389	8.563698e+04	0.377469	19.568827	3.776011	0.500039
min	0.062200	0.000251	0.000000	-46.655000	0.000000	0.022500	0.000000	0.000000	0.016700	0.000000	39.369000	2.985300e+04	0.000000	0.000000	2.000000	0.000000
25%	0.447000	0.533000	2.000000	-8.425000	0.000000	0.038825	0.008533	0.000000	0.096800	0.240000	98.091250	1.932068e+05	4.000000	28.059135	8.000000	0.000000
50%	0.588000	0.712500	5.000000	-6.096500	1.000000	0.057200	0.067050	0.000017	0.126000	0.434000	121.070000	2.212465e+05	4.000000	36.265365	10.000000	0.500000
75%	0.710000	0.857000	8.000000	-4.601250	1.000000	0.112000	0.311000	0.057650	0.249000	0.628000	141.085000	2.593165e+05	4.000000	48.292538	12.000000	1.000000
max	0.981000	0.999000	11.000000	-0.149000	1.000000	0.956000	0.996000	0.995000	0.982000	0.976000	210.977000	1.734201e+06	5.000000	213.154990	88.000000	1.000000

	danceability	energy	key	loudness	mode	speechiness	acousticness	instrumentalness	liveness	valence	tempo	duration_ms	time_signature	chorus_hit	sections	target
count	6259.000000	6259.000000	6259.000000	6259.000000	6259.000000	6259.000000	6259.000000	6259.000000	6259.000000	6259.000000	6259.000000	6.259000e+03	6259.000000	6259.000000	6259.000000	6259.000000
mean	6.190925	7.165362	5.275923	-7.573511	0.646110	1.501198	2.829525	2.479789	2.491293	4.950631	122.418008	2.358256e+05	3.930979	40.979194	10.290302	0.508707
std	1.926205	2.382752	3.607157	5.250005	0.478214	0.979719	2.838553	2.994407	1.688903	2.466385	29.921408	8.557350e+04	0.377157	19.558411	3.778070	0.499964
min	1.000000	1.000000	0.000000	-46.655000	0.000000	1.000000	1.000000	1.000000	1.000000	1.000000	39.369000	2.985300e+04	0.000000	0.000000	2.000000	0.000000
25%	5.000000	6.000000	2.000000	-8.381500	0.000000	1.000000	1.000000	1.000000	1.000000	3.000000	98.058000	1.928200e+05	4.000000	28.066490	8.000000	0.000000
50%	6.000000	8.000000	5.000000	-6.078000	1.000000	1.000000	1.000000	1.000000	2.000000	5.000000	121.189000	2.207810e+05	4.000000	36.246140	10.000000	1.000000
75%	8.000000	9.000000	8.000000	-4.603500	1.000000	2.000000	4.000000	1.000000	3.000000	7.000000	141.250500	2.579865e+05	4.000000	48.186325	12.000000	1.000000
max	10.000000	10.000000	11.000000	-0.149000	1.000000	10.000000	10.000000	10.000000	10.000000	10.000000	210.977000	1.734201e+06	5.000000	213.154990	88.000000	1.000000
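
The summary above shows the 0-to-1 audio features (danceability, energy, speechiness, acousticness, instrumentalness, liveness, valence) recoded as integers from 1 to 10. The quartiles are consistent with ten equal-width bins of 0.1 (for example, a danceability of 0.447 falls in bin 5). A minimal sketch of such a binning step, assuming `pd.cut` with right-closed intervals, might look like the following. The helper name `bin_unit_interval` and the sample values are illustrative and are not taken from the notebook.

```python
import numpy as np
import pandas as pd

def bin_unit_interval(values, n_bins=10):
    """Map values in [0, 1] to integer bins 1..n_bins (hypothetical helper)."""
    # Equal-width edges: 0.0, 0.1, ..., 1.0 when n_bins is 10
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    labels = list(range(1, n_bins + 1))
    # include_lowest=True keeps an exact 0.0 in the first bin
    binned = pd.cut(values, bins=edges, labels=labels, include_lowest=True)
    return binned.astype(int)

# Illustrative values taken from the min/quartile/max rows of the summaries
sample = pd.Series([0.0, 0.0622, 0.447, 0.588, 0.710, 0.981])
binned_sample = bin_unit_interval(sample)
print(binned_sample.tolist())
```

Binning this way trades resolution for a small, fixed vocabulary per feature, which is convenient if the binned columns are later treated as categorical inputs.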

	danceability	energy	key	loudness	mode	speechiness	acousticness	instrumentalness	liveness	valence	tempo	duration_ms	time_signature	chorus_hit	sections
0	8	7	1	0.399432	0	1	1	1	1	8	-0.099902	-0.406920	4	-0.306775	-0.406977
1	5	3	5	0.187954	0	1	9	9	1	3	0.176658	-0.413734	3	-0.344335	-0.418605
2	6	5	9	0.362211	0	1	2	1	2	3	0.273251	-0.396964	4	-0.289395	-0.418605
3	6	7	0	0.380682	0	1	1	1	3	3	-0.194257	-0.404251	4	-0.361485	-0.441860
4	9	9	1	0.419516	1	3	1	1	4	8	0.205958	-0.432883	4	-0.382752	-0.430233
5	5	9	0	0.435578	1	1	1	1	5	8	0.232571	-0.391767	4	-0.349063	-0.383721
6	6	10	0	0.423558	1	2	1	1	2	5	0.086937	-0.363502	4	-0.401269	-0.360465
7	8	6	2	0.330753	1	2	1	1	2	4	-0.160983	-0.399942	4	-0.217528	-0.406977
8	2	10	7	0.441147	1	2	1	1	10	2	0.288751	-0.369197	4	-0.353460	-0.395349
9	4	8	8	0.380962	1	2	1	1	3	4	-0.271223	-0.368415	4	-0.390678	-0.418605
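
The continuous columns in the preview above (loudness, tempo, duration_ms, chorus_hit, sections) appear to have been centered on the midpoint of their range and divided by that range, which maps each column into [-0.5, 0.5]. For instance, with sections ranging from 2 to 88, a track with 10 sections gives (10 - 45) / 86 ≈ -0.406977, which matches row 0. This is an inference from the displayed values rather than the notebook's confirmed method, and the function name `center_scale` below is hypothetical.

```python
import pandas as pd

def center_scale(col):
    """Scale a numeric column to [-0.5, 0.5] around the midpoint of its range.

    Assumed transformation, inferred from the displayed output.
    """
    lo, hi = col.min(), col.max()
    midpoint = (lo + hi) / 2
    return (col - midpoint) / (hi - lo)

# Illustrative section counts; 2 and 88 are the column's min and max
sections = pd.Series([2, 10, 9, 7, 8, 88])
scaled = center_scale(sections)
print(scaled.round(6).tolist())
```

Unlike z-score standardization, this scaling depends only on the column's extremes, so a single outlier (such as the 88-section track) compresses the remaining values into a narrow band.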

	Features	Descriptions	Scales	Discrete/Continuous	Range
0	danceability	How suitable a track is for dancing	ratio	Continuous	0.062200-0.981000
1	energy	A perceptual measure of intensity and activity	ratio	Continuous	0.000251-0.999000
2	key	The estimated overall key of the track	ordinal	Discrete	0:C, 1:C#, 2:D, 3:Eb, 4:E, 5:F etc
3	loudness	The overall loudness of a track in decibels	ratio	Continuous	-46.655000 to -0.149000
4	mode	The modality (major or minor) of a track	nominal	Discrete	0 (Minor) and 1 (Major)
5	speechiness	The presence of spoken words in a track	ratio	Continuous	0.022500-0.956000
6	acousticness	Whether the track is acoustic	ratio	Continuous	0-0.996000
7	instrumentalness	Predicts whether a track contains no vocals	ratio	Continuous	0-0.995000
8	liveness	The presence of an audience in the recording	ratio	Continuous	0.016700-0.982000
9	valence	Musical positiveness conveyed by a track	ratio	Continuous	0-0.976000
10	tempo	Beats per minute	ratio	Continuous	39.369000-210.977000
11	duration_ms	The duration of the track in milliseconds	ratio	Discrete	29853-1734201
12	time_signature	An estimated overall time signature of a track	ratio	Discrete	0-5
13	chorus_hit	Timestamp of the start of the third section of the track	ratio	Continuous	0-213.154990
14	sections	The number of sections the particular track has	ratio	Discrete	2-88
15	target	The target variable for the track	nominal	Discrete	0:flop, 1:hit