Hello and welcome to the first post of the ‘Debunking Neural Networks’ series. In this series, we will try to understand the underlying mechanisms and concepts of the black box that is Deep Learning. Not only will we try to understand how Deep Learning works, but we will also implement it on our very own computers!
Deep Learning has taken the world by storm. From powering Google’s search engine to estimating arrival times for your Uber deliveries, Deep Learning has proven its potential to change the way technology works.
What is Deep Learning?
Deep Learning is a subfield of Machine Learning that tries to mimic human intelligence with the help of data. In short: learning by example.
So what drives Deep Learning?
At the heart of Deep Learning is a mathematical architecture known as the Neural Network. As the name suggests, Neural Networks are inspired by biological neurons. A neuron is an electrically excitable cell that receives, processes, and transmits information through electrical signals. An artificial neuron is a mathematical function that models a biological neuron. A very simple ANN is shown below: it contains one input layer of 3 input nodes, 2 hidden layers with 4 nodes each, and one output node. You can find a slightly more complex Neural Network structure here.
[Image: a simple ANN with an input layer of 3 nodes, two hidden layers of 4 nodes each, and a single output node]
Delving into the mathematical details of an ANN is beyond the scope of this post. That is precisely why I would like to direct you all to this article. It describes a brief history of ANNs, all the necessary mathematics involved, important terminology, and more. It is a must-read for everyone who wants to truly understand what happens behind the scenes of such an exciting technology.
Enough with the introduction, hype, and theories. Time to get our hands dirty! We will start by creating a virtual environment and installing all the necessary modules.
Note: This post assumes familiarity with the Python language.
To start working with DL, we need to create an environment in which we can install all the necessary libraries. Working with virtual environments is the best way to ensure that none of your installations mess up the rest of your system! Follow the steps below to replicate the virtual environment I used for these projects.
python3 -m venv keras_venv                  # create the virtual environment
source /path/to/your/venv/bin/activate      # activate it (adjust the path to where you created it)
pip install numpy scipy scikit-learn pandas matplotlib seaborn pillow keras
We are going to use TensorFlow as the backend to the Keras library. TensorFlow comes in two variants: CPU-only and CPU+GPU. The former is easy to install; just add ‘tensorflow’ to the end of the pip install command above. The latter is not as easy: you will need one or more CUDA-enabled GPUs, you will have to install the necessary CUDA libraries, and you will have to check that your TF version is compatible with them. You can find some really nice tutorials on installing TF with GPU support by looking it up on the internet, so I’ll leave that part up to you.
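For reference, the CPU-only install is a single extra package (at the time of writing, the GPU build ships separately as tensorflow-gpu):
pip install tensorflow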
A primer on TensorFlow & Keras:
TensorFlow is an open-source Machine Learning framework by Google. It is currently the most popular ML/DL framework, not only among beginners but also among industry veterans, and many established companies across different industries use TF to power their AI needs. Google itself uses TF across a majority of its product lines. While TF is an amazing framework to work with, it can be a little tricky to wrap one’s mind around; it is almost like writing another language inside Python. Keras is a high-level library built on top of TensorFlow. It abstracts away almost all of the unnecessary complexity that accompanies TF, is really easy to understand and use, has amazing documentation, and is updated regularly. Owing to all these attributes, Keras is the go-to library for quick prototyping of a Neural Network project.
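A quick way to confirm that Keras is actually talking to TensorFlow (a small optional check):
import keras
print(keras.backend.backend())   # prints 'tensorflow' when TF is the active backend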
That’s it. Your DL environment is ready to rumble!
This post is divided into 3 sections, each with its own problem statement: Regression, Binary Classification, and Multi-Class Classification.
Regression:
Regression is about predicting a real, continuous quantity such as a salary, a price, or a weight. Predicting the price of a stock at a given time from its previous pattern, projecting the price of a real-estate property into the future, or estimating the salary of an employee based on data from a previous company can all be categorized as Regression problems. Regression falls under the Supervised Learning category. In ML there are linear models for solving linear problems and non-linear models for solving non-linear problems; I would highly suggest you go through these models to get a better understanding of the math behind them, as it also helps build intuition. Here we will try to solve a Simple Linear Regression problem with an ANN.
The code itself is quite well commented, but we will still simplify it by breaking it down piece by piece.
Breaking down the code:
The first block of code imports all the necessary modules and libraries: NumPy for numerical computation, Pandas for data manipulation, scikit-learn for data creation and preprocessing, and finally Keras for creating the ANN model.
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, Dropout
from sklearn.preprocessing import MinMaxScaler
from sklearn.datasets import make_regression
The next block creates a random blob of data suitable for a regression prediction and scales it. Scaling the input data is a necessary step for some datasets because of their high variability: MinMaxScaler maps each feature into the [0, 1] range via (x - min) / (max - min), which keeps all features on a comparable scale.
X, Y = make_regression(n_samples=100, n_features=4, noise=0.1, random_state=1)
scaled_X = MinMaxScaler()
scaled_Y = MinMaxScaler()
scaled_X.fit(X)
scaled_Y.fit(Y.reshape(100, 1))
X = scaled_X.transform(X)
Y = scaled_Y.transform(Y.reshape(100, 1))
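As a quick sanity check (an optional line of my own, not part of the original code), every feature should now sit inside [0, 1]:
print(X.min(axis=0), X.max(axis=0))   # mins of ~0 and maxes of ~1 after scaling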
Next, we initialize the model instance.
model = Sequential()
Next, we actually create our ANN with Dense layers. Notice that after every Dense layer we add another line with the Dropout method. Dropout in Keras randomly drops a fraction of the layer’s nodes during training, which helps avoid overfitting. Overfitting means the model stops learning from the input features and instead starts memorizing the training data, destroying its ability to generalize. The last Dense layer contains only one node, corresponding to the single real-valued output of the regression task.
model.add(Dense(units = 32, input_dim = 4, activation = 'relu'))
model.add(Dropout(0.10))
model.add(Dense(units = 32, activation = 'relu'))
model.add(Dropout(0.20))
model.add(Dense(units = 32, activation = 'relu'))
model.add(Dropout(0.20))
model.add(Dense(units = 1, activation = 'linear'))
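If you want to inspect the resulting layer stack before training (an optional step), Keras can print a summary:
model.summary()   # lists each layer with its output shape and parameter count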
Next, we compile the created model with a few parameters: a loss, an optimizer, and a metric, and then fit it to our data. Let’s briefly unpack what these terms mean.
model.compile(loss='mse', optimizer='adam', metrics = ['mae'])
model.fit(X, Y, epochs = 100, verbose = 0)
Briefly: the loss (‘mse’, mean squared error) is the quantity the network tries to minimize during training; the optimizer (‘adam’) is the algorithm that adjusts the weights to reduce that loss; ‘mae’ (mean absolute error) is tracked as an easier-to-read metric; and each epoch is one full pass over the training data. I would still press you to read this article thoroughly for a better understanding of all of these important ML terminologies.
That’s it. Our model is ready to make predictions. In the last part of this code, we create yet another blob of random data, one that the model has not been trained on, and test our small yet powerful model on it to see how it performs.
Xnew, a = make_regression(n_samples=3, n_features=4, noise=0.1, random_state=1)
Xnew = scaled_X.transform(Xnew)
ynew = model.predict(Xnew)
for i in range(len(Xnew)):
    print("X =", Xnew[i], ",", "Predicted =", ynew[i])
Output: [images of the test inputs and their predicted values]
Binary Classification:
A Classification problem is about categorizing observations into discrete classes. Unlike Regression, where the output is a real value, Classification outputs are distinct, probabilistic categories such as ‘Blue/Red’, ‘Yes/No’, or ‘Male/Female’. One great example is classifying mail as ‘Spam/Not Spam’. Binary classification is thus classifying observations into two discrete classes. In traditional Machine Learning there are many good algorithms for classification tasks, e.g. Support Vector Classification, K-Nearest Neighbors, Logistic Regression, Decision Trees, and Random Forests. Here we will try to solve the task with feature-based Deep Learning using an ANN.
Breaking down the code:
As one can already infer, this piece of the program is quite similar to the Regression one. We first import all the necessary libraries and load the dataset.
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, Dropout

# Note: the canonical Pima Indians diabetes CSV has no header row;
# if yours doesn't either, pass header=None to read_csv.
df = pd.read_csv('pima-indians-diabetes.csv')
X = df.iloc[:, 0:8].values   # the 8 diagnostic feature columns
Y = df.iloc[:, 8].values     # the binary outcome label
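A quick shape check (my own addition) confirms what we loaded; the canonical file has 768 rows, so expect something close to this:
print(X.shape, Y.shape)   # roughly (768, 8) feature matrix and (768,) label vector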
Next up, we initialize the model instance and add three hidden layers of 32 and 64 nodes, each followed by the Dropout method. The last layer is the output layer; for binary classification it needs just a single sigmoid unit, which outputs the probability of the positive class.
classifier = Sequential()
classifier.add(Dense(units = 32, activation = 'relu', input_dim = 8))
classifier.add(Dropout(0.10))
classifier.add(Dense(units = 64, activation = 'relu'))
classifier.add(Dropout(0.20))
classifier.add(Dense(units = 64, activation = 'relu'))
classifier.add(Dropout(0.10))
classifier.add(Dense(units = 1, activation = 'sigmoid'))
We then compile the model with the ‘Binary Crossentropy’ loss and the ‘Adam’ optimizer. Fitting the model to the training dataset is the next step.
classifier.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
classifier.fit(X, Y, batch_size=30, epochs=200, verbose=1)
To evaluate our model, we invoke the ‘.evaluate’ method from Keras and print the resulting loss and accuracy of our trained model. Note that we are evaluating on the same data we trained on, so these scores are optimistic.
# Evaluated on the same data the model was trained on
score = classifier.evaluate(X, Y, verbose=0)
print('loss:', score[0])
print('accuracy:', score[1])
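For an honest estimate of generalization, hold out rows the model never sees during training. A minimal sketch using scikit-learn’s train_test_split (the split ratio and variable names are my own; in a fresh run you would fit the model on the training split only):
from sklearn.model_selection import train_test_split

# Keep 20% of the rows aside as a genuine test set
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=1)
classifier.fit(X_train, Y_train, batch_size=30, epochs=200, verbose=0)
loss, acc = classifier.evaluate(X_test, Y_test, verbose=0)
print('held-out loss:', loss, 'held-out accuracy:', acc)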
Output: [image of the final loss and accuracy scores]
Multi-Class Classification:
A multi-class classification problem is exactly the same as a binary classification problem, the only difference being the number of observational classes. A binary classification model categorizes observations into two distinct classes; a multi-class classification model can categorize them into many more. A good example is categorizing a given fruit into one of 5 classes: ‘Apple’, ‘Orange’, ‘Banana’, ‘Kiwi’, or ‘Melon’. The algorithm decides which of the 5 categories the fruit belongs to based on features like shape, color, and size. Below is an example of how Deep Learning tackles this type of classification problem with an ANN.
Breaking down the code:
The first and foremost part of any Python program is, as usual, importing all the necessary libraries, and we do the same thing here.
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.preprocessing.text import Tokenizer
from keras.utils.np_utils import to_categorical
from keras.datasets import reuters
Now, this particular dataset (the Reuters newswire dataset that ships with Keras) requires a little text pre-processing before it can be fed to our model: the input word sequences are converted into fixed-length binary bag-of-words vectors with the Tokenizer, and the integer labels are converted to one-hot categorical form, since we want the output in a multi-class classification format.
num_features = 5000
(train_x, train_y), (test_x, test_y) = reuters.load_data(num_words = num_features)
tokenizer = Tokenizer(num_words = num_features)
train_x = tokenizer.sequences_to_matrix(train_x, mode = 'binary')
test_x = tokenizer.sequences_to_matrix(test_x, mode = 'binary')
train_y = to_categorical(train_y)
test_y = to_categorical(test_y)
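To see what the preprocessing produced, a quick shape check helps (my own addition; the Keras Reuters split has roughly 8,982 training and 2,246 test articles):
print(train_x.shape, train_y.shape)   # about (8982, 5000) bag-of-words inputs and (8982, 46) one-hot labels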
Next, we initialize our model instance and build the network, again adding a few densely connected layers along with the Dropout method from Keras to avoid overfitting. We then add the final output layer. Notice that its number of nodes (units) is remarkably higher than in the previous problems; this is where the name “Multi-Class” comes from. We try to segregate each input into one of 46 possible output classes.
model = Sequential()
model.add(Dense(units = 32, input_shape = (num_features, ),activation = 'relu'))
model.add(Dropout(0.15))
model.add(Dense(units = 32, activation = 'relu'))
model.add(Dropout(0.15))
model.add(Dense(units = 32, activation = 'relu'))
model.add(Dropout(0.15))
model.add(Dense(units = 32, activation = 'relu'))
model.add(Dropout(0.15))
model.add(Dense(units = 32, activation = 'relu'))
model.add(Dropout(0.15))
model.add(Dense(units = 46, activation = 'softmax'))
Finally, we compile our model with the ‘Categorical Crossentropy’ loss instead of ‘Binary Crossentropy’; this change is needed because our labels now span more than two classes and are one-hot encoded. Our next step is to fit the model on the training features and labels for 50 epochs, validating against the test set, which gives the final result.
model.compile(loss = 'categorical_crossentropy', optimizer = 'rmsprop', metrics = ['accuracy'])
model.fit(train_x, train_y, batch_size = 100, epochs = 50, validation_data = (test_x, test_y), verbose = 1)
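Once trained, the softmax layer outputs a 46-way probability distribution for each article; taking the argmax of that distribution gives the predicted topic index. A small sketch of my own:
import numpy as np

probs = model.predict(test_x[:3])      # probability distribution over the 46 topics
print(np.argmax(probs, axis=1))        # predicted class index for each article
print(np.argmax(test_y[:3], axis=1))   # true class index, for comparison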
Output: [image of the training log with per-epoch loss and accuracy]
That’s it, folks. Pat yourself on the back. You made it through the entire article.
Summary of what we did: we set up a virtual environment with Keras and TensorFlow, built an ANN to solve a simple regression problem on synthetic data, trained a binary classifier on the Pima Indians diabetes dataset, and trained a multi-class classifier on the Reuters newswire dataset.
You can find the well-commented and structured code along with reference notes on my GitHub. Make sure to check it out.
Ciao, adios! Until next time!