Handwritten Digit Recognition Using Convolutional Neural Networks (CNN) on the MNIST Dataset (p-6)

Venkata Naveen Varma V
10 min read · Jun 21, 2021

We will discuss how to implement a Convolutional Neural Network (CNN) model that recognizes digits from the MNIST dataset. Topics include importing the data, visualizing it, preprocessing, building a LeNet-based model, training, and testing.

  • If you are new to or unfamiliar with CNNs, I recommend you visit CNN basics first. If you are interested, you can also visit the DNN-based digit recognition model, which helps you understand the difference between using a DNN and a CNN for image classification.

Reminder:

All neural networks are referred to as Artificial Neural Networks (ANN). Neural networks with more than one hidden layer are called Deep Neural Networks (DNN). Convolutional Neural Networks (CNN) are mainly used in image processing.

To learn about DNN click here
To learn about CNN click here
Code for digit recognition using CNN can be found on GitHub.

Implementation:

Software Tools used:

  • Python, Google Colab, Keras.

Let’s start the project. So what are the steps involved, you ask?

  • We will load the training and testing images from MNIST first.
  • Then we display them on a grid so that we can see the variety of digits in each class.
  • After that we’ll need to prepare our data for training by reshaping it into the format the network expects and then normalizing it.
  • Then we train the model on the training data.
    “Remember that the training data set is used to train the neural network to obtain the appropriate parameters.”

1. Imports:

First, we import all the required libraries. I recommend using Google Colab for this project for its computational power; that is what I used.

# Imports
# If you are running on a local system, make sure TensorFlow and Keras are installed. On Google Colab there is nothing to install; just run the code.
# Sequential lets us define the neural network as a linear stack of layers
# Dense connects every node of the preceding layer to every node of the next one, creating a fully connected layer
# Adam is the optimizer used to train the network
# to_categorical performs one-hot encoding, which is required for multi-class classification with categorical cross-entropy
import random
import numpy as np
import matplotlib.pyplot as plt
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from keras.utils import to_categorical # newer Keras releases no longer provide keras.utils.np_utils
from keras.layers import Flatten # To flatten our data
from keras.layers import Conv2D # for convolutional layers
from keras.layers import MaxPooling2D # for pooling layers
from keras.layers import Dropout
from keras.models import Model
# To get the same random numbers on every run
np.random.seed(0)

keras.datasets provides access to the MNIST dataset, which contains 70,000 labeled images of handwritten digits: 60,000 for training and 10,000 for testing. Each image is 28 by 28, that is, each image has 784 pixels. We import this dataset for our project using the code below.

# importing training data to obtain the parameters and test data to evaluate the performance of the neural network.
# mnist.load_data imports 60000 images with labels into training data and 10000 images into testing data.
# Each image in the dataset is 28px wide and 28px height i.e each image has 784 pixels.
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print(X_train.shape)
print(X_test.shape)
print(y_train.shape[0]) # no.of labels
OUTPUT:
(60000, 28, 28)
(10000, 28, 28)
60000

As the output above shows, our dataset contains 60,000 training images and 10,000 testing images, each 28 by 28 pixels.

Assert statement:

  • assert takes a condition that is either True or False, optionally followed by an error message.
  • If the condition is true the code continues to run; otherwise Python raises an AssertionError with that message.
  • Using assert is good practice, as it helps catch problems early when debugging complex code.
  • NOTE: The number of training images must be equal to the number of labels for consistency.

Using assert statements, we make sure that these conditions are satisfied.

# Conditions to be satisfied:
assert(X_train.shape[0] == y_train.shape[0]), "The number of images is not equal to the number of labels."
assert(X_test.shape[0] == y_test.shape[0]), "The number of images is not equal to the number of labels."
assert(X_train.shape[1:] == (28,28)), "The dimensions of the images are not 28x28"
assert(X_test.shape[1:] == (28,28)), "The dimensions of the images are not 28x28"
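
To see what happens when a condition fails, here is a small sketch. The helper check_counts and the toy lists are made up for illustration and are not part of the project code; the sketch only shows that a failing assert raises an AssertionError carrying our message.

```python
def check_counts(images, labels):
    """Return None if the counts match, otherwise the assertion message."""
    try:
        assert len(images) == len(labels), "The number of images is not equal to the number of labels."
        return None
    except AssertionError as err:
        # The string after the comma in the assert becomes the error message
        return str(err)

ok_result = check_counts([0, 1, 2], [7, 8, 9])   # 3 images, 3 labels
bad_result = check_counts([0, 1, 2], [7, 8])     # 3 images, 2 labels
print(ok_result)
print(bad_result)
```

In the real project we do not catch the error; we let it stop the program so the mismatch gets noticed right away.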

2. Visualization:

To analyze the data better, we perform the steps below:

  • Create a list to record the number of images in each of our ten categories.
  • Create a grid arrangement to help us visualize. Our grid will contain 10 rows, one for each class 0 to 9, and each row will have 5 columns of images.
  • subplots allows you to display multiple plots on the same figure. It returns a tuple with 2 values: an instance of our figure and the plot axes.
  • Loop through every column, and for each column iterate through every row, that is, every class. We do this with a nested for loop that cycles through our data and counts the images in each class.
  • Random images from the dataset are shown to see how different the digits are within the same class.
  • Finally, add titles to each row: 0, 1, 2, 3, …, 9.
# Visualize random images from each class (from 0 to 9) and count the images per class
num_of_samples = []
cols = 5
num_classes = 10
fig, axs = plt.subplots(nrows=num_classes, ncols=cols, figsize=(5, 10))
fig.tight_layout() # To avoid overlapping of plots
for i in range(cols):
    for j in range(num_classes):
        x_selected = X_train[y_train == j]
        axs[j][i].imshow(x_selected[random.randint(0, (len(x_selected) - 1)), :, :], cmap=plt.get_cmap('gray'))
        axs[j][i].axis("off") # To remove axis
        if i == 2:
            axs[j][i].set_title(str(j))
            num_of_samples.append(len(x_selected))
OUTPUT: A grid of images with 10 rows and 5 images in each row, where the 10 rows represent the 10 classes 0 to 9.

Now we use a bar graph to see how many images belong to each class.

# Shows the number of images belonging to each class
print("No.of Samples:", num_of_samples)
# Let's visualize this with a bar plot
plt.figure(figsize=(12, 4))
plt.bar(range(0, num_classes), num_of_samples)
plt.title("Distribution of the training dataset")
plt.xlabel("Class number")
plt.ylabel("Number of images")
OUTPUT:
Bar plot of the number of training images in each class
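
As a cross-check on num_of_samples, the per-class counts can also be computed directly from the labels. This is a minimal sketch, assuming integer labels like y_train before one-hot encoding; the toy_labels array below is made up so the snippet runs on its own.

```python
import numpy as np

# Made-up labels standing in for y_train (integers from 0 to 9)
toy_labels = np.array([0, 1, 1, 3, 3, 3, 9])

# bincount counts how often each integer occurs; minlength=10 keeps
# a slot for every digit class even if a class never appears
counts = np.bincount(toy_labels, minlength=10)
print(counts.tolist())
```

With the real data you would pass y_train instead of toy_labels and compare the result with num_of_samples.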

3. Preparing our data to use it in training

NOTE:

  • Previously, in the DNN model, we flattened each image into a 1-D array of pixel intensities before giving it as input. Here we leave the image as it is, 28*28, and add a depth of 1.
  • Why depth? A CNN works by applying filters to the channels of the image. A gray-scale image has one channel, so our data must reflect that depth.
  • By adding this depth of 1, our data will be in the desired shape to be used as input for the convolutional layer.
# adding depth
X_train = X_train.reshape(60000, 28, 28, 1)
X_test = X_test.reshape(10000, 28, 28, 1)
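
The reshape above hard-codes 60000 and 10000. A slightly more general sketch, using a small made-up array (toy_images) in place of X_train, reads the number of images from the array itself:

```python
import numpy as np

# Made-up batch of 5 blank gray-scale images standing in for X_train
toy_images = np.zeros((5, 28, 28), dtype=np.uint8)

# Option 1: take the image count from the array instead of hard-coding it
with_depth = toy_images.reshape(toy_images.shape[0], 28, 28, 1)

# Option 2: append a new axis of length 1 at the end
also_with_depth = toy_images[..., np.newaxis]

print(with_depth.shape, also_with_depth.shape)
```

Both options give the same shape, so it is a matter of taste which one you use.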

We perform one-hot encoding on our labels, which we already discussed in the Multiclass classification post. If you are not familiar with it, you can find it here.

# Perform one-hot encoding on the train and test labels, which is necessary for multi-class classification.
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
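
If you are curious what to_categorical produces, the sketch below builds the same kind of one-hot rows with plain numpy. The helper one_hot is a hypothetical stand-in written for illustration, not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer labels into rows with a single 1 at the label's index."""
    # Row k of the identity matrix is the one-hot vector for class k
    return np.eye(num_classes, dtype="float32")[labels]

toy_labels = np.array([2, 0, 9])   # made-up labels
encoded = one_hot(toy_labels, 10)
print(encoded.shape)
print(encoded[0])

# argmax along the last axis recovers the original labels
recovered = np.argmax(encoded, axis=-1)
```

Each row has exactly one entry set to 1, at the position of the digit it represents.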

Normalize the data:

We divide by 255 because we want to normalize our data to a range between 0 and 1.

  • This ensures that the max pixel value 255 is normalized down to the max value of 1.
  • This normalization is important as it scales our features down to a uniform range and decreases the variance in our data, which helps the network learn more reliably and accurately.
# Normalize the data
X_train = X_train/255
X_test = X_test/255
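
A quick sketch of what the division does to a few made-up pixel values (toy_pixels is not real MNIST data). Note that the result is a float array even though the input holds integers:

```python
import numpy as np

# Made-up 8-bit pixel intensities: black, mid-gray, white
toy_pixels = np.array([[0, 128, 255]], dtype=np.uint8)

# True division converts the integers to floats in the range 0 to 1
scaled = toy_pixels / 255

print(scaled.dtype)
print(scaled.min(), scaled.max())
```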

4. Creating the model

  • There are predefined CNN architectures like LeNet, AlexNet, ZFNet, GoogLeNet, etc.
  • We will be designing a LeNet-based architecture for digit recognition.

The LeNet model contains 2 convolutional layers and 2 pooling layers.

We will be using a dropout layer to reduce overfitting.

  • This layer randomly sets a fraction (the rate) of its input units to 0 at each update during training, which helps prevent overfitting. This is applied only during training, not on testing or new data, so at prediction time all nodes are used.
  • We will be using only 1 dropout layer. However, more than one dropout layer can be used, which may improve performance.
  • Remember that these layers are mostly placed between layers that have a high number of parameters, because those layers are more likely to overfit.
  • The dropout rate is the fraction of input nodes the dropout layer drops during each update, where 0 drops no nodes and 1 drops all nodes. 0.5 is a commonly recommended rate.
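
The idea can be sketched in a few lines of numpy. This is a simplified illustration, not Keras's actual code: the function dropout and its arguments are my own names. Like the Keras Dropout layer, it scales the surviving values by 1/(1 - rate) during training so that the expected sum of the inputs stays the same, and it does nothing at test time.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Simplified dropout; assumes 0 <= rate < 1."""
    if not training or rate == 0.0:
        # At test time every node is kept and nothing is rescaled
        return activations
    # Keep each unit with probability (1 - rate)
    keep_mask = rng.random(activations.shape) >= rate
    # Zero the dropped units and scale up the kept ones
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
toy_activations = np.ones(1000)   # made-up layer output

train_out = dropout(toy_activations, 0.5, rng)
test_out = dropout(toy_activations, 0.5, rng, training=False)

print("fraction dropped:", float((train_out == 0).mean()))
print("mean during training:", float(train_out.mean()))
```

With a rate of 0.5, roughly half of the values become 0 and the rest become 2, so the average stays close to 1.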

Steps involved in our model:
1. Convolutional layer with 30 filters; play around by changing the number of filters. Each filter is a 5*5 grid of parameter values. The input shape is 28*28 with a depth of 1, and the activation function is relu. Stride is the number of pixels the kernel shifts at each step; by default stride=1 if not specified. A smaller stride retains more information, as more convolution operations are performed. Padding is used to preserve the spatial dimensions of the image, but we are not using padding for the MNIST dataset, so the output shrinks to 24*24*30.

2. Pooling layer of size 2*2, which takes the max value of each feature map within every 2*2 neighborhood. After pooling, the shape of our convolved image goes from 24*24*30 to a smaller 12*12*30, because max pooling scales every feature map down into a smaller, abstracted representation.

3. A second convolutional layer. Since its input is already a denser stack of 30 feature maps, we use fewer filters, i.e., 15 filters of size 3*3; more filters would increase the number of parameters and demand more computational power. It is followed by a second 2*2 pooling layer, which reduces the 10*10*15 output to 5*5*15.

4. Flatten reshapes our data so that it can go into the fully connected network. The Flatten layer takes our output with a shape of 5 by 5 by 15 and reshapes it into a 1-D array of 375 values, as you can see in the summary output below.

5. A fully connected layer is added using Dense with 500 nodes. This number can be adjusted as desired: a lower number may give slightly lower accuracy, while a higher number requires more computational power.
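
The shapes quoted in the steps above follow from simple arithmetic. Here is a rough sketch, where conv_output_size and pool_output_size are helper names I made up, and no padding is assumed, as in our model:

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Width/height after a convolution (integer division drops partial steps)."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_output_size(size, pool):
    """Width/height after non-overlapping pooling with a pool x pool window."""
    return size // pool

sizes = []
size = 28                                   # MNIST images are 28*28
size = conv_output_size(size, kernel=5)     # first convolution, 5*5 filters
sizes.append(size)
size = pool_output_size(size, pool=2)       # first 2*2 max pooling
sizes.append(size)
size = conv_output_size(size, kernel=3)     # second convolution, 3*3 filters
sizes.append(size)
size = pool_output_size(size, pool=2)       # second 2*2 max pooling
sizes.append(size)

flattened = size * size * 15                # 15 feature maps go into Flatten
print(sizes, flattened)
```

The four sizes match the 24, 12, 10 and 5 from the steps, and the flattened length is the 375 values that go into the Dense layer.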

def leNet_model():
    model = Sequential()
    model.add(Conv2D(30, (5, 5), input_shape=(28, 28, 1), activation='relu')) # Note 1
    model.add(MaxPooling2D(pool_size=(2, 2))) # Note 2
    model.add(Conv2D(15, (3, 3), activation='relu')) # Note 3
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten()) # Note 4
    model.add(Dense(500, activation='relu')) # Note 5
    model.add(Dropout(0.5)) # Have a look at the plots below and comment out this dropout layer to see the change in the plots.
    model.add(Dense(num_classes, activation='softmax')) # output layer with no.of nodes = no.of classes.
    model.compile(Adam(learning_rate=0.01), loss="categorical_crossentropy", metrics=["accuracy"])
    return model
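
To make the pooling step (Note 2) more concrete, here is a small numpy sketch of 2*2 max pooling on a single made-up feature map. The function max_pool_2x2 is my own illustration and not how Keras implements MaxPooling2D internally.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Max pooling with a 2*2 window and stride 2 on a 2-D array."""
    h, w = feature_map.shape
    # Drop a trailing odd row/column, then split into 2*2 blocks
    trimmed = feature_map[:h - h % 2, :w - w % 2]
    blocks = trimmed.reshape(h // 2, 2, w // 2, 2)
    # Take the largest value inside every block
    return blocks.max(axis=(1, 3))

toy_map = np.array([[1, 3, 2, 0],
                    [4, 2, 1, 1],
                    [0, 0, 5, 6],
                    [1, 2, 7, 8]])   # made-up 4*4 feature map

pooled = max_pool_2x2(toy_map)
print(pooled)
```

The 4*4 map shrinks to 2*2, the same halving that takes our feature maps from 24*24 to 12*12.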

Seeing the summary gives us an overview of our Convolutional model

model = leNet_model()
print(model.summary()) # summary() prints the table itself and returns None, hence the final "None"
OUTPUT:
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 24, 24, 30)        780
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 12, 12, 30)        0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 10, 10, 15)        4065
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 15)          0
_________________________________________________________________
flatten (Flatten)            (None, 375)               0
_________________________________________________________________
dense (Dense)                (None, 500)               188000
_________________________________________________________________
dropout (Dropout)            (None, 500)               0
_________________________________________________________________
dense_1 (Dense)              (None, 10)                5010
=================================================================
Total params: 197,855
Trainable params: 197,855
Non-trainable params: 0
_________________________________________________________________
None
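
The Param # column can be checked by hand. A small sketch, with helper names (conv_params, dense_params) that I made up: every filter or node has one weight per input value plus one bias.

```python
def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter (kernel area * input channels) plus one bias each."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input plus one bias, for every unit."""
    return (inputs + 1) * units

param_counts = [
    conv_params(5, 5, 1, 30),    # conv2d: 5*5 filters on 1 channel
    conv_params(3, 3, 30, 15),   # conv2d_1: 3*3 filters on 30 channels
    dense_params(375, 500),      # dense: 375 flattened values to 500 nodes
    dense_params(500, 10),       # dense_1: 500 nodes to 10 classes
]
total_params = sum(param_counts)
print(param_counts, total_params)
```

Pooling, flatten and dropout layers have nothing to learn, which is why they show 0 parameters in the summary.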

5. Training the Model:

Note: We split our training data into training and validation sets.

  • The training set is used to tune the weights and biases.
  • The validation set is used to tune the hyperparameters.

When the validation error starts to rise above the training error, it indicates that our model is starting to overfit.

Train the model using model.fit. Remember that model.fit returns the training history, which we use for the plots below.
verbose=1 displays the progress of each epoch.

history = model.fit(X_train, y_train, epochs=10, validation_split=0.1, batch_size=400, verbose=1, shuffle=True)
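
To get a feel for the numbers in this call, here is a back-of-the-envelope sketch in plain Python. The variable names are mine. Keras takes the validation samples from the end of the arrays, before any shuffling, and trains on the rest.

```python
import math

num_samples = 60000        # size of X_train
validation_split = 0.1
batch_size = 400

# The last 10% of the samples are held out for validation
num_val = round(num_samples * validation_split)
num_train = num_samples - num_val

# Number of weight updates (batches) in one epoch
steps_per_epoch = math.ceil(num_train / batch_size)

print(num_train, num_val, steps_per_epoch)
```

So each of the 10 epochs trains on 54,000 images in 135 batches and then checks the loss and accuracy on the remaining 6,000 images.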

Plot the Loss and accuracy graphs to observe the performance of our model on Training and Validation sets.

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.legend(['training', 'validation'])
plt.title('Loss')
plt.xlabel('epoch')
OUTPUT:
Loss plot

# Plot training and validation accuracy per epoch
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.legend(['training','validation'])
plt.title('Accuracy')
plt.xlabel('epoch')
OUTPUT:
Accuracy plot

6. Evaluating our model on test data:

score = model.evaluate(X_test, y_test, verbose=0) # returns [loss, accuracy]
print('Test score:', score[0]) # loss on the test set
print('Test accuracy:', score[1])
OUTPUT:
Test score: 0.050614792853593826
Test accuracy: 0.9873999953269958

With a test accuracy of about 98.7%, our model is more accurate than our DNN-based model. This shows why CNNs are preferred over DNNs for image classification.

Let’s try a new external image in our model to test how it works on completely new data.

# Testing our model on new external image
# url for number 2 https://www.researchgate.net/profile/Jose_Sempere/publication/221258631/figure/fig1/AS:305526891139075@1449854695342/Handwritten-digit-2.png
import requests
from PIL import Image
url = 'https://www.researchgate.net/profile/Jose_Sempere/publication/221258631/figure/fig1/AS:305526891139075@1449854695342/Handwritten-digit-2.png'
response = requests.get(url, stream=True)
img = Image.open(response.raw)
plt.imshow(img, cmap=plt.get_cmap('gray'))
OUTPUT:
Image of the handwritten digit 2
  • We need to convert the above image into a numpy array so that we can resize it to 28*28, which is the required input size.
  • Our neural network is trained on 28*28 images with white digits on a black background, so we convert the above image into the required format.
import cv2
array_img = np.asarray(img)
resized_img = cv2.resize(array_img, (28, 28))
print("resized image shape:", resized_img.shape)
gray_img = cv2.cvtColor(resized_img, cv2.COLOR_RGB2GRAY) # PIL loads images in RGB order
print("Grayscale image shape:", gray_img.shape)
image = cv2.bitwise_not(gray_img)
plt.imshow(image, cmap=plt.get_cmap('gray'))
OUTPUT:
Converted image as output
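
For an 8-bit image, cv2.bitwise_not simply flips every pixel value to 255 minus that value. The sketch below shows this on a tiny made-up 2*2 array (toy_gray) with plain numpy, followed by the same scaling and reshaping we apply to the real image.

```python
import numpy as np

# Made-up 2*2 gray-scale "image": bright background, dark strokes
toy_gray = np.array([[255, 200],
                     [30, 0]], dtype=np.uint8)

# Inverting turns a dark digit on a light background into
# a light digit on a dark background, like the MNIST images
inverted = 255 - toy_gray

# Scale to the range 0 to 1 and add the batch and depth axes
model_input = (inverted / 255).reshape(1, 2, 2, 1)

print(inverted)
print(model_input.shape)
```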

Normalize and reshape it:

image = image/255
image = image.reshape(1, 28, 28, 1)

Now perform the prediction and see the output:

prediction = np.argmax(model.predict(image), axis=-1)
print("predicted digit:", str(prediction))
OUTPUT:
predicted digit: [2]
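
The softmax output layer gives one probability per class, and np.argmax picks the index of the largest one. Here is a small sketch with made-up probabilities (toy_probs is not real model output), which also shows why the result is printed as an array like [2]: there is one prediction per image in the batch.

```python
import numpy as np

# Made-up softmax output for a batch of one image (10 class probabilities)
toy_probs = np.array([[0.01, 0.02, 0.90, 0.01, 0.01,
                       0.01, 0.01, 0.01, 0.01, 0.01]])

# Index of the highest probability along the class axis
predicted_digit = np.argmax(toy_probs, axis=-1)

# The probability itself tells us how confident the model is
confidence = toy_probs.max(axis=-1)

print("predicted digit:", predicted_digit)
print("confidence:", confidence)
```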

As you can see, our prediction is 2, which is correct.

Congrats on completing your project on digit recognition using CNN.
Code for this project can be found on GitHub.
