Hand Written Digit Recognition using Deep Neural Networks with MNIST dataset (p-4)

9 min read · Jun 20, 2021

A program to identify handwritten digits from 0 to 9 using basic deep neural networks. We will discuss how to implement this model.

Software Tools used in this program:

Python, Jupyter Notebook, Keras, TensorFlow, MNIST dataset.

Code for this project, Handwritten Digit Recognition using Deep Neural Networks, can be found on GitHub.
As digit recognition is a multi-class classification problem, I recommend you have a look at Multi-class classification neural network first. If you are already familiar with it, you can start the implementation.

We will approach this digit recognition problem with a regular deep neural network first, to better understand why convolutional neural networks (CNNs), the most commonly used networks for image classification, are preferred for this kind of task.

For Convolutional Neural Network click here

Steps involved:

Load the train and test images from MNIST.
Display them on a grid so that we can see the variety of digits in each class.
Prepare the data for training by one-hot encoding the labels, normalizing the pixel values and reshaping the images into flat vectors.
Train the model. “Remember that the training data set is used to train the neural network to obtain the appropriate parameters.”

Importing Libraries:

Make sure to install tensorflow (Keras is available through it as tensorflow.keras)
Sequential lets us define the neural network as a linear stack of layers
Dense connects every node of the preceding layer to every node of the next layer, creating a fully connected network
Adam is the optimizer we use to update the weights during training
to_categorical performs one-hot encoding of the labels, which is required for multi-class classification with categorical cross-entropy

# imports
import numpy as np
import matplotlib.pyplot as plt
import tensorflow.keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense  
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
import random

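To get the same random numbers on every run, we seed NumPy's random number generator. Note that the random.randint call used later to pick images comes from Python's built-in random module, which keeps its own seed (set with random.seed).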

np.random.seed(0)
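As a small aside on reproducibility, here is a minimal sketch (not part of the original notebook) showing how seeding Python's built-in random module gives the same sequence on every run. The helper name draw_indices is hypothetical and only used for illustration.

```python
import random

def draw_indices(seed, count=5, upper=9):
    """Return `count` pseudo-random integers in [0, upper] for a given seed."""
    random.seed(seed)  # seeds Python's built-in generator, not NumPy's
    return [random.randint(0, upper) for _ in range(count)]

first_run = draw_indices(0)
second_run = draw_indices(0)
print(first_run == second_run)  # True: same seed, same sequence
```

If you want the image grid further down to show the same digits on every run, calling random.seed before the plotting loop should be enough.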

Importing data from MNIST Dataset:

Keras provides the required MNIST dataset
We import training data and testing data, where the training data is used to obtain the parameters and the test data is used to evaluate the performance of the neural network.
mnist.load_data imports 60000 images with labels into training data and 10000 images with labels into testing data.

Each image in the dataset is 28px wide and 28px high, that is each image has 784 pixels.

(X_train, y_train), (X_test, y_test) = mnist.load_data()

Remember that the number of input images (X_train) must be equal to the number of labels (y_train), otherwise our model cannot be trained properly.

# see the shapes of our data imported from MNIST dataset.
print(X_train.shape)
print(X_test.shape)
print(y_train.shape[0])

output:
(60000, 28, 28)
(10000, 28, 28)
60000

The input data must satisfy some conditions before training, those conditions are:

The number of images must be equal to the number of labels in training and testing datasets
The dimensions of the training and testing images must be 28*28

To make sure that these conditions are met we use Python's assert statement.

Assert statement:

It takes a condition that is either True or False, optionally followed by an error message.
If the condition is true the code continues silently, otherwise an AssertionError is raised with the given message.
Using assert is good practice as it helps catch problems early when debugging a complex program.

assert X_train.shape[0] == y_train.shape[0], "The number of images is not equal to the number of labels."
assert X_test.shape[0] == y_test.shape[0], "The number of images is not equal to the number of labels."
assert X_train.shape[1:] == (28, 28), "The dimensions of the images are not 28x28"
assert X_test.shape[1:] == (28, 28), "The dimensions of the images are not 28x28"

output:
If any condition is false, an AssertionError with the associated message is raised; otherwise there is no output.
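To see both outcomes of an assert without loading the dataset, here is a minimal sketch using plain numbers in place of the array shapes. The function name check_shapes is hypothetical.

```python
def check_shapes(num_images, num_labels):
    """Raise AssertionError when the image and label counts differ."""
    assert num_images == num_labels, "The number of images is not equal to the number of labels."
    return True

# Matching counts: the assert passes silently.
ok = check_shapes(60000, 60000)

# Mismatched counts: the assert raises, and we capture the message.
try:
    check_shapes(60000, 59999)
    message = None
except AssertionError as err:
    message = str(err)

print(ok)
print(message)
```

Keep in mind that Python skips assert statements when run with the -O flag, so they are a debugging aid rather than a replacement for real input validation.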

Visualize the Data:

Next we visualize the images and count how many belong to each class (0 to 9).

First we create a list to store the number of images in each of our ten classes (0 to 9)

num_of_samples = []

We’ll create a grid arrangement to help us visualize. Our grid will contain 10 rows, one for each class from 0 to 9, and each row will have 5 columns of images

cols = 5
num_classes = 10

We use plt.subplots, which lets us display multiple plots on the same figure. It returns a tuple with two values: the figure instance and the array of plot axes.

fig, axs = plt.subplots(nrows=num_classes, ncols=cols, figsize=(5, 10))
fig.tight_layout()  # To avoid overlapping of plots

We loop through every column, and for each column we iterate through every row (that is, every class). This nested for loop picks a random image of class j for each cell of the grid and, in the middle column, titles the row with its class and records how many images that class has.

for i in range(cols):
    for j in range(num_classes):
        x_selected = X_train[y_train == j]
        # random.randint includes both ends, so the upper bound must be len(x_selected) - 1
        axs[j][i].imshow(x_selected[random.randint(0, len(x_selected) - 1), :, :], cmap=plt.get_cmap("gray"))
        axs[j][i].axis("off") # To remove axis
        if i == 2:
            axs[j][i].set_title(str(j))
            num_of_samples.append(len(x_selected))

Let's have a look at the number of samples now

# shows the no. of images belonging to each class
print("No.of Samples:", num_of_samples)

# Let's visualize this with bar plots
plt.figure(figsize=(12, 4))
plt.bar(range(0, num_classes), num_of_samples)
plt.title("Distribution of the training dataset")
plt.xlabel("Class number")
plt.ylabel("Number of images")

output:

By observing the above plot we now have a clear idea about the input dataset.

Preparing our data to use it in training:

First perform one-hot encoding on the train and test labels, which is necessary for multi-class classification.

# (labels to encode, total no.of classes)
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
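To make the encoding concrete, here is a minimal NumPy sketch of what one-hot encoding produces. It is an illustration, not the Keras implementation, and the helper name one_hot is hypothetical.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1.0 per row."""
    labels = np.asarray(labels, dtype=int)
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(num_classes, dtype="float32")[labels]

encoded = one_hot([0, 3, 9], 10)
print(encoded.shape)  # (3, 10)
print(encoded[1])     # 1.0 at index 3, zeros elsewhere
```

Each label becomes a vector of length 10, which matches the 10 outputs of the softmax layer used later.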

Normalize the data
We divide by 255 because the pixel values range from 0 to 255 and we want them in a range between 0 and 1.
This ensures that the max pixel value 255 is normalized down to the max value of 1.

This normalization step is important as it scales all features to a uniform, small range. That keeps the weighted sums and gradients in the network well-behaved, which typically helps the model learn faster and more reliably.

X_train = X_train/255 
X_test = X_test/255

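Flattening the data
A Dense layer expects each sample as a one-dimensional vector, so we reshape every 28*28 image into a vector of 784 values.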

num_pixels = 784
X_train = X_train.reshape(X_train.shape[0], num_pixels)
X_test = X_test.reshape(X_test.shape[0], num_pixels)
print("training data", X_train.shape)
print("testing data", X_test.shape)

output:
training data (60000, 784)
testing data (10000, 784)
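The normalizing and flattening steps can be tried on a tiny stand-in array without downloading MNIST. This is a sketch under that assumption; the fake_images array is made up for illustration.

```python
import numpy as np

# Two fake 28x28 "images" with uint8 pixel values, standing in for MNIST.
fake_images = np.zeros((2, 28, 28), dtype=np.uint8)
fake_images[0, 0, 0] = 255    # brightest possible pixel
fake_images[1, 27, 27] = 128  # a mid-gray pixel

scaled = fake_images / 255                        # values now lie in [0, 1]
flat = scaled.reshape(scaled.shape[0], 28 * 28)   # one row of 784 values per image

print(flat.shape)              # (2, 784)
print(flat.min(), flat.max())  # 0.0 1.0
```

The last pixel of the second image ends up at column 783 of its row, which shows that reshape lays the rows of each image out one after another.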

Creating the Model:

Using model.add we can add any number of hidden layers.
We must now learn to classify between the various handwritten digits, and this is done with a neural network with several nodes in the hidden layers. The more nodes we have, the more weights and bias values can be combined to form complex models, which lets the network classify our handwritten digits more accurately, at the cost of more computation and a higher risk of overfitting.
We use relu, a non-linear activation function, in the hidden layers and the softmax activation function in the output layer, together with the categorical cross-entropy loss and the Adam optimizer with a learning rate of 0.01. We’ll discuss relu further in the convolutional networks post.
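To give a feel for what softmax and categorical cross-entropy compute, here is a minimal pure-Python sketch for a single sample. It is a simplified illustration with made-up scores for a 3-class toy problem, not how Keras implements them internally.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(one_hot_label, probs):
    """Loss for one sample: minus the log-probability of the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs))

logits = [2.0, 1.0, 0.1]  # hypothetical scores for a 3-class toy problem
label = [1, 0, 0]         # the true class is class 0
probs = softmax(logits)
loss = categorical_cross_entropy(label, probs)
print(probs)
print(loss)
```

The loss is small when the network assigns a high probability to the true class and grows quickly as that probability approaches zero.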

def create_model():
    model = Sequential()
    model.add(Dense(10, input_dim=num_pixels, activation='relu'))
    model.add(Dense(30, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(Adam(learning_rate=0.01), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

model = create_model()
print(model.summary())

output:

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_15 (Dense)             (None, 10)                7850      
_________________________________________________________________
dense_16 (Dense)             (None, 30)                330       
_________________________________________________________________
dense_17 (Dense)             (None, 10)                310       
_________________________________________________________________
dense_18 (Dense)             (None, 10)                110       
=================================================================
Total params: 8,600
Trainable params: 8,600
Non-trainable params: 0
_________________________________________________________________
None
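The parameter counts in the summary can be checked by hand: each node in a Dense layer has one weight per input plus one bias. A minimal sketch of that arithmetic, with a hypothetical helper named dense_params:

```python
def dense_params(num_inputs, num_units):
    """Each unit has one weight per input plus one bias."""
    return num_inputs * num_units + num_units

layer_sizes = [784, 10, 30, 10, 10]  # inputs, then the four Dense layers
per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [7850, 330, 310, 110]
print(sum(per_layer))  # 8600
```

Notice that the first layer alone holds most of the parameters, because it is connected to all 784 input pixels.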

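We split our training data into training and validation sets (validation_split=0.1 holds out 10% of the training data for validation):

The training set is used to tune the weights and biases.
The validation set is used to tune the hyperparameters, such as the number of epochs.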

history = model.fit(X_train, y_train, validation_split=0.1, epochs = 30, batch_size = 200, verbose = 1, shuffle = 1)
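It helps to work out what these arguments mean in numbers. The sketch below assumes the documented Keras behavior that validation_split holds out the given fraction of the samples, taken from the end of the data before shuffling.

```python
import math

total_samples = 60000   # size of X_train
validation_split = 0.1
batch_size = 200

num_val = round(total_samples * validation_split)     # held-out validation samples
num_train = total_samples - num_val                   # samples used for weight updates
steps_per_epoch = math.ceil(num_train / batch_size)   # batches (weight updates) per epoch

print(num_train, num_val, steps_per_epoch)  # 54000 6000 270
```

So with 30 epochs the weights are updated 270 * 30 = 8100 times in total.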

Overfitting example:

If you train with different numbers of epochs and plot the results, you may see something like the example below.

You can see that the validation error becomes higher than the training error at some point, which means the model keeps doing a good job on the training data but performs increasingly poorly on the validation set.
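One simple way to spot this point numerically is to look for the epoch with the lowest validation loss. The sketch below uses made-up loss values shaped like the history.history dictionary returned by model.fit; the numbers and the helper name best_epoch are hypothetical.

```python
# Hypothetical per-epoch losses, shaped like history.history from model.fit.
example_history = {
    "loss":     [0.90, 0.50, 0.35, 0.28, 0.22, 0.18],
    "val_loss": [0.85, 0.52, 0.40, 0.38, 0.41, 0.47],
}

def best_epoch(history_dict):
    """Return the 1-based epoch with the lowest validation loss."""
    val_losses = history_dict["val_loss"]
    return val_losses.index(min(val_losses)) + 1

epoch = best_epoch(example_history)
print(epoch)  # 4: validation loss rises afterwards while training loss keeps falling
```

Training much beyond that epoch is a sign of overfitting, so it is a reasonable starting point for choosing the number of epochs.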

Visualize the Accuracy and Error in our model:

Plot to observe the training error and validation error. By using this plot we can change the number of epochs and find the model that best suits our problem. Remember not to overfit or underfit your data.

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.legend(['loss', 'val_loss'])
plt.title('Loss')
plt.xlabel('epoch')

Output:

Now let's have a plot of training accuracy v/s validation accuracy

plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.legend(['accuracy', 'val_accuracy'])
plt.title('Accuracy')
plt.xlabel('epoch')

Accuracy of training v/s validation data

Testing our model using the test dataset:

score = model.evaluate(X_test, y_test, verbose=0)
print(type(score))
print('Test score:', score[0])
print('Test accuracy:', score[1])

output:
<class 'list'>
Test score: 0.3001714050769806
Test accuracy: 0.9101999998092651

Testing our model on a new external image:

import requests
from PIL import Image

url = 'https://www.researchgate.net/profile/Jose_Sempere/publication/221258631/figure/fig1/AS:305526891139075@1449854695342/Handwritten-digit-2.png'
response = requests.get(url, stream=True)
img = Image.open(response.raw)
plt.imshow(img, cmap=plt.get_cmap('gray'))

output:

We need to convert the above image into a NumPy array so that we can resize it to 28*28, which is the required input size. Our neural network is trained on 28*28 images with a black background and white pixels, so we convert the above image into that format.

import cv2

array_img = np.asarray(img)
resized_img = cv2.resize(array_img, (28, 28))
print("resized image shape:", resized_img.shape)
# PIL loads color images in RGB order, so convert from RGB (not BGR) to grayscale
gray_img = cv2.cvtColor(resized_img, cv2.COLOR_RGB2GRAY)
print("Grayscale image shape:", gray_img.shape)
# invert the image so the digit is white on a black background, like MNIST
image = cv2.bitwise_not(gray_img)
plt.imshow(image, cmap=plt.get_cmap('gray'))

Output:

image = image/255
image = image.reshape(1, 784)

prediction = np.argmax(model.predict(image), axis=-1)
print("predicted digit:", str(prediction))

output:
predicted digit: [2]
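The predicted digit is simply the index of the largest softmax output. Here is a minimal sketch of that step, using a made-up probability vector in place of a real model.predict result:

```python
import numpy as np

# Hypothetical softmax output for a batch of one image (ten class probabilities).
probabilities = np.array([[0.01, 0.02, 0.85, 0.03, 0.01,
                           0.02, 0.02, 0.01, 0.02, 0.01]])

predicted = np.argmax(probabilities, axis=-1)  # index of the largest value per row
print(predicted)  # [2]
```

Because the input has one row per image, the result is an array with one predicted class per image, which is why the output above is printed as [2] rather than 2.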

Even with a hidden layer of 30 nodes our test accuracy is only about 91%, which is low for MNIST. So we should be using convolutional neural networks.

Note:

Image classification is best done with a convolutional neural network. We will look at convolutional neural networks in the next post, but for now we classified the images with a regular deep neural network to see the results for better understanding.

Keep in mind that the images from the MNIST dataset are relatively small: 28 by 28 pixels of normalized values.

And since they are grayscale images, we only have a single channel of pixel intensity values. In this case the images that we are dealing with are indeed manageable by a feedforward neural network. However, what if we are dealing with RGB images that are 72 pixels wide by 72 pixels high?

That corresponds to 5184 pixels. Since we are dealing with a 3-channel color image, multiplying by the number of channels gives 15552 input values, and therefore 15552 weights for each node in the first hidden layer. With that many inputs we would also have to increase the number of hidden layers and the number of neurons in each layer, and the resulting network could quickly become too large to train with the computational power we have.
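The arithmetic behind this comparison can be sketched in a few lines. The helper name first_layer_params is hypothetical, and the choice of 10 units simply mirrors the first Dense layer of the model above.

```python
def first_layer_params(width, height, channels, units):
    """Inputs and parameter count of a Dense layer fed with a flattened image."""
    inputs = width * height * channels
    return inputs, inputs * units + units  # weights plus one bias per unit

mnist_inputs, mnist_params = first_layer_params(28, 28, 1, units=10)
rgb_inputs, rgb_params = first_layer_params(72, 72, 3, units=10)

print(mnist_inputs, mnist_params)  # 784 7850
print(rgb_inputs, rgb_params)      # 15552 155530
```

Even with only 10 nodes in the first layer, the small RGB image needs roughly 20 times as many parameters as the MNIST image.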

This is why we use convolutional neural networks, which make processing more computationally manageable. We will talk about these in the next post; for now we stick with the deep neural network.

Congrats on coming this far. We use convolutional neural networks for image classification instead of regular deep neural networks, as they provide more accurate results.
For Code of Digit Recognition using Regular deep neural network click here
To learn Convolutional Neural Network click here