Multiclass Classification using Deep Neural Networks (p-3)
We are going to discuss multiclass classification topics used in deep neural networks and how to implement them. Topics covered on this page are the softmax function, multiclass cross entropy, one-hot encoding, and a multiclass model.
This page is a continuation of the deep neural network basics discussed earlier. To clearly understand the upcoming concepts, it is better to visit the previous topics first. If you have already visited them, you can continue with this page.
Code for the multiclass classification example that we are going to discuss below can be found on GitHub.
Recall:
So far, we have discussed and coded the theory behind the basic perceptron model and a more complex neural network.
The models designed in our previous discussions are shown below.
However, both of these networks were similar in the sense that they were designed to separate a dataset containing only two labels, 0 or 1. While the perceptron model dealt with linearly separable data, the deep neural network dealt with the separation of a more complex dataset.
Multiclass classification:
In this section, we will take one step further by discussing multiclass datasets and the theory associated with them. We will then implement this in code by separating our data into three classes, as shown below.
A multiclass dataset is a dataset that contains more than two classes.
The first crucial difference between dealing with binary and multiclass datasets is the replacement of the sigmoid activation function. As discussed earlier on page 1, the sigmoid function is a very useful tool for classifying binary datasets. It outputs probabilities between 0 and 1 and takes the form of the equation shown below,

sigmoid(x) = 1 / (1 + e^(-x))

which converts the score x of a point into a probability.
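As a quick refresher, here is a minimal Python/NumPy sketch of the sigmoid function; the function name and test values are my own, chosen purely for illustration.

```python
import numpy as np

def sigmoid(x):
    # Squashes any real-valued score x into a probability between 0 and 1
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))   # 0.5  -- a score of 0 maps to an even 50% probability
print(sigmoid(2.5))   # ~0.92 -- large positive scores approach 1
```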
However, it is not feasible to use the sigmoid function for multiclass classification. So we introduce an activation function that is capable of dealing with multiclass data: the “Softmax function”.
Softmax activation function:
Let's consider an example dataset of 3 different sports balls, as shown below.
For the neural network, we use weight, texture, color, and size as inputs to identify a ball's class. After these inputs are fed into the network, they are manipulated by the weights and bias values of the connections between the input and the output layers. Notice that there are three output nodes in the image below, as we are dealing with three-class data.
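To make this concrete, here is a minimal sketch of how the four inputs could be turned into three raw output scores, one per output node. The feature values, weights, and biases below are hypothetical placeholders, not the trained values of any real network.

```python
import numpy as np

np.random.seed(0)

# One ball described by four features: weight, texture, color, size
# (hypothetical values, scaled to the 0-1 range for illustration)
x = np.array([0.8, 0.3, 0.6, 0.9])

# Hypothetical weights (4 inputs x 3 output nodes) and biases (one per output node)
W = np.random.randn(4, 3)
b = np.zeros(3)

# Each output node computes its own weighted sum of the inputs plus a bias,
# producing one raw score per ball class
scores = x @ W + b
print(scores)   # three raw scores, one per output node
```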
Note that “the relative magnitudes of the scores must be maintained” and “all probability values must add up to 1”. These conditions are satisfied by the softmax activation function. The equation for the softmax function is shown below,

softmax(z_i) = e^(z_i) / Σ_j e^(z_j)

where z_i is the raw score of the i-th output node.
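Here is a minimal NumPy sketch of the softmax function that satisfies both conditions; the raw scores are hypothetical, and subtracting the maximum score is just a common numerical-stability trick that does not change the result.

```python
import numpy as np

def softmax(scores):
    # Subtracting the max score keeps exp() from overflowing without
    # changing the output (the relative magnitudes are preserved)
    exps = np.exp(scores - np.max(scores))
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1])   # hypothetical raw scores for the three output nodes
probs = softmax(scores)
print(probs)          # approx [0.659 0.242 0.099] -- larger score, larger probability
print(probs.sum())    # 1.0 -- all probabilities add up to 1
```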
Recall that to make predictions on newly inputted data, we need to train the neural network. Since we are dealing with a supervised learning algorithm, we are working with previously labeled data. We considered a dataset with 3 sports balls, so how do we label the data? If you recall, in the case of binary data we labeled the data as either 1 or 0. You might think that you can label the 3 classes as 0, 1, and 2, but this creates dependencies between the classes. So, for multiclass data we use “one-hot encoding.”
One Hot Encoding:
It allows us to label all our classes without assuming dependencies between them. It works by creating a separate column for each class in the dataset and placing a one or a zero in each column to identify the class of each data sample, so that the class labels are linearly independent.
This can be done for any number of classes using an appropriate number of columns, as in the sketch below.
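As a minimal sketch, one-hot encoding for our three-ball example could look like the following; the class names other than the soccer ball, and their ordering, are my own assumption for illustration.

```python
import numpy as np

# Hypothetical class names; only "soccer ball" comes from the example above
classes = ["soccer ball", "tennis ball", "basketball"]

def one_hot(label, classes):
    # One column per class: 1 for the class the sample belongs to, 0 everywhere else
    vec = np.zeros(len(classes))
    vec[classes.index(label)] = 1.0
    return vec

print(one_hot("soccer ball", classes))   # [1. 0. 0.]
print(one_hot("tennis ball", classes))   # [0. 1. 0.]
print(one_hot("basketball", classes))    # [0. 0. 1.]
```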
Next, we apply Cross Entropy to find the total error of a model.
Multiclass Cross Entropy:
Also known as Categorical Cross Entropy.
It is an important concept that we discussed on our previous page on the basics of deep neural networks. It allows us to distinguish between a good model and a bad model. Cross entropy is a method of measuring the error of a neural network; consequently, a lower cross entropy value implies a more accurate model, while a higher value implies a less accurate one. The cross entropy we previously used for a binary dataset is as follows,

Cross Entropy = -Σ_i [ y_i ln(p_i) + (1 - y_i) ln(1 - p_i) ]

where y_i is the true label (0 or 1) and p_i is the predicted probability that the label is 1.
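As a minimal sketch, the binary cross entropy above can be computed like this; the labels and predicted probabilities are made-up values used only for illustration.

```python
import numpy as np

def binary_cross_entropy(y, p):
    # y: true labels (0 or 1), p: predicted probabilities that the label is 1
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 1, 1])           # hypothetical true labels
p = np.array([0.9, 0.2, 0.8, 0.6])   # hypothetical predicted probabilities
print(binary_cross_entropy(y, p))    # lower value -> more accurate model
```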
The same concept applies to multiclass cross entropy. Let's consider our example, as shown below.
We need to calculate the total error of the model for the above data using multiclass cross entropy.
We take the natural logarithm of the probability of each data sample being what it actually is. Our first ball is already labeled as a soccer ball, and its probability of actually being a soccer ball according to our neural network is 0.4, so we take the log of that.
Similarly, we take the logs of the probabilities of the other two balls, add all three together, and take the negative to obtain the total error.
Let's look at the mathematical representation of multiclass, or categorical, cross entropy,

Cross Entropy = -Σ_i Σ_j y_ij ln(p_ij)

where y_ij is 1 if sample i belongs to class j (taken from the one-hot label, and 0 otherwise) and p_ij is the predicted probability that sample i belongs to class j.
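Here is a minimal sketch of that computation using one-hot labels. The 0.4 soccer-ball probability matches the walkthrough above, while the remaining labels and probabilities are placeholder values I chose for illustration.

```python
import numpy as np

def categorical_cross_entropy(Y, P):
    # Y: one-hot true labels (samples x classes), P: predicted probabilities
    # Only the log-probability of each sample's true class contributes to the sum
    return -np.sum(Y * np.log(P))

# One row per ball, one column per class (one-hot labels; first ball is the soccer ball)
Y = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])

# Predicted probabilities from the network (each row sums to 1);
# 0.4 comes from the walkthrough above, the rest are made up
P = np.array([[0.4, 0.4, 0.2],
              [0.3, 0.6, 0.1],
              [0.2, 0.3, 0.5]])

print(categorical_cross_entropy(Y, P))   # total error: -(ln 0.4 + ln 0.6 + ln 0.5)
```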
Congrats, you have all the theory you need. Let's dive into the code now.
Example code implementing multiclass classification is provided on GitHub. Now that you have the necessary concepts, let us go ahead and start implementing a digit recognition neural network project using a DNN.