LeNet-5: the foundation stone of Convolutional Neural Networks

Shubham singh
3 min read · Nov 24, 2020
Convolutional Neural Network

Computer vision is a field of artificial intelligence in which computers learn to recognize images and perform object detection. Extensive research on artificial neural networks began in the 1950s. In 1959, David Hubel and Torsten Wiesel showed that the visual cortex is primarily composed of simple cells and complex cells. A simple cell responds to edges at a particular orientation; a complex cell differs in that it keeps responding even when the bars and edges are shifted within its receptive field.

In 1980, Kunihiko Fukushima proposed a mathematical model called the Neocognitron to recognize handwritten characters. The model is based on the findings of Hubel and Wiesel. It consists of several types of cells, the most important of which are S-cells and C-cells. Local features are extracted by S-cells, and deformations of these features, such as local shifts, are tolerated by C-cells. The layers alternate: each layer of S-cells feeds into a following layer of C-cells.

LeNet-5

LeNet-5 is a convolutional neural network published in 1998 by Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner. It was among the earliest CNNs used for digit recognition. Although LeNet-5 is a basic network, modern convolutional neural networks draw their inspiration from it. The network was not widely adopted at the time because training deep learning models requires substantial computing power, which was then scarce. Its first use was recognizing handwritten digits in the MNIST dataset; commercially, it was used to read handwritten digits on bank cheques.

Model Architecture

The LeNet-5 model has about 60k trainable parameters and 7 layers, connected as follows: the first layer is a convolutional layer, the second is an average pooling layer, the third is again a convolutional layer followed by another average pooling layer, the fifth is a final convolutional layer, and the sixth and seventh are fully connected layers. We will cover these layers one by one. The images are normalized before being fed into the network.
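The "60k parameters" figure can be checked with a back-of-the-envelope count. This sketch assumes full connectivity between the second pooling layer and the third convolutional layer (the original paper used a sparse connection table, so its true count is slightly lower):

```python
# Rough parameter count for LeNet-5 (sketch; assumes full connectivity
# in the third convolutional layer, unlike the original sparse table).

def conv_params(kernel, in_ch, out_ch):
    # each filter has kernel*kernel*in_ch weights plus one bias
    return (kernel * kernel * in_ch + 1) * out_ch

def dense_params(in_units, out_units):
    # one weight per input per unit, plus one bias per unit
    return (in_units + 1) * out_units

c1 = conv_params(5, 1, 6)      # first conv layer: 156
c3 = conv_params(5, 6, 16)     # second conv layer: 2,416
c5 = conv_params(5, 16, 120)   # third conv layer: 48,120
f6 = dense_params(120, 84)     # fully connected layer: 10,164
out = dense_params(84, 10)     # output layer: 850

total = c1 + c3 + c5 + f6 + out
print(total)  # 61706 — the "60k parameters" figure
```

Pooling layers contribute essentially nothing here, so the three convolutional layers and two dense layers account for the whole budget, with the 16-to-120 convolution alone holding most of it.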

LeNet-5 architecture

In the first layer, the 32x32 image is convolved with 6 filters with 5x5 kernels, producing a 28x28x6 output. All convolutional layers use the tanh activation function.

The second layer is a 2x2 sub-sampling (average pooling) layer, and its output is 14x14x6. This layer reduces dimensionality and helps avoid overfitting.

The third layer is also a convolutional layer, where the output of the previous layer is convolved with 16 filters with 5x5 kernels, giving a 10x10x16 output.

In the fourth layer, average pooling reduces the dimensionality once again, and a 5x5x16 output comes out.

In the fifth layer, a 5x5 convolution with 120 filters is applied to the previous output, yielding a 1x1x120 output.

The sixth layer is a fully connected layer with 84 units.

The seventh and last layer is the output layer: a fully connected layer with 10 units and a softmax activation function.
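The spatial sizes above follow from two simple rules: a 5x5 convolution with no padding shrinks an n x n map to (n - 5 + 1) x (n - 5 + 1), and non-overlapping 2x2 pooling halves it. A short sketch tracing the sizes through the network:

```python
# Trace the feature-map size through LeNet-5 (a sketch).

def conv_out(n, k=5):
    # 'valid' convolution: no padding, stride 1
    return n - k + 1

def pool_out(n, p=2):
    # non-overlapping pooling with stride equal to the window size
    return n // p

n = 32            # input image
n = conv_out(n)   # first conv:   28
n = pool_out(n)   # first pool:   14
n = conv_out(n)   # second conv:  10
n = pool_out(n)   # second pool:   5
n = conv_out(n)   # third conv:    1
print(n)  # 1 — the 120 feature maps are 1x1, ready to be flattened
```

This is why the fifth layer's 120 maps come out as 1x1: the 5x5 kernel exactly covers the 5x5 input it receives.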

A practical example of LeNet-5
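As a practical sketch, the architecture described above can be written in a few lines of Keras (this assumes TensorFlow is installed; it uses tanh activations and average pooling as in the original paper, but full connectivity in the third convolutional layer rather than LeCun's sparse connection table):

```python
# A minimal LeNet-5 sketch in Keras (assumption: TensorFlow backend).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_lenet5(num_classes: int = 10) -> tf.keras.Model:
    return models.Sequential([
        layers.Input(shape=(32, 32, 1)),                       # normalized grayscale input
        layers.Conv2D(6, kernel_size=5, activation="tanh"),    # -> 28x28x6
        layers.AveragePooling2D(pool_size=2),                  # -> 14x14x6
        layers.Conv2D(16, kernel_size=5, activation="tanh"),   # -> 10x10x16
        layers.AveragePooling2D(pool_size=2),                  # -> 5x5x16
        layers.Conv2D(120, kernel_size=5, activation="tanh"),  # -> 1x1x120
        layers.Flatten(),
        layers.Dense(84, activation="tanh"),                   # fully connected
        layers.Dense(num_classes, activation="softmax"),       # output
    ])

model = build_lenet5()
model.summary()
```

The model can then be compiled with a standard classifier setup, e.g. `model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])`, and trained on MNIST images padded from 28x28 to 32x32.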
