Deep Learning Simplified

Encouraged by all the responses to my previous “Simplified” blog series on Reinforcement learning and Ensemble learning, I am writing this blog covering Deep learning basics in a step-by-step manner.

The primary aim of this blog is to help you master neural networks and related deep learning techniques conceptually. Using a complex pattern recognition problem, it walks through the procedure for developing a typical neural network, which you will be able to reuse for problems of similar complexity.

The first part of this blog series introduces deep learning in simple terms. The following representation shows all the learning methods covered in this series, highlighting the primary subject of this blog: Deep learning.

Let’s first recap the premise of Machine learning and reinforce the purpose and context of learning methods. Machine learning is about training machines by building models from observational data, rather than directly writing specific instructions that define the model, in order to address a particular classification or prediction problem. The word model simply means a system in this context.

The program or system is built using data and hence looks very different from a hand-written one. If the data changes, the program adapts to it through further training on the new data. So all it needs is the ability to process large-scale data, as opposed to a skilled programmer writing code for every condition, which could still prove to be massively erroneous. Examples include pattern recognition tasks such as speech recognition, object recognition, face detection, and more.

Deep learning is a type of Machine learning that attempts to learn prominent features from the given data and thus tries to reduce the task of building a feature extractor for every category of data (for example, image, voice, and so on). For a face detection requirement, a deep learning algorithm learns features such as the length of the nose, the distance between the eyes, the color of the eyes, and so on. These learned features are then used to address a classification or prediction problem, which is very different from what traditional shallow learning algorithms do.

The following concept model covers different areas of deep learning and the scope of topics covered in this blog.

Let’s take an example. Suppose we want to predict whether a visitor to a restaurant will come back based on two factors: the bill amount (x1) and the visitor’s age (x2). We collect data for a specific duration of time and record an output value of 1 (the visitor came back) or -1 (the visitor did not come back). The data, when plotted, can take any form, from a linear relationship to a more complex structure, as shown here:

A linear relationship looks straightforward, while more complex relationships complicate the dynamics of the model. Can the model parameter θ have an optimal value at all? We might have to apply optimization techniques, and the sections that follow cover such techniques, including perceptrons and gradient descent methods, among others. If we wanted to hand-write a program to recognize such patterns, we would need to know what our brain does to recognize them, and even if we knew, the program might be very involved.
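To make the restaurant example concrete, here is a minimal sketch of how a parameter vector θ could be learned with the classic perceptron rule. The visit records, the scale helper, and the learning rate are all invented for illustration and are not data from this blog; the sketch also assumes the two classes can be separated by a straight line.

```python
# Hypothetical toy data: ((bill amount x1, age x2), label).
# Label +1 = visitor came back, -1 = visitor did not. Values are invented.
VISITS = [
    ((20.0, 25.0), -1),
    ((25.0, 30.0), -1),
    ((30.0, 22.0), -1),
    ((60.0, 45.0), 1),
    ((70.0, 50.0), 1),
    ((80.0, 40.0), 1),
]


def scale(x):
    """Center and shrink the raw features (an assumed preprocessing step)."""
    return ((x[0] - 50.0) / 50.0, (x[1] - 35.0) / 35.0)


def predict(theta, x):
    """Return +1 or -1 from the sign of theta0 + theta1*x1 + theta2*x2."""
    x1, x2 = scale(x)
    score = theta[0] + theta[1] * x1 + theta[2] * x2
    return 1 if score >= 0 else -1


def train_perceptron(data, lr=0.1, max_epochs=100):
    """Perceptron rule: nudge theta toward every misclassified example."""
    theta = [0.0, 0.0, 0.0]
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            if predict(theta, x) != y:
                x1, x2 = scale(x)
                theta[0] += lr * y
                theta[1] += lr * y * x1
                theta[2] += lr * y * x2
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return theta


theta = train_perceptron(VISITS)
print("learned theta:", theta)
print("bill 75, age 48 ->", predict(theta, (75.0, 48.0)))
```

When the data is not linearly separable, this simple rule never settles, which is one reason the more complex structures mentioned above call for richer models.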

Neural Networks

Neural computation has long been studied to understand how parallel computations work in neurons (the concept of flexible connections) and to solve practical problems the way a human brain does. Let’s now look at the fundamental core unit of the human brain, the neuron:

Neuron

The human brain is all about neurons and connections. A neuron is the smallest part of the brain, and if we take a tiny rice grain-sized piece of the brain, it is known to contain at least 10000 neurons. Every neuron, on average, has around 6000 connections with other neurons. If we look at the general structure of a neuron, it appears as follows.

Every feeling that we humans go through, be it a thought or an emotion, arises from the billions of cells in our brain called neurons. As a result of these neurons communicating with each other by passing messages, humans feel, act, and form perceptions. The diagram here depicts the biological neural structure and its parts:

Every neuron has a central cell body, as any cell does. In addition, it has an axon and a dendritic tree, which are responsible for sending and receiving messages, respectively, to and from other neurons. The place where an axon connects to a dendritic tree is called a synapse. Synapses themselves have an unusual structure: they contain transmitter molecules that trigger transmission, which can be either positive or negative.

The inputs to the neurons are aggregated, and when they exceed the threshold, an electrical spike is transmitted to the next neuron.

Synapses

The following diagram depicts the model of a synapse, illustrating the flow of messages from the axon to the dendrite. The job of a synapse is not just to transmit messages; synapses also adapt to the flow of signals and have the ability to learn from past activity.

As an analogy in the field of Machine learning, the strength of an incoming connection is determined by how often it is used, and that strength in turn determines the connection’s impact on the neuron. This is how humans learn new concepts subconsciously.
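One way to picture "strength grows with use" is a Hebbian-style update, sketched below. The text does not name this rule, so treat it as an assumption; the function name, the rate value, and the activity values are all hypothetical.

```python
def strengthen(weight, pre_activity, post_activity, rate=0.1):
    """Hebbian-style rule (assumed): the connection grows when both ends are active."""
    return weight + rate * pre_activity * post_activity


often_used = 0.0
rarely_used = 0.0

# Ten time steps: the first connection carries a signal at every step,
# the second one only at the very first step.
for step in range(10):
    often_used = strengthen(often_used, pre_activity=1.0, post_activity=1.0)
    rarely_active = 1.0 if step == 0 else 0.0
    rarely_used = strengthen(rarely_used, pre_activity=rarely_active, post_activity=1.0)

print("often used :", often_used)
print("rarely used:", rarely_used)
```

The frequently used connection ends up with the larger weight, so it has the larger influence on the receiving neuron.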

There can additionally be external factors such as medication or body chemistry that might impact this learning process.

To conclude this part of the blog on deep learning, let us summarize how learning happens in the brain.

Neurons communicate with other neurons or sometimes receptors. Cortical neurons use spikes for communication.
The strengths of connections between neurons can change. They can take positive or negative values, either by establishing or removing connections between neurons or by strengthening a connection based on the influence one neuron has over another. A process called long-term potentiation (LTP) produces this long-term effect.
There are about 10^11 neurons with weighted connections, which let the human brain perform computations more efficiently than a workstation.
Finally, the brain is modular; different parts of the cortex are responsible for different things. Some tasks cause more blood flow in some regions than in others, and thus produce different results.

In this section of the blog, we will cover more on ANNs (Artificial Neural Networks), starting with what artificial neurons or perceptrons are, followed by the algorithms classified under deep learning.

Artificial neurons or perceptrons

It is evident that artificial neurons draw inspiration from biological neurons, as represented previously. The features of an artificial neuron are listed here:

There is a set of inputs received from other neurons that activate the neuron in context
There is an output transmitter that transfers signals or activations to other neurons
Finally, the core processing unit is responsible for producing output activations from the input activations
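The three features listed above map naturally onto a small function: inputs come in, a core unit aggregates them, and an output activation goes out. The sketch below is one common idealization, not the only one; the weights, bias, and the choice of sigmoid and step activations are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def step(z):
    """Threshold activation: fire (1) only when the aggregated input is positive."""
    return 1 if z > 0 else 0


def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Core processing unit: weighted sum of input activations plus a bias,
    passed through an activation function to produce the output activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Hypothetical weights that make a step neuron behave like a logical AND.
and_weights, and_bias = [0.6, 0.6], -1.0
for pair in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(pair, "->", neuron_output(pair, and_weights, and_bias, activation=step))
```

Swapping step for sigmoid gives a smooth output instead of a hard 0 or 1, which matters later when gradients are needed for training.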

Idealizing a neuron is a simplification process applied when building models. Once the neuron is simplified, it is possible to apply mathematics and draw analogies. We can then add complexity back and make the model robust under identified conditions. Care must be taken to ensure that none of the significantly contributing aspects are removed as part of the simplification.

An example

A face recognition case using the multi-layered perceptron approach is shown next:

Multiple layers take this image as input, and finally a classifier definition is created and stored. Given a photograph, each segment focuses on learning a specific part of the photograph and finally stores the output pixels.
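As a rough sketch of how layers pass an image forward, the snippet below pushes a tiny made-up "image" through one hidden layer and one output neuron. The four pixel values and all weights are hand-picked placeholders, not trained values, and a real face recognition network would be far larger.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """One fully connected layer: each row of weights feeds one neuron."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_forward(inputs, layers):
    """Feed the activations of each layer into the next one."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


# Hypothetical 4-pixel grayscale "image" with intensities in [0, 1].
pixels = [0.0, 0.5, 0.5, 1.0]

# Hypothetical, untrained weights: 4 inputs -> 3 hidden neurons -> 1 output.
layers = [
    ([[0.2, -0.1, 0.4, 0.3],
      [-0.3, 0.2, 0.1, 0.5],
      [0.1, 0.1, -0.2, 0.2]], [0.0, 0.1, -0.1]),
    ([[0.5, -0.4, 0.3]], [0.05]),
]

output = mlp_forward(pixels, layers)
print("score that the image shows a face:", output[0])
```

Training consists of adjusting those weights so that the final score matches the known labels.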

Some critical notes on the weights and error measure are as follows:

The training data is the source of learning the weights of neurons.
The error measure, or cost function, differs between regression and classification problems. For classification, log functions are applied, and for regression, least-square measures are used.
These error measures are kept in check by updating the weights using convex optimization techniques such as gradient descent methods.
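The notes above can be sketched in a few lines: a least-squares measure for regression, a log loss for classification, and a gradient descent loop that lowers the error by updating a weight. The function names, learning rate, and sample data are assumptions for illustration. Note that the log loss here uses labels 0 and 1 rather than the -1 and 1 of the restaurant example.

```python
import math


def squared_error(y_true, y_pred):
    """Least-squares measure for regression: mean of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true, y_prob, eps=1e-12):
    """Log loss for binary classification with labels 0 or 1."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


def fit_slope(xs, ys, lr=0.05, steps=200):
    """Gradient descent on the squared error of the model y = w * x."""
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad  # step against the gradient
    return w


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # hypothetical data with y = 2x
w = fit_slope(xs, ys)
print("learned weight:", w)
print("remaining error:", squared_error(ys, [w * x for x in xs]))
```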

Backpropagation algorithm

Continuing with the topic of training networks, the gradient descent algorithm helps neural networks learn their weights and biases. To compute the gradient of the cost function, we use an algorithm called backpropagation. Backpropagation was first discussed in the 1970s and became prominent in practical applications only in the 1980s. It was shown that neural network learning was much faster when the backpropagation algorithm was employed.

In the earlier sections of this chapter, we saw how a matrix-based algorithm works; a similar notation is used for the backpropagation algorithm. For a given weight w and bias b, the cost function C has two partial derivatives: ∂C/∂w and ∂C/∂b.
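For a single sigmoid neuron, the two partial derivatives can be written out with the chain rule and checked numerically. This is only the one-neuron building block that backpropagation repeats layer by layer, not a full implementation. It assumes a quadratic cost for one training example, and the values of w, b, x, and y are arbitrary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cost(w, b, x, y):
    """Assumed quadratic cost for one example: C = (a - y)^2 / 2."""
    a = sigmoid(w * x + b)
    return 0.5 * (a - y) ** 2


def gradients(w, b, x, y):
    """Chain rule: dC/dw = (a - y) * a * (1 - a) * x and dC/db = (a - y) * a * (1 - a)."""
    a = sigmoid(w * x + b)
    delta = (a - y) * a * (1.0 - a)
    return delta * x, delta


def numeric_gradients(w, b, x, y, h=1e-6):
    """Central finite differences, used only to sanity-check the formulas."""
    dw = (cost(w + h, b, x, y) - cost(w - h, b, x, y)) / (2 * h)
    db = (cost(w, b + h, x, y) - cost(w, b - h, x, y)) / (2 * h)
    return dw, db


w, b, x, y = 0.8, -0.3, 1.5, 1.0  # arbitrary illustrative values
dC_dw, dC_db = gradients(w, b, x, y)
num_dw, num_db = numeric_gradients(w, b, x, y)
print("analytic:", dC_dw, dC_db)
print("numeric :", num_dw, num_db)
```

Both derivatives are negative here because the output is below the target, so a gradient descent step would increase w and b.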

Some critical assumptions regarding the cost function for backpropagation are stated here. Let’s assume that the cost function is defined by the equation here:
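The equation itself is not reproduced in this text. A common choice, assumed here rather than taken from the original figure, is the quadratic cost:

```latex
C = \frac{1}{2n} \sum_{x} \left\lVert y(x) - a^{L}(x) \right\rVert^{2}
```

In this notation, n is the number of training examples, the sum runs over the individual training inputs x, y(x) is the desired output, L is the number of layers, and a^L(x) is the vector of activations the network outputs for input x. Two assumptions usually accompany such a cost: it can be written as an average of per-example costs, and it depends on the network only through its output activations.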

The following model lists different types of neural networks:

A deeper insight into each of these types of neural networks will be covered in the next blog. Happy reading!

For more details, refer to my publications page.