Back Propagation



Back Propagation, which some people refer to as a magic box, is one of the most frequently used terms in the Neural Network community, and it is said to hide some of the most difficult and complex math behind it.
  • What actually is Back Propagation? 
  • What is the math under the hood? 
  • Why is it so difficult, yet so important, to understand? 
In this article I'll take you through the path of Back Propagation and how it works between our neural layers.
(If you are reading this, I'll take it for granted that you know the basics of Deep Learning and Calculus.)

OK. Let's dive into the concept...

Let us first define a simple Neural Network.
     To keep things simple here, I consider a Logistic Regression style Neural Network: a single set of input values at an Input Layer, one Hidden Layer, and a single expected output value at an Output Layer.


Source: https://www.simonho.ca/wp-content/uploads/2018/02/ann.jpg

We have an input vector X of dim(n) and a known output y of dim(1). For a given input X, our network needs to predict the outcome: either 0 or 1.

ŷ = P(y = 1 | X) is the probability of y being 1 for a given input X.

How does a Neural Network work? (Only Math)

Given an input X at our input layer, the values are multiplied by randomly initialized weights (chosen by our network), say W1, and a bias, say b1, is added to the product. The resultant, say Z1, is passed from the input layer to the hidden layer. The hidden layer takes Z1 as input and applies an activation function to it. The result, say a1, is then multiplied by a randomly initialized second set of weights, W2, and a second bias, b2, is added to that product. The result, say Z2, is passed to the output layer of our Neural Network. There, one more activation function (sigmoid) is applied to Z2, and the result becomes the output of our Neural Network.
Let's put all the equations together:
Z1 = W1·X + b1
a1 = g(Z1)
Z2 = W2·a1 + b2
a2 = σ(Z2) = ŷ
L(a2, y) = −[y·log(a2) + (1 − y)·log(1 − a2)]

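The forward pass above can be written as a short NumPy sketch. The layer sizes, the tanh hidden activation, and the random seed here are my assumptions, chosen just for illustration:

```python
import numpy as np

# A minimal sketch of the forward pass: 3 input features,
# a 4-unit hidden layer (tanh assumed), and a sigmoid output.
rng = np.random.default_rng(0)

X = rng.normal(size=(3, 1))          # input vector, dim(n) = 3
W1 = rng.normal(size=(4, 3))         # randomly initialized weights
b1 = np.zeros((4, 1))                # bias
W2 = rng.normal(size=(1, 4))         # second set of weights
b2 = np.zeros((1, 1))                # second bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

Z1 = W1 @ X + b1                     # input layer -> hidden layer
a1 = np.tanh(Z1)                     # hidden activation g(Z1)
Z2 = W2 @ a1 + b2                    # hidden layer -> output layer
a2 = sigmoid(Z2)                     # network output, y-hat in (0, 1)
print(a2.shape)                      # (1, 1)
```

Because the final activation is a sigmoid, a2 always lands strictly between 0 and 1, which is what lets us read it as a probability.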
OK, wait. Who is that uninvited hero showing off in the last line of our equations?

Well, that is the Loss/Error function, the one thing our Back Propagation process really needs. While training our Neural Network on input data, we know the expected output of a training example as y, but our Neural Network produces ŷ = a2. The difference between the known output and the output the network obtained is called the Loss/Error. The main task of our network is to minimize that Loss/Error, and that's where our hero, Back Propagation, enters the picture. The process we have seen so far, from passing inputs at the input layer to getting the output at the output layer and calculating the loss, is called the Forward Pass.

What actually is Back Propagation ?

Back Propagation is a process where we go backwards through a Neural Network, from the output layer to the hidden layers, to find the rate of change of the Loss function with respect to changes in our weights and biases, and then update them by a factor of the learning rate to minimize the Loss function.
That is the statement in pure mathematical terms. Let's look at the math behind it.
We met an uninvited hero while writing the math equations of the forward pass. Let's invite him in now: 
L(a2, y) = −[y·log(a2) + (1 − y)·log(1 − a2)]

That is the Loss function (binary cross-entropy) for a single input X.
(Don't worry about the equation; after all, he is just showing off.) 
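For the logistic output used here, the usual loss is binary cross-entropy. As a small sketch (the example values 0.9 and 0.1 are hypothetical outputs, not from the article):

```python
import numpy as np

def bce_loss(a2, y):
    """Binary cross-entropy for a single example:
    L = -(y*log(a2) + (1 - y)*log(1 - a2))."""
    return -(y * np.log(a2) + (1 - y) * np.log(1 - a2))

# If the true label is 1 and the network outputs 0.9, the loss is small;
# if it outputs 0.1, the loss is large.
print(bce_loss(0.9, 1))  # ~0.105
print(bce_loss(0.1, 1))  # ~2.303
```

The further the network's output drifts from the known label, the larger the loss, which is exactly the signal Back Propagation will push backwards.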
Now our duty is to check how our Loss function reacts to small changes in our weights, W1, W2, and biases, b1, b2, also called its derivatives.
Remember, this is Back Propagation: we go from back to front. So we first find the reaction of the loss function to a small change in a2:

∂L/∂a2 = −y/a2 + (1 − y)/(1 − a2)

The reaction of the loss function to a small change in Z2 (with a sigmoid output, this simplifies nicely):

∂L/∂Z2 = ∂L/∂a2 · ∂a2/∂Z2 = a2 − y

The reaction of the loss function to a small change in W2:

∂L/∂W2 = ∂L/∂Z2 · a1ᵀ

The reaction of the loss function to a small change in b2:

∂L/∂b2 = ∂L/∂Z2

The reaction of the loss function to a small change in Z1 (g′ is the derivative of the hidden activation):

∂L/∂Z1 = W2ᵀ · ∂L/∂Z2 ⊙ g′(Z1)

The reaction of the loss function to a small change in W1:

∂L/∂W1 = ∂L/∂Z1 · Xᵀ

The reaction of the loss function to a small change in b1:

∂L/∂b1 = ∂L/∂Z1

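This chain of derivatives can be sketched in NumPy. The layer sizes, the tanh hidden activation (so g′(Z1) = 1 − a1²), and the random seed are my assumptions, matching the forward-pass conventions used earlier:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 1))
y = 1.0
W1, b1 = rng.normal(size=(4, 3)), np.zeros((4, 1))
W2, b2 = rng.normal(size=(1, 4)), np.zeros((1, 1))

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Forward pass
Z1 = W1 @ X + b1
a1 = np.tanh(Z1)
Z2 = W2 @ a1 + b2
a2 = sigmoid(Z2)

# Backward pass: from the output layer back to the input layer
dZ2 = a2 - y                      # dL/dZ2 (sigmoid + cross-entropy simplify)
dW2 = dZ2 @ a1.T                  # dL/dW2
db2 = dZ2                         # dL/db2
dZ1 = (W2.T @ dZ2) * (1 - a1**2)  # dL/dZ1 (tanh' = 1 - tanh^2)
dW1 = dZ1 @ X.T                   # dL/dW1
db1 = dZ1                         # dL/db1

print(dW1.shape, dW2.shape)       # (4, 3) (1, 4)
```

Notice that each gradient has exactly the same shape as the parameter it belongs to; that is what makes the update step a simple element-wise subtraction.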
Updating Weights and Bias:
The useful derivatives among the above are the ones with respect to the weights and biases. Remember, we update only the weights, W1, W2, and the biases, b1, b2 (α is the learning rate):

W1 = W1 − α · ∂L/∂W1
b1 = b1 − α · ∂L/∂b1
W2 = W2 − α · ∂L/∂W2
b2 = b2 − α · ∂L/∂b2

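The update rule for one parameter looks like this sketch (the weight values, gradient, and learning rate are all hypothetical numbers chosen for illustration):

```python
import numpy as np

lr = 0.1                          # learning rate alpha (hypothetical value)
W2 = np.array([[0.5, -0.3]])      # current weights (hypothetical)
dW2 = np.array([[0.2, 0.1]])      # hypothetical gradient dL/dW2

# Move each weight a small step against its gradient
W2 = W2 - lr * dW2
print(W2)                         # W2 is now [[0.48, -0.31]]
```

Stepping against the gradient decreases the loss locally; the learning rate controls how large that step is.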
How our Loss function reaches its minimum value is a whole new concept called Gradient Descent.

Forward Pass Again:

After updating the weights and biases, the Neural Network runs the forward pass again with these updated parameters, calculates the loss, and updates them once more. This process repeats until our loss function reaches its minimum value.
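Putting it all together, the repeated forward/backward cycle can be sketched as below. The toy single-example data, tanh hidden layer, learning rate, and step count are my assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 1))       # a single toy input
y = 1.0                           # its known label
W1, b1 = rng.normal(size=(4, 3)), np.zeros((4, 1))
W2, b2 = rng.normal(size=(1, 4)), np.zeros((1, 1))
lr = 0.5                          # learning rate (hypothetical value)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(200):
    # Forward pass
    Z1 = W1 @ X + b1
    a1 = np.tanh(Z1)
    a2 = sigmoid(W2 @ a1 + b2)
    loss = -(y * np.log(a2) + (1 - y) * np.log(1 - a2))
    # Backward pass
    dZ2 = a2 - y
    dW2, db2 = dZ2 @ a1.T, dZ2
    dZ1 = (W2.T @ dZ2) * (1 - a1**2)
    dW1, db1 = dZ1 @ X.T, dZ1
    # Update weights and biases
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(loss.item())                # the loss shrinks toward 0 as training repeats
```

On this single example the network quickly drives its output toward the known label, which is exactly the forward-pass / back-propagation / update loop described above.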
