Machine Learning by Andrew Ng - Week 5
Cost Function and Backpropagation
Cost Function
-
Let’s first define a few variables that we will need to use:
-
L = total number of layers in the network
-
s_l = number of units (not counting bias unit) in layer l
-
K = number of output units/classes
-
-
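For reference, the regularized cost function for a neural network with K output units is:

J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log\left( (h_\Theta(x^{(i)}))_k \right) + \left(1 - y_k^{(i)}\right) \log\left( 1 - (h_\Theta(x^{(i)}))_k \right) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( \Theta_{j,i}^{(l)} \right)^2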
We have added a few nested summations to account for our multiple output nodes.
-
In the first part of the equation, before the square brackets, we have an additional nested summation that loops through the number of output nodes.
-
In the regularization part, after the square brackets, we must account for multiple theta matrices.
-
The number of columns in our current theta matrix is equal to the number of nodes in our current layer (including the bias unit).
-
The number of rows in our current theta matrix is equal to the number of nodes in the next layer (excluding the bias unit).
-
As before with logistic regression, we square every term.
-
Note
-
the double sum simply adds up the logistic regression costs calculated for each cell in the output layer
-
the triple sum simply adds up the squares of all the individual Θs in the entire network.
-
the i in the triple sum does not refer to training example i
-
Backpropagation Algorithm
-
“Backpropagation” is neural-network terminology for minimizing our cost function, just as we did with gradient descent in logistic and linear regression. Our goal is to compute: min_Θ J(Θ)
-
That is, we want to minimize our cost function J using an optimal set of parameters in theta.
-
To compute the partial derivatives of J(Θ):
- the backpropagation algorithm is used
-
One training example
- Multiple training examples
-
Process
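In summary (as presented in the lectures): given a training set \{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}, set \Delta^{(l)}_{i,j} := 0 for all l, i, j, then for each training example t = 1, \dots, m:

a^{(1)} := x^{(t)}, and forward-propagate to compute a^{(l)} for l = 2, 3, \dots, L
\delta^{(L)} := a^{(L)} - y^{(t)}
\delta^{(l)} := \left( (\Theta^{(l)})^T \delta^{(l+1)} \right) .* a^{(l)} .* (1 - a^{(l)}) for l = L-1, \dots, 2 (where .* denotes element-wise multiplication)
\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T

After the loop, the partial derivatives are obtained as

D^{(l)}_{i,j} := \frac{1}{m} \left( \Delta^{(l)}_{i,j} + \lambda \Theta^{(l)}_{i,j} \right) \text{ if } j \neq 0, \qquad D^{(l)}_{i,j} := \frac{1}{m} \Delta^{(l)}_{i,j} \text{ if } j = 0

so that \frac{\partial}{\partial \Theta^{(l)}_{i,j}} J(\Theta) = D^{(l)}_{i,j}.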
Backpropagation Intuition
-
Forward Propagation
-
Backward Propagation
-
The delta (δ) values are essentially the derivatives of the cost function with respect to the weighted inputs z_j^(l)
-
Recall that our derivative is the slope of a line tangent to the cost function, so the steeper the slope the more incorrect we are.
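More precisely, for the (unregularized) cost of a single training example t:

\delta_j^{(l)} = \frac{\partial}{\partial z_j^{(l)}} \text{cost}(t), \quad \text{where } \text{cost}(t) = y^{(t)} \log h_\Theta(x^{(t)}) + \left(1 - y^{(t)}\right) \log\left( 1 - h_\Theta(x^{(t)}) \right)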
-
Backpropagation in Practice
Implementation Note: Unrolling Parameters
-
Advanced Optimisation
- It needs the theta matrices unrolled into a single vector
-
Example
-
Forward propagation and backpropagation are most efficient when the parameters are kept as matrices, while the advanced optimisation routines expect the cost function's parameters and gradients as vectors (see the sketch after the snippets below)
-
Unrolling matrices into vectors in Octave
-
-
Learning Algorithm
- Process of unrolling
-
Octave Snippets
- Matrices → Vectors
thetaVector = [ Theta1(:); Theta2(:); Theta3(:); ]
deltaVector = [ D1(:); D2(:); D3(:) ]
- Vectors → Matrices
Theta1 = reshape(thetaVector(1:110),10,11)
Theta2 = reshape(thetaVector(111:220),10,11)
Theta3 = reshape(thetaVector(221:231),1,11)
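A minimal sketch of the unrolling process above; costFunction is a hypothetical file name, the 10x11 / 10x11 / 1x11 dimensions match the example above, and the quadratic cost is only a stand-in for the real cost computed by forward/backpropagation:

% --- costFunction.m (hypothetical example file) ---
function [jVal, gradVec] = costFunction(thetaVec)
  % Reshape the unrolled parameter vector back into matrices (10x11, 10x11, 1x11)
  Theta1 = reshape(thetaVec(1:110), 10, 11);
  Theta2 = reshape(thetaVec(111:220), 10, 11);
  Theta3 = reshape(thetaVec(221:231), 1, 11);
  % In the real implementation, forward/backpropagation on Theta1..Theta3 would
  % produce jVal and the gradient matrices D1, D2, D3. The quadratic cost below
  % is only a stand-in so the sketch runs end to end.
  jVal = sum(thetaVec .^ 2);
  D1 = 2 * Theta1;  D2 = 2 * Theta2;  D3 = 2 * Theta3;
  gradVec = [D1(:); D2(:); D3(:)];   % unroll the gradients for the optimiser
end

% --- calling script ---
initialTheta = rand(231, 1);                           % unrolled initial parameters
options = optimset('GradObj', 'on', 'MaxIter', 50);    % we supply the gradient ourselves
[optTheta, cost] = fminunc(@costFunction, initialTheta, options);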
Gradient Checking
-
Gradient checking will assure that our backpropagation works as intended.
-
We can approximate the derivative of our cost function with:
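The two-sided difference approximation is:

\frac{\partial}{\partial \Theta} J(\Theta) \approx \frac{J(\Theta + \epsilon) - J(\Theta - \epsilon)}{2\epsilon}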
-
A small value for ϵ, such as ϵ = 10^-4, guarantees that the math works out properly.
-
If the value for ϵ is too small, we can end up with numerical problems.
-
-
Parameter Vector
-
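With a parameter vector θ = (θ_1, ..., θ_n), each component is perturbed in turn:

\frac{\partial}{\partial \theta_j} J(\theta) \approx \frac{J(\theta_1, \dots, \theta_j + \epsilon, \dots, \theta_n) - J(\theta_1, \dots, \theta_j - \epsilon, \dots, \theta_n)}{2\epsilon}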
Process
- So once we compute our gradApprox vector, we can check that gradApprox ≈ deltaVector.
-
Notes
-
Implementation Note
-
Implement BP to compute DVec
-
Implement numerical gradient check to compute gradApprox
-
Make sure they give similar values
-
Turn off gradient checking. Use the BP code for learning
-
-
Important
-
Be sure to disable your gradient checking code before training your classifier.
-
If you run the numerical gradient computation on every iteration of gradient descent (or in the inner loop of costFunction()), your code will be very slow
-
- Octave Snippet
-
epsilon = 1e-4;
for i = 1:n,                 % n = total number of parameters in theta
  thetaPlus = theta;
  thetaPlus(i) += epsilon;
  thetaMinus = theta;
  thetaMinus(i) -= epsilon;
  gradApprox(i) = (J(thetaPlus) - J(thetaMinus)) / (2*epsilon);   % J = handle to the cost function
end;
Random Initialisation
-
Initial Value of Theta
- We need to initialise theta
-
Zero Initialisation
-
When initialised with zeros
-
All the units in a hidden layer compute the same activation, and their weights receive identical updates, so they remain equal to one another
-
The neural network will therefore not be able to learn new features
-
-
Random Initialisation
-
rand(x,y) is just a function in Octave that returns an x-by-y matrix of random real numbers between 0 and 1.
-
(Note: the epsilon used above is unrelated to the epsilon from Gradient Checking)
-
-
Octave Snippets
If the dimensions of Theta1, Theta2 and Theta3 are 10x11, 10x11 and 1x11 respectively:
Theta1 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta2 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta3 = rand(1,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
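Multiplying rand's output (values in [0, 1]) by 2 * INIT_EPSILON and subtracting INIT_EPSILON maps each weight into the interval [-INIT_EPSILON, INIT_EPSILON], which breaks the symmetry that zero initialisation causes.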
Putting It Together
-
First, pick a network architecture; choose the layout of your neural network, including how many hidden units in each layer and how many layers in total you want to have.
-
Number of input units = dimension of features x^i
-
Number of output units = number of classes
-
Number of hidden units per layer = usually, the more the better (but this must be balanced against the computational cost, which grows with more hidden units)
-
Defaults: 1 hidden layer. If you have more than 1 hidden layer, then it is recommended that you have the same number of units in every hidden layer.
-
-
Training a Neural Network
-
Randomly initialise the weights
-
Implement forward propagation to get h(x^i) for any x^i
-
Implement the cost function
-
Implement backpropagation to compute partial derivatives
-
Use gradient checking to confirm that your backpropagation works. Then disable gradient checking.
-
Use gradient descent or a built-in optimization function to minimize the cost function with the weights in theta.
- Steps 1 - 4
- Steps 5 - 6
-
-
However, keep in mind that J(Θ) is not convex and thus we can end up in a local minimum instead.
-
Octave Snippets
- When we perform forward and back propagation, we loop over every training example
for i = 1:m,
  % Perform forward propagation and backpropagation using example (x(i), y(i))
  % (Get activations a(l) and delta terms d(l) for l = 2, ..., L)
end;
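As a concrete sketch of that loop for a network with one hidden layer: Theta1, Theta2, the training matrix X (one example per row), the one-hot label matrix Y, and m are assumed to already exist; this is a rough illustration of the gradient-accumulation step, not the course's exact code.

sigmoid = @(z) 1 ./ (1 + exp(-z));      % g(z) = 1 / (1 + e^-z)
Delta1 = zeros(size(Theta1));           % gradient accumulators
Delta2 = zeros(size(Theta2));
for i = 1:m,
  % forward propagation for example i
  a1 = [1; X(i, :)'];                   % add bias unit
  a2 = [1; sigmoid(Theta1 * a1)];
  a3 = sigmoid(Theta2 * a2);
  % backpropagation: delta terms for the output and hidden layers
  d3 = a3 - Y(:, i);                    % Y(:, i) is the one-hot label vector
  d2 = (Theta2' * d3) .* a2 .* (1 - a2);
  d2 = d2(2:end);                       % drop the bias term
  % accumulate the gradient contributions
  Delta2 = Delta2 + d3 * a2';
  Delta1 = Delta1 + d2 * a1';
end;
D1 = Delta1 / m;    % add (lambda/m) * Theta1(:, 2:end) to the non-bias columns for regularisation
D2 = Delta2 / m;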