Weight Adjustment
I promised distilled content and no lengthy math :-). In this section I will start from the end and work towards the beginning. First we shall look at the adjustment applied to weights by back propagation, then unpack each formula and value, working through error calculation, derivatives and activation, and lastly the forward calculation.
Weight {New value} = Weight {Old value} – AdjustBy;
Note – elsewhere I said to use Actual – Target and warned to keep it that way or the signs need to change. If you choose to calculate the error as Target – Actual, then AdjustBy needs to be multiplied by (-1).
AdjustBy is calculated as follows.
AdjustBy := Alpha * Derivative at Neuron on right * Neuron Output on left * Error at Neuron on right + Momentum * Prior AdjustBy
To explain each term (a picture follows):
Alpha is the Learning Rate, and is usually less than 1, in the range 0.1 to 0.001. The learning rate scales down the adjustment to avoid big jumps in weights that overshoot the minima. More is said about Alpha in the experiments pages.
Derivative at Neuron is the gradient of the neuron's activation function, evaluated at the neuron's current value. See more later on the various activation functions and their derivatives.
Neuron Output on the left – a value we already have from the forward pass.
Error at Neuron on the right – a value we have from doing the error pass for the entire network after the forward pass.
Momentum is a value between, let's say, 0.1 and maybe even 2 or 5. Momentum scales the prior adjustment and adds it to the current adjustment. It acts as inertia on the adjustment: if the prior adjustment was large, it will influence this adjustment. More on its behavior in the experiments pages.
Prior Adjustment is a value we store with each weight after the current adjustment is calculated, so it can influence the next weight adjustment. Below is a pictorial example of adjusting one single weight.

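The update rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the code pages; all names (alpha, momentum, prior_adjust and the sample numbers) are made up for the example.

```python
def adjust_weight(weight, alpha, derivative_right, output_left, error_right,
                  momentum, prior_adjust):
    """Return (new_weight, adjust_by) for one weight.

    AdjustBy = Alpha * Derivative(right) * Output(left) * Error(right)
               + Momentum * PriorAdjustBy
    New weight = old weight - AdjustBy (error taken as Actual - Target).
    """
    adjust_by = (alpha * derivative_right * output_left * error_right
                 + momentum * prior_adjust)
    return weight - adjust_by, adjust_by

# One example update step with illustrative values:
new_w, adj = adjust_weight(weight=0.5, alpha=0.1, derivative_right=0.25,
                           output_left=0.8, error_right=0.3,
                           momentum=0.4, prior_adjust=0.01)
# adj  = 0.1*0.25*0.8*0.3 + 0.4*0.01 = 0.006 + 0.004 = 0.01
# new_w = 0.5 - 0.01 = 0.49
```

The returned adjust_by would then be stored with the weight as the Prior Adjustment for the next pass.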
Derivatives (and activation functions)
Let's take a look first at how we get the derivative. Sparing you lengthy maths: each activation function should have a derivative, as this is the gradient along which adjustments are made until a minimum (best solution) is reached. Possible activation functions and their derivatives are shown below. (As I implement more, I will add more.)

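As one concrete pairing, here is a sketch of the logistic sigmoid and its derivative in Python. The sigmoid is just one possible choice; it is assumed here for illustration, and other activations (tanh, ReLU, ...) each pair with their own derivative.

```python
import math

def sigmoid(x):
    """Logistic sigmoid activation: squashes x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(output):
    """Sigmoid's derivative, conveniently expressed in terms of the
    neuron's OUTPUT (the already-activated value), not its raw input."""
    return output * (1.0 - output)

out = sigmoid(0.0)              # 0.5
grad = sigmoid_derivative(out)  # 0.25
```

Expressing the derivative in terms of the output is handy in practice: the forward pass already stored each neuron's output, so no extra work is needed during the backward pass.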
Errors
Errors were already discussed in the code pages, but here is pictorial reinforcement of how to work out errors for the end nodes and then for any node working back into the network.

We get the actual output from running a forward pass. As a reminder, errors are calculated working backward through the network – but you have already seen that in the code pages.
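The error pass can be sketched as follows, keeping the Actual – Target sign convention used above. For hidden nodes, the usual rule is that a node's error is the weighted sum of the errors of the nodes it feeds on its right; the helper names and numbers below are illustrative only.

```python
def output_error(actual, target):
    """Error at an end (output) node: Actual - Target,
    matching the sign convention used in the weight update."""
    return actual - target

def hidden_error(weights_to_right, errors_right):
    """Error at a hidden node: the sum of (connecting weight * error)
    over every node it feeds on the layer to its right."""
    return sum(w * e for w, e in zip(weights_to_right, errors_right))

e_out = output_error(actual=0.8, target=1.0)        # -0.2
e_hidden = hidden_error([0.5, -0.3], [e_out, 0.1])  # 0.5*(-0.2) + (-0.3)*0.1 = -0.13
```

Working layer by layer from right to left, every node in the network ends up with an error value ready for the weight adjustment step.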
Forward Pass
The forward pass starts from the input values; then, for each layer, sum the products of the prior layer's node outputs and the weights connecting them, and pass the sum through the activation function. Here is pictorially how it works.

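One layer of the forward pass can be sketched like this, assuming a sigmoid activation (any of the activations above would do); the weight layout and sample values are illustrative only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward_layer(inputs, weight_rows):
    """Compute one layer's outputs.

    weight_rows[j] holds the weights from every left-layer node into
    right-layer node j: each right node sums the products of the left
    outputs and its incoming weights, then applies the activation.
    """
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)))
            for row in weight_rows]

# Two inputs feeding two right-hand nodes:
outputs = forward_layer([1.0, 0.5], [[0.4, 0.6], [0.2, -0.8]])
```

Chaining forward_layer calls, feeding each layer's outputs in as the next layer's inputs, runs the full forward pass through the network.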
This concludes all the formulas there are to work with!