Forward Propagation
The basics of forward propagation: sum up the products of the upstream neurons' outputs and the weights connecting them, pass that sum through the neuron's activation function to produce the neuron's output, and repeat until the outputs of the network are complete. Using the picture below, Value In at N1,0 is N0,0 * W0,0,0 + N0,1 * W0,1,0 + N0,2 * W0,2,0

Value out of N1,0 would be, for example with Sigmoid, = 1 / (1 + Exp(-N1,0.ValueIn))
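As a quick numeric sketch of that sum-then-squash step (Python, with made-up values for the three upstream outputs and the weights into N1,0):

```python
import math

# Hypothetical numbers: three upstream outputs and the weights into N1,0
upstream = [0.5, -0.3, 1.0]     # N0,0 .. N0,2
weights  = [0.8,  0.2, -0.4]    # W0,0,0 .. W0,2,0

# Value In at N1,0: sum of products of upstream outputs and weights
value_in = sum(n * w for n, w in zip(upstream, weights))

# Value Out with Sigmoid (note the minus sign in the exponent)
value_out = 1.0 / (1.0 + math.exp(-value_in))
print(value_in, value_out)
```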
and repeat this set of formulas to the end. In Pascal it would look like this:
Forward Pass Procedure
Var
layer, right, left : Word;
….
for layer := 1 to NumberOfLayers-1 Do {start at 1 and look back}
for right := 0 to NeuronsPerLayer[layer] Do
Begin
Neuron[layer,right].ValueIn := 0; {initialize Sum}
for left := 0 to NeuronsPerLayer[layer-1] Do {do not do -1 as we want to catch Bias}
Neuron[layer,right].ValueIn := Neuron[layer,right].ValueIn + Neuron[layer-1,left].ValueOut * Weight[layer-1,left,right].Weight;
Neuron[layer,right].ValueOut := Activation(Neuron[layer,right].ValueIn, Neuron[layer,right].ActivationFunction, Beta); {Beta is the Sigmoid Beta, see the Activation Function section}
End;
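The loop above can be sketched end to end in Python (a minimal illustration only, not the NN4 code; the layer sizes, the weight values, and the bias-as-extra-neuron layout are assumptions):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, weights):
    """weights[l][left][right] is the weight from neuron `left` in
    layer l to neuron `right` in layer l+1; the last `left` row of
    each layer is the bias neuron, whose output is fixed at 1.0."""
    outs = list(inputs) + [1.0]          # input layer plus its bias neuron
    for layer_w in weights:
        nxt = []
        for right in range(len(layer_w[0])):
            value_in = sum(outs[left] * layer_w[left][right]
                           for left in range(len(layer_w)))
            nxt.append(sigmoid(value_in))
        outs = nxt + [1.0]               # append a bias for the next layer
    return outs[:-1]                     # strip the unused trailing bias

# 2 inputs -> 2 hidden -> 1 output, all weights made up
w = [
    [[0.5, -0.5], [0.3, 0.8], [0.1, -0.1]],   # input->hidden, last row bias
    [[1.0], [-1.0], [0.2]],                   # hidden->output, last row bias
]
print(forward([1.0, 0.0], w))
```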
At the end of the loop above, the output values of the last layer's neurons complete the forward pass. A few words for clarity: the above needs to be repeated for each data line available. For example, a data file may have 3 data lines, each a set of inputs and desired outputs. The forward pass must be repeated for each line, and the output stored along with the desired values so the error can be calculated. In Pascal I provide the data line memory model as shown in the structure pages. Suffice it to show here how, for each forward loop, we pre-populate the input neurons and, after the forward pass, copy the output neurons' values into the data line's out values.
Entire Process of forward pass for each data line
Var
DataLineCount : Word;
Counter : Word;
……
for datalineCount:=0 to DataLinesCount-1 Do
Begin
{first populate input neurons from dataline }
for counter:=0 to NeuronsPerLayer[0]-1 Do Neuron[0,counter].ValueOut:=DataLine[DataLineCount].DataIn[counter];
{Execute the forward pass core function shown above}
{now get out of last neuron set for dataline out}
for counter := 0 to NeuronsPerLayer[NumberOfLayers-1]-1 Do DataLine[DataLineCount].DataOut[counter]:=Neuron[NumberOfLayers-1,counter].ValueOut;
end; {of the for loop to make pass through entire data set}
In NN4, the RUN command will do exactly the above and no more, apart from writing out the data file. When writing out, I actually write out both the original out value, also called TARGET or DESIRED, and the value calculated by the forward pass, called ACTUAL. The difference between these two values is the "ERROR", but more about this below.
Error Calculation
To be able to back propagate we need the error at every node. Please see resources on the Internet for how the formulas below come to be; as said, I am distilling to the bare minimum needed to write the code. For now, trust me: we need the Error at every node.
To begin, we start with the Error at the last, output layer. The formula I use in NN4 is:
ERROR := ACTUAL - TARGET
The order is important: if reversed, later formulas would need a sign adjustment. For now, just assume this is how it is.
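A tiny numeric check of the convention (hypothetical numbers):

```python
actual, target = 0.9, 0.2    # forward-pass output vs. desired output
error = actual - target      # NN4 order: ACTUAL - TARGET
print(error)                 # positive means the output came out too high
```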
The Pascal code is shown below. There are two steps: first assign errors to the nodes at the output layer, then loop through the layers backwards, propagating the error and scaling it by the influence the weights would have had.
Procedure to propagate Error into Network backwards, for one data line instance
Var
Counter : Word;
layer, left, right : Word;
…..
For Counter:=0 to NeuronsPerLayer[NumberOfLayers-1]
Do With Neuron[NumberOfLayers-1,Counter], DataLine[DataLineCount] Do Error := ValueOut - DataOut[Counter];
For those not familiar with Pascal, the with statement lets you shorten references to structured variables in the next block. For example, ValueOut is a field of Neuron, but I no longer need to give the compiler the full name: thanks to the with statement it assumes ValueOut belongs to Neuron.
It is important not to forget that the errors at the nodes are only valid for this specific data line instance. In a nutshell, the piece of code above continues with the code below, pushing the errors back to all nodes before back propagation can run to adjust the weights.
for layer:= numberoflayers-2 downto 1 Do
{starting at -2 as we did last layer above, and to 1 not 0 as we do not need to work on input neuron layer}
Begin
for left:=0 to NeuronsPerLayer[layer]-2 Do {left indexes this layer's neurons, not the next layer's}
Begin
Neuron[layer,left].Error:=0; {zero out and prepare to sum up with weights influences}
for right:=0 to NeuronsPerLayer[layer+1]-2 Do With Neuron[Layer,left] Do
Error := Error + Neuron[layer+1,right].Error * Weight[layer,left,right].Weight;
End; {of processing this specific layer, all errors for neurons on the left}
End; {of processing all layers}
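The same backward sweep can be sketched in Python (mirroring the code above: the derivative is applied later, in the weight-update step; bias neurons get no error of their own; the layer layout and weight values are the same assumptions as in the earlier forward-pass sketch):

```python
def propagate_errors(output_errors, layer_sizes, weights):
    """output_errors: ACTUAL - TARGET per output neuron.
    layer_sizes: real (non-bias) neuron count per layer.
    weights[l][left][right] as before (last `left` row is the bias).
    Returns per-layer error lists, input layer excluded."""
    errors = [list(output_errors)]
    for layer in range(len(layer_sizes) - 2, 0, -1):
        downstream = errors[0]                     # errors one layer to the right
        layer_err = [sum(downstream[right] * weights[layer][left][right]
                         for right in range(len(downstream)))
                     for left in range(layer_sizes[layer])]
        errors.insert(0, layer_err)
    return errors

w = [
    [[0.5, -0.5], [0.3, 0.8], [0.1, -0.1]],   # input->hidden (made up)
    [[1.0], [-1.0], [0.2]],                   # hidden->output (made up)
]
print(propagate_errors([0.5], [2, 2, 1], w))
```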
To make sure the above is clear, see the picture on the maths and formulas pages. It shows how the error at a prior node is calculated from the weights and errors at the nodes downstream.
And again, to stress: the prior code snippet belongs to the same procedure as the snippet above it.
Back Propagation
As is probably well understood by now, in back propagation we cycle through the layers in reverse, visit each weight, and adjust it given the forward-pass values and the error pass just completed. A gentle reminder: back propagation will be called for each data line. So the weights are slowly adjusted for each data line, then the entire data set is visited again and again, the weights all the while being nudged to suit all data lines, hopefully equally well, until we reach a solution where the weights produce actuals very close to the targets. Here it is in Pascal:
Var
Layer, Left, Right : Word;
AdjustBy : Single;
…….
for layer:=NumberofLayers-1 Downto 1 Do {because we shall work looking to the left}
for right :=0 to neuronsperlayer[layer]-2 Do {avoid reaching bias node as there are no weights there}
for left := 0 to neuronsperlayer[layer-1]-1 Do
Begin
AdjustBy := LearningRate * Neuron[Layer,right].Error * ActivationDerivative (Neuron[Layer,right].ValueOut) * Neuron[Layer-1,left].ValueOut; {error times activation slope times upstream output}
AdjustBy:= AdjustBy + Momentum * Weight[layer-1,left,right].PriorAdjust;
Weight[layer-1,left,right].Weight := Weight[layer-1,left,right].Weight - AdjustBy;
Weight[layer-1,left,right].PriorAdjust := AdjustBy;
End;
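The update of a single weight can be sketched like this in Python (the learning rate, momentum and input numbers are made up; the derivative factor shown is the Sigmoid one from the Derivative section, with B = 1):

```python
def update_weight(weight, prior_adjust, error_right, out_right, out_left,
                  learning_rate=0.5, momentum=0.9):
    """One descent step with momentum for one weight.
    error_right, out_right: error and output of the downstream neuron;
    out_left: output of the upstream neuron this weight comes from."""
    slope = out_right * (1.0 - out_right)        # sigmoid derivative, B = 1
    adjust_by = learning_rate * error_right * slope * out_left
    adjust_by += momentum * prior_adjust         # momentum term
    return weight - adjust_by, adjust_by         # new weight, new PriorAdjust

print(update_weight(0.5, 0.0, 0.7, 0.6, 1.0))
```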
Activation Function
Below is the actual activation function from the Pascal code.
Function Activation(var X:Single; ActivationFunction:ActivationFunctionType; B:Single):Single;
{ X is the input value for the function and B is the Sigmoid Beta }
Begin
case ActivationFunction of
Sigmoid: Activation:=1/(1+exp(-B * X));
Tanh: Activation:=(exp(B * x)-exp(- B * x))/(exp(B * x)+exp(-B * x));
SoftPlus: Activation:=Ln(1+exp(x)); {B must be 1 else derivative will fail }
Swish: Activation:=x * Activation(x, Sigmoid, B ); {Make recursive call }
end;
end;
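A rough Python equivalent of the same case statement (for illustration only; `math.tanh` replaces the explicit exp formula but computes the same value):

```python
import math

def activation(x, kind, b=1.0):
    if kind == "sigmoid":
        return 1.0 / (1.0 + math.exp(-b * x))
    if kind == "tanh":
        return math.tanh(b * x)                 # same value as the exp formula
    if kind == "softplus":
        return math.log(1.0 + math.exp(x))      # B must be 1, as noted above
    if kind == "swish":
        return x * activation(x, "sigmoid", b)  # recursive call, as in the Pascal
    raise ValueError(kind)

print(activation(1.0, "swish"))
```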
Derivative Function
Function ActivationDerivative(var Y,X,Z:Single; var ActivationFunction:ActivationFunctionType; var B:Single):Single;
{ Y is the original value in, X is the output after the activation function, Z is the sigmoid of Y, B is Beta }
{ the reason for the apparently complex parameter list is that, as seen from the functions table, we can take some shortcuts }
Begin
case ActivationFunction of
Sigmoid: ActivationDerivative:=X * (1-x) * B;
Tanh: ActivationDerivative:=(1-(X * X)) * B;
SoftPlus: ActivationDerivative:=z;
Swish: ActivationDerivative:=(Y * B * z * (1-z)) + z;
end;
End;
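These shortcut forms are easy to get wrong, so here is a finite-difference sanity check in Python (the step size h and the test point are arbitrary choices):

```python
import math

def sigmoid(x, b):
    return 1.0 / (1.0 + math.exp(-b * x))

# shortcut derivatives, same conventions as above:
# y = value in, x_out = value out, z = sigmoid of y, b = Beta
def d_sigmoid(x_out, b): return x_out * (1.0 - x_out) * b
def d_tanh(x_out, b):    return (1.0 - x_out * x_out) * b
def d_swish(y, z, b):    return y * b * z * (1.0 - z) + z

y, b, h = 0.3, 2.0, 1e-6
z = sigmoid(y, b)
swish = lambda t: t * sigmoid(t, b)

# central finite differences of the forward functions
fd_sigmoid = (sigmoid(y + h, b) - sigmoid(y - h, b)) / (2 * h)
fd_tanh    = (math.tanh(b * (y + h)) - math.tanh(b * (y - h))) / (2 * h)
fd_swish   = (swish(y + h) - swish(y - h)) / (2 * h)

assert abs(d_sigmoid(sigmoid(y, b), b) - fd_sigmoid) < 1e-6
assert abs(d_tanh(math.tanh(b * y), b) - fd_tanh) < 1e-6
assert abs(d_swish(y, z, b) - fd_swish) < 1e-6
print("all derivative shortcuts match the finite differences")
```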