Mixed Logic Gates Experiment

A real experiment… April 2019

For this experiment the data set covers five gates: OR, AND, NOR, NAND and XOR. There are 3 input neurons and 1 output. The 3 inputs are (1) the gate type, and (2) and (3) the two operand values, each either 0 or 1. Gate types are normalised into the -1 .. +1 range, e.g. -0.8 is AND, -0.6 is OR and so on. Below is the data.txt file content.
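As an illustration of the encoding (a hedged Python sketch, not my actual tooling): the 20 data lines are simply the 5 gates times the 4 input pairs. The -0.8 (AND) and -0.6 (OR) codes come from the text above; the codes for NOR, NAND and XOR are an assumed continuation of that spacing, so treat them as placeholders.

```python
# Hypothetical generator for the 20-line data.txt (5 gates x 4 input pairs).
# Gate codes: -0.8 (AND) and -0.6 (OR) are given in the text; the rest
# (-0.4 NOR, -0.2 NAND, 0.0 XOR) are ASSUMED to continue the pattern.
GATES = {
    "AND":  (-0.8, lambda a, b: a & b),
    "OR":   (-0.6, lambda a, b: a | b),
    "NOR":  (-0.4, lambda a, b: 1 - (a | b)),
    "NAND": (-0.2, lambda a, b: 1 - (a & b)),
    "XOR":  ( 0.0, lambda a, b: a ^ b),
}

def make_data_lines():
    # One line per (gate, a, b): gate code, input a, input b, expected output.
    lines = []
    for name, (code, fn) in GATES.items():
        for a in (0, 1):
            for b in (0, 1):
                lines.append(f"{code} {a} {b} {fn(a, b)}")
    return lines

if __name__ == "__main__":
    for line in make_data_lines():
        print(line)
```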


Another thing to mention: NN4 is now superseded by the NN5 version of the code. NN5 is driven by command line parameters instead of reading values from a config file. I did this so I can programmatically generate various combinations of command lines … but more about this on the NN5 pages.

First step was to identify which activation function performed best – and the outcome is Sigmoid (I compared Sigmoid, Tanh and SWISH; ELU and PReLU are for some other day, as I need to fix an overflow). Below is the outcome of the analysis of the produced log file, where I pivot momentum, beta and learning rate.
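For reference, here is a Python sketch of the three activations compared, using their standard definitions (this is not the NN5 Pascal source); the sigmoid is written in the numerically safe form that avoids exactly the kind of exp() overflow mentioned above:

```python
import math

def sigmoid(x, beta=1.0):
    # Numerically safe logistic: never calls exp() on a large positive
    # argument, so it cannot overflow for extreme inputs.
    z = beta * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def tanh_act(x):
    # Tanh squashes into -1 .. +1 rather than 0 .. 1.
    return math.tanh(x)

def swish(x, beta=1.0):
    # SWISH: x * sigmoid(beta * x)
    return x * sigmoid(x, beta)
```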

Once I had Sigmoid as the candidate, next was to repeat what I did a while ago and confirm the best parameters for the given network size.

CONCLUSION –
Sigmoid,
Best parameters for the problem of mixed gates, 20 data element file,
Data file normalised for -1 … 1 values
Random range for generating weights 0 … +1
Learning Rate 0.9
Momentum 0.2
Beta 1
Below is a command line sample

NN5 wcfg.txt fdata.txt glog.txt v4 c1000 s1000 e0.001 qyes tno l4 n3,13,11,1 hshrt asigmoid k0,1 r0.9 b1 m0.2 d0.1 olearn
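For readers wanting the maths behind those numbers: learning rate 0.9 and momentum 0.2 plug into the textbook momentum update, and beta scales the sigmoid's input. A hedged Python sketch of the standard formulation (NN5's internals may differ):

```python
import math

def sigmoid(x, beta=1.0):
    # Beta steepens (or flattens) the logistic curve; beta=1 is the plain sigmoid.
    return 1.0 / (1.0 + math.exp(-beta * x))

def update_weight(w, grad, velocity, rate=0.9, momentum=0.2):
    # Textbook momentum update: v <- m*v - r*grad ; w <- w + v
    velocity = momentum * velocity - rate * grad
    return w + velocity, velocity
```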

Next, I generated a batch file to try different network sizes, let them compete up to 1000 epochs and compared which network size learns fastest or best. Below is an example of the batch file. (Incidentally, I do not preserve batch files, as I have batch-maker code I can reuse as long as I log, analyse the log and leave some notes.)
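The batch-maker idea can be sketched like this (a hypothetical Python stand-in for my Pascal batch maker; the command template just copies the sample line above, varying only the l and n fields, and the shapes listed are arbitrary examples):

```python
# Hypothetical batch-maker sketch: emit one NN5 command line per layer shape.
SHAPES = ["3,5,1", "3,9,1", "3,13,11,1", "3,15,13,1"]  # example shapes only

def make_batch(shapes):
    lines = []
    for shape in shapes:
        layers = shape.count(",") + 1   # l field = number of layers
        lines.append(
            f"NN5 wcfg.txt fdata.txt glog.txt v4 c1000 s1000 e0.001 "
            f"qyes tno l{layers} n{shape} hshrt asigmoid k0,1 "
            f"r0.9 b1 m0.2 d0.1 olearn"
        )
    return lines

if __name__ == "__main__":
    print("\n".join(make_batch(SHAPES)))
```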


Note that for each network I first call the objective, or RND, which makes the weights random. (The 0.9 being 0.89999 is a quirk of generating the batch files by code, which for some reason when using double precision does not make the values exact enough for us humans – I will iron this out later.)
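The 0.89999 quirk is classic binary floating point: 0.1 has no exact double representation, so sweeping a parameter by repeated addition drifts off the round value. A small Python demonstration of the symptom and the usual fix (derive each value from an integer counter and round):

```python
# Sweeping a parameter by repeated addition accumulates binary rounding error:
r, step = 0.0, 0.1
values = []
for _ in range(9):
    r += step
    values.append(r)
print(values[-1])          # 0.8999999999999999, not 0.9

# Fix: derive each value from an integer counter and round once at the end.
values_exact = [round(i * step, 1) for i in range(1, 10)]
print(values_exact[-1])    # 0.9
```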

Below is what the log file looks like, and as s (show every so many epochs) is the same as cycles/epochs, there are no intermediate steps shown, just the final state at 1000 epochs.

Now I analysed the log file to find which network sizes perform best for the problem at hand – and I had a surprise: it seems a specific net size of four layers, 3,13,11,1, managed to learn to an error of 0.001 (that is a tenth of one percent!!) in less than 1000 cycles. I could not compare run times of a small net vs this one etc. … but as WRITELN in Pascal is so expensive, this is probably not the right kind of experiment for that measurement.

Anyhow, below is an xls of the pivoted logs comparing layer sizes and network configurations, and the average error achieved over 10 samples for each.

And the next picture filters for all those that learned in less than 1000 epochs; note network 3,13,11,1 showing up four times out of 10 samples – that is pretty significant.

As a side note, I then ran this network to an error level of 0.0001, which is one hundredth of a percent accuracy; the network learned in about 250K epochs, and below is the data file once we run the network against it – pretty well shaped weights to give this outcome.

A small note on errors, as I had to refactor the code and do some debugging. I noticed that the error “spread” of actuals to targets can be wide even though the average (absolute) error is low. Meaning in some cases the net will have a high fit overall but treat some values as “outliers”: it fits the weights very well around many values and poorly against a few. So I put min and max deviations in the logs, and in the data file simply the centre average – i.e. the average where the sign is not corrected, which shows me any bias away from zero (the centre!). Otherwise what I measure against is the Absolute Average.
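The measures described can be sketched as follows (my reconstruction in Python, not the NN5 code): signed deviations feed the centre average and the min/max spread, while the absolute average stays the headline error.

```python
def error_stats(actuals, targets):
    # Deviations keep their sign, so bias away from zero stays visible.
    devs = [a - t for a, t in zip(actuals, targets)]
    return {
        "abs_avg":    sum(abs(d) for d in devs) / len(devs),  # headline error
        "center_avg": sum(devs) / len(devs),                  # signed: shows bias
        "min_dev":    min(devs),                              # spread, low side
        "max_dev":    max(devs),                              # spread, high side
    }
```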

Below is a sample of the weights after RND. For explanation, each line holds the weights from one left neuron to all right neurons (I changed this because, as noted, Pascal has a maximum length for a text file line on disk). So there are 4 lines to the first 13 neurons in the hidden layer (the 4th line is from the bias), then 14 lines to the next 11 neurons.
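The layout arithmetic can be sketched as follows (reader's reconstruction, not the actual NN5 writer): for each pair of adjacent layers there are left+1 lines, the extra one being the bias, and each line carries one weight per right neuron.

```python
def weight_line_counts(layer_sizes=(3, 13, 11, 1)):
    # One line per left neuron plus one for the bias; each line holds
    # the weights to every neuron in the next layer.
    return [(left + 1, right)
            for left, right in zip(layer_sizes, layer_sizes[1:])]

# For 3,13,11,1: 4 lines of 13 weights, then 14 lines of 11, then 12 lines of 1.
```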

cfg.txt

In conclusion (again): for the experiment of 5 logic gates, normalised inputs -1..+1, 20 data lines, Sigmoid as activation in a 4 layer net of 3,13,11,1 neurons with bias set to 1,1,1, momentum 0.2, learning rate 0.9 and beta at 1 – I repeated again, re-randomising and learning to 0.0001 – see below: again a snapshot of the learning, then the run command and data file, and finally the learned weights if anyone wants to cross check me :-).

Data file after finished learning and RUN command
Weights in CFG.txt

And once more just for fun….

And again – this time screen by screen of the steps of using NN5… first randomise, and show the resulting weights in cfg.txt

Now do a run and show the data file to see it is not a good outcome yet – i.e. the weights are random, not learned.

Now do the learning…..

and now show the resulting weights

and show the resulting data file