Backpropagation Using an Alternative Activation Function

30 Apr 2016

Training a neural network with backpropagation has two parts: forward propagation and backward propagation (the step that gives the algorithm its name). Forward propagation is the first half, and it is used to find the loss of the neural network. The loss can also be called the error, because it measures how far the network's output is from the output we expected.

This error is calculated by taking the dot product of our input and our weights and passing the result to our activation function. The activation function takes in this array of numbers and squishes each one into the function's range. The resulting values are stored in the next layer of the network, which I'll call the hidden neural layer. The loss (error) is then calculated by subtracting the values of this layer (the numbers that our activation function returned) from the expected output.
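To make the forward pass concrete, here is a minimal sketch in plain Python (the tutorial itself uses NumPy). It assumes a single layer of three weights, tanh as the activation, and the -1/1 truth table from this post. The names `dot` and `forward` and the starting weights are hypothetical, chosen only for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * w for x, w in zip(a, b))


def forward(inputs, weights):
    """Squish each row's weighted sum into (-1, 1) with tanh."""
    return [math.tanh(dot(row, weights)) for row in inputs]


# Truth table from the post: columns x, y, z; the output equals x.
inputs = [[-1, -1, 1], [-1, 1, 1], [1, -1, 1], [1, 1, 1]]
expected = [-1, -1, 1, 1]
weights = [0.5, -0.2, 0.1]  # hypothetical starting weights

layer = forward(inputs, weights)  # values stored in the layer
error = [y - out for y, out in zip(expected, layer)]  # expected minus layer values
print(layer)
print(error)
```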

The second half of the algorithm is backpropagation itself, which moves backwards through the network. As far as I understand it right now, we multiply the error by the gradient (slope) of the activation function, evaluated at the values stored in the hidden neural layer. Once this is done, we update the weights by adding to them the dot product of the transposed input array and this gradient-scaled error, calculated for the layer we produced during forward propagation (the second layer of the network, on a two-layer network). Backpropagation is a heavy subject, so I'll be spending more time studying it.
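One weight update can be sketched like this, again in plain Python under the same assumptions (one layer of three weights, tanh activation). The tanh slope is written in terms of the layer's output, 1 - out², which plays the same role as the logistic slope out * (1 - out) in the tutorial. The `learning_rate` parameter is my own addition: a value of 1.0 means the full update is added, and the usage below uses a smaller step. `backprop_step` and `tanh_derivative` are hypothetical names.

```python
import math


def tanh_derivative(out):
    """Slope of tanh written in terms of its output: 1 - tanh(x)**2."""
    return 1.0 - out * out


def backprop_step(inputs, expected, weights, learning_rate=1.0):
    """Run one forward pass and one weight update; return the new weights."""
    layer = [math.tanh(sum(x * w for x, w in zip(row, weights))) for row in inputs]
    error = [y - out for y, out in zip(expected, layer)]
    # Scale each error by the slope of tanh at that output.
    delta = [e * tanh_derivative(out) for e, out in zip(error, layer)]
    # Transposed input matrix dotted with delta: one sum per weight.
    updates = [sum(row[j] * d for row, d in zip(inputs, delta))
               for j in range(len(weights))]
    return [w + learning_rate * u for w, u in zip(weights, updates)]


inputs = [[-1, -1, 1], [-1, 1, 1], [1, -1, 1], [1, 1, 1]]
expected = [-1, -1, 1, 1]
weights = [0.5, -0.2, 0.1]  # hypothetical starting weights
new_weights = backprop_step(inputs, expected, weights, learning_rate=0.1)
print(new_weights)
```

After one step the weight on x grows and the squared error over the four rows goes down, which is what the update is meant to do.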

The tutorial that I've been using to study neural networks uses the logistic function to predict the output of a truth table with three inputs. The table below shows the three inputs x, y and z and the output. As you can see, the output is exactly the same as x. These are the kinds of patterns that neural networks are supposed to recognize, and recognizing patterns is also how humans learn. The difficulty lies in how complex those patterns can be.

The example that I was using, which is cited below, uses a logistic function to predict the output. I was wondering if I could get the same behaviour from a simple neural network using a different activation function, so I used the tanh function. The tanh function is explained well here. Its graph looks similar to the logistic function's, except that instead of having a range from 0 to 1, it has a range from -1 to 1. I don't know enough about neural networks to tell when you want to use a certain function over another, but I will find out soon enough!
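The resemblance between the two graphs is exact: tanh is a stretched and shifted logistic function, tanh(x) = 2 * logistic(2x) - 1. This small sketch prints both functions side by side and also shows their slopes, since the slope is what backpropagation multiplies the error by. The helper names are hypothetical.

```python
import math


def logistic(x):
    """Logistic (sigmoid) function, range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logistic_slope(x):
    """Derivative of the logistic function; its maximum is 0.25 at x = 0."""
    s = logistic(x)
    return s * (1.0 - s)


def tanh_slope(x):
    """Derivative of tanh; its maximum is 1.0 at x = 0."""
    t = math.tanh(x)
    return 1.0 - t * t


for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    # tanh(x) should match 2 * logistic(2x) - 1 in the last column.
    print(f"x={x:+.1f}  logistic={logistic(x):.4f}  tanh={math.tanh(x):+.4f}  "
          f"2*logistic(2x)-1={2 * logistic(2 * x) - 1:+.4f}  "
          f"slopes={logistic_slope(x):.4f}/{tanh_slope(x):.4f}")
```

Besides the wider range, tanh is centered on 0 and its slope near 0 is four times steeper than the logistic function's.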

 x    y    z    output
-1   -1    1    -1
-1    1    1    -1
 1   -1    1     1
 1    1    1     1

These were the results I got after training the neural network for 100,000 iterations.

Output after training    Expected
-0.99999703              -1
-0.99999702              -1
 0.99999702               1
 0.99999703               1
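Here is a minimal end-to-end sketch of the whole experiment in plain Python, assuming one layer of three weights, tanh activation and full-batch updates over all four rows. Two details are my own assumptions rather than the tutorial's: the random starting weights come from Python's `random.Random`, so the printed digits will not match the table above exactly, and the `learning_rate` of 0.1 is a conservative choice (the tutorial adds the full update, and with these inputs a full-size step can overshoot before tanh saturates). `train` is a hypothetical name.

```python
import math
import random

INPUTS = [[-1, -1, 1], [-1, 1, 1], [1, -1, 1], [1, 1, 1]]
EXPECTED = [-1, -1, 1, 1]


def train(iterations=100_000, learning_rate=0.1, seed=1):
    """Train a single tanh layer with three weights; return (weights, outputs)."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(3)]  # assumed initialization
    for _ in range(iterations):
        # Forward pass: weighted sum of each row squished by tanh.
        layer = [math.tanh(sum(x * w for x, w in zip(row, weights))) for row in INPUTS]
        # Error scaled by the tanh slope (1 - out**2).
        delta = [(y - out) * (1.0 - out * out) for y, out in zip(EXPECTED, layer)]
        # Add the transposed inputs dotted with delta to each weight.
        for j in range(3):
            weights[j] += learning_rate * sum(row[j] * d for row, d in zip(INPUTS, delta))
    outputs = [math.tanh(sum(x * w for x, w in zip(row, weights))) for row in INPUTS]
    return weights, outputs


if __name__ == "__main__":
    weights, outputs = train()
    for out, y in zip(outputs, EXPECTED):
        print(f"{out:+.8f}  expected {y:+d}")
```

Every output should land close to its expected value with the correct sign, but never exactly on -1 or 1.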

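As you can see, the results are not exact, but the network and the inputs were simple enough for the outputs to get very close to the expected values. Part of the reason they cannot be exact is that tanh only approaches -1 and 1 asymptotically, so no finite weighted sum makes it return exactly -1 or 1. This is a reminder that learning is rarely perfect and that there is almost always some error we have to account for in our learning algorithms.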
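Why the outputs stop just short of the targets can be read off the definition of tanh. For a weighted sum w, the gap between tanh and 1 is always positive (in exact arithmetic), and the slope used in the weight update shrinks toward 0 as w grows, so each correction gets smaller the closer the output gets:

```latex
\tanh w = \frac{e^{w} - e^{-w}}{e^{w} + e^{-w}} = \frac{e^{2w} - 1}{e^{2w} + 1},
\qquad
1 - \tanh w = \frac{2}{e^{2w} + 1} > 0 \quad \text{for every finite } w,
\qquad
1 - \tanh^{2} w = \frac{4}{\left(e^{w} + e^{-w}\right)^{2}} \to 0 \quad \text{as } w \to \infty.
```

The same argument applies to -1 by symmetry, since tanh(-w) = -tanh(w).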


Reference:

Trask, Andrew. "A Neural Network in 11 Lines of Python (Part 1)." Iamtrask. N.p., 12 July 2015. Web. 30 Apr. 2016.