Convolutional neural networks, or convnets, are mainly used for computer vision. They let you train a network on images without needing an enormous number of parameters. Their inputs can be thought of as volumes, because an image has a width, a height, and a number of color channels (red, green, and blue). Multiplying the width, the height, and the number of channels gives you the total number of values in that volume.
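As a minimal sketch of that idea (the 224x224 size is just an assumed example), here is what such a volume looks like as a NumPy array:

```python
import numpy as np

# A hypothetical 224x224 RGB image: height x width x channels.
# Each pixel holds three values (red, green, blue), so the image
# is a volume of 224 * 224 * 3 = 150,528 numbers.
image = np.zeros((224, 224, 3), dtype=np.uint8)

print(image.shape)   # (224, 224, 3)
print(image.size)    # 150528 values in the volume
```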
Professor Winston makes the general idea of convnets easy to understand. He explains that to train a neural network on an image with a convnet, you run a neuron on a small patch of the image, then run the same neuron on another patch, and keep going until the neuron has been run across the whole image. Each run produces one value for a specific place in the image, and together these values form the convolution of the image. You then take the local maximum over small neighborhoods of these values and construct a new, smaller image from them. Next you slide a neuron over this new image and repeat the same process to get another set of values. Taking the maximum of the local points is called max pooling; min pooling is the opposite, where you take the local minimum of the values instead.
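To make the convolution-then-pooling idea concrete, here is a rough NumPy sketch. The image size, kernel values, and function names are illustrative assumptions on my part, not Winston's code:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over every position of the image (valid padding)
    and record one output value per position."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]   # the small part the "neuron" sees
            out[i, j] = np.sum(patch * kernel)  # one number per location
    return out

def max_pool(feature_map, size=2):
    """Take the local maximum of each size x size block (max pooling)."""
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            block = feature_map[i * size:(i + 1) * size,
                                j * size:(j + 1) * size]
            out[i, j] = block.max()
    return out

# Assumed toy example: an 8x8 grayscale image and a 3x3 vertical-edge kernel.
image = np.random.rand(8, 8)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

feature_map = convolve2d(image, kernel)   # 6x6 convolution output
pooled = max_pool(feature_map)            # 3x3 image after max pooling
print(feature_map.shape, pooled.shape)
```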
The tool you use to scan a small part of an image is called a kernel, and each kernel detects a particular feature. The great part about this process is that you can use as many kernels as you want: 50, 100, or more. The feature maps these kernels produce are fed through the rest of your trained neural network, which then decides whether the image is a dog, car, table, etc.
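A rough sketch of what stacking many kernels looks like in practice, here using Keras (the layer sizes, input shape, and three-way dog/car/table output are assumptions for illustration):

```python
from tensorflow import keras
from tensorflow.keras import layers

# First convolutional layer learns 50 kernels, the second learns 100.
# The dense layer at the end decides between the assumed classes.
model = keras.Sequential([
    layers.Conv2D(50, (3, 3), activation="relu", input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(100, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(3, activation="softmax"),  # e.g. dog / car / table
])
model.summary()
```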
The most significant lesson I learned this week is that neural networks are just elaborate function approximation tools. That is the only reason they exist: to approximate functions. When you are performing supervised learning, you have a function that you know but your artificial neural network doesn't. You then tweak the weights and biases of the network until its function matches yours as closely as possible. Michael Nielsen gives some great examples of how neural networks can compute any function; you can find that material here. Since artificial neural networks approximate functions with as little error as possible, they are great for reinforcement learning. The backpropagation algorithm computes the gradients, and gradient descent uses them to find the minimum of the error function so that the error is as small as possible.
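Here is a tiny sketch of that idea: gradient descent nudging a single-neuron "network" until it approximates a function we already know. The target function and learning rate are toy assumptions, not an example from Nielsen's book:

```python
import numpy as np

# The "function we know" is f(x) = 3x + 2; the network (weight w, bias b)
# starts out knowing nothing and must approximate it.
xs = np.linspace(-1, 1, 100)
ys = 3 * xs + 2                      # targets from the known function

w, b = 0.0, 0.0
learning_rate = 0.1

for step in range(500):
    preds = w * xs + b               # the network's current guess
    error = preds - ys
    # Gradients of the mean squared error with respect to w and b
    grad_w = 2 * np.mean(error * xs)
    grad_b = 2 * np.mean(error)
    # Gradient descent: step downhill to shrink the error
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)                          # approaches 3 and 2
```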
To make these ideas concrete, suppose you wanted to train a robot to shake hands. You would give it a lot of data of people shaking hands with a dummy robot. This data would not be videos, but the Markov states, the value functions, and the amount of reward the dummy robot received. The robot would then, through trial and error, use this data to tweak its neural networks until it could shake someone's hand. The best part is that you wouldn't have to program the robot to shake hands; it would learn on its own!
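A very rough sketch of the trial-and-error value updates involved (a Q-learning-style rule; the state names, action names, and numbers for the handshake example are entirely made up):

```python
# Value estimates for (state, action) pairs, updated from experience.
states = ["arm_down", "arm_raised", "hand_extended"]
actions = ["raise_arm", "extend_hand", "grip"]

Q = {(s, a): 0.0 for s in states for a in actions}
alpha, gamma = 0.1, 0.9              # learning rate, discount factor

def update(state, action, reward, next_state):
    """One trial-and-error step: nudge the value of (state, action) toward
    the reward plus the best value reachable from the next state."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Example transition recorded from the dummy-robot data (values invented):
update("arm_raised", "extend_hand", reward=1.0, next_state="hand_extended")
print(Q[("arm_raised", "extend_hand")])
```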
Reference:
Winston, Patrick H. "Lecture 12B: Deep Neural Nets." MIT OpenCourseWare. MIT OpenCourseWare, 2015. Web. 30 May 2016.