Last week I learned a huge lesson: machine learning is costly. I paid around 1500 dollars for my new PC build. There were two reasons I got this PC. The first is that my PCs are very old. I have three PCs from the early 2000s. One of them does not even turn on anymore, and the other two are so old that the only Linux they can run is a lightweight distribution. I clearly needed a new computer. I’ll still be using my old ones, for the nostalgia. Keep in mind that the 1500-dollar price tag was only for the build. I still have my old monitor, keyboard and mouse, which are over 10 years old. They work perfectly fine, so I don’t see the need to waste any more money. The PC has a 240 GB solid-state drive and an MSI 970 Gaming motherboard.
The second reason I got this PC was for machine learning. I haven’t run any machine learning algorithms on it yet, but I will pretty soon. The reason I needed 1500 dollars for this build was to support the GPU (Graphics Processing Unit). GPUs were originally invented for gaming. Gaming applications are graphics intensive, so GPUs have thousands of cores that compute pixel values in parallel. Because GPUs do calculations in parallel, they are great for number crunching. The difference between a CPU and a GPU is that a CPU has a small number of cores (typically 2 to 8 in a desktop machine), while a GPU has thousands of simpler cores that all work at the same time.
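To make the idea of computing pixel values in parallel a bit more concrete, here is a minimal CPU-only sketch in Python. The image size, the `pixel_value` function and its shading formula are all made up for illustration. The point is only that each pixel is computed from its own index and nothing else, so the work can be split across any number of workers and still give the same result. Python threads will not make this faster; a GPU applies the same idea with thousands of hardware cores.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 8, 4  # tiny hypothetical image


def pixel_value(index):
    """Compute one pixel from its flat index alone; no pixel depends on another."""
    x, y = index % WIDTH, index // WIDTH
    return (x * 31 + y * 17) % 256  # made-up shading formula


indices = range(WIDTH * HEIGHT)

# Serial version: one worker walks through every pixel in order.
serial = [pixel_value(i) for i in indices]

# Parallel version: the same function is handed to a pool of workers.
# pool.map returns results in input order, so the two lists are comparable.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(pixel_value, indices))

print(serial == parallel)  # True
```

Because no pixel reads another pixel’s result, the order in which the workers finish does not matter. That independence is what lets a GPU hand one pixel to each core.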
There are two main driving forces behind the recent artificial intelligence boom: the progress of hardware, and the amount of data that we have access to. Reinforcement Learning and Artificial Neural Networks are relatively old ideas that were invented and implemented in the mid to late 20th century. In fact, Reinforcement Learning was used to beat human players in Backgammon in the 1990s. These ideas didn’t flourish back then for two reasons: the hardware of the time was too slow (Moore’s law had not yet given us enough computing power), and we did not have enough data. You need a lot of data to train neural networks, and with a lot of data you also need hardware that can do the matrix calculations for these networks efficiently. With the rise of the internet, a lot of data can be collected to train neural networks. In 1990, if you wanted millions of images to train a neural network to recognize pictures, getting them would have been very expensive. Now you can write a program that crawls Google Images and collects millions of images in a couple of days.
The progress of hardware is mainly driven by the GPU. The GPU is a mini-supercomputer. It can do millions of calculations in parallel on its thousands of cores. A neural network that takes 2 days to train on a CPU might be trained on a GPU in 15 minutes. This means that if there is something wrong in the algorithm you are running, you don’t have to wait a couple of days to find out. With large enough data sets and large enough neural networks, training can sometimes take a couple of months on a CPU. On a GPU it might take only one or two days.
The GPU that I have is an NVIDIA GeForce GTX 970. It’s big, and it’s powerful. I have already tested it on Windows, but I still have to test it on Linux. Testing it on Windows was not difficult. NVIDIA has a platform for programming GPUs called CUDA (here is the website). They also make running your programs easy by integrating CUDA with Visual Studio. You can program in CUDA if you know C and C++, but you can also use Python wrappers. I was able to write some simple programs, like multiplying two matrices. The cool part was that I was running this program on my GPU, not a CPU! I still don’t know much about GPUs beyond the basics of how they differ from CPUs, but I know that the more I study the CUDA architecture and learn how to implement my own machine learning algorithms, the more I will know about GPUs.
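The matrix multiplication program itself is not shown here, so the following is only a rough sketch of how that kind of program usually divides the work, written in plain Python that runs on the CPU. The names `matmul_thread` and `matmul` are hypothetical, and the flat row-major layout is an assumption. Each call to `matmul_thread` plays the role of one GPU thread and computes exactly one element of the result. On a real GPU those calls would run at the same time instead of in a loop.

```python
def matmul_thread(thread_id, a, b, c, m, p):
    """Work of one hypothetical GPU thread: compute a single element of c."""
    row, col = thread_id // p, thread_id % p
    total = 0
    for k in range(m):
        # Dot product of row `row` of a with column `col` of b.
        total += a[row * m + k] * b[k * p + col]
    c[thread_id] = total


def matmul(a, b, n, m, p):
    """Multiply an n x m matrix by an m x p matrix (flat, row-major lists)."""
    c = [0] * (n * p)
    # A GPU would launch n * p threads at once; this loop stands in for them.
    for thread_id in range(n * p):
        matmul_thread(thread_id, a, b, c, m, p)
    return c


a = [1, 2, 3, 4]  # [[1, 2], [3, 4]]
b = [5, 6, 7, 8]  # [[5, 6], [7, 8]]
print(matmul(a, b, 2, 2, 2))  # [19, 22, 43, 50]
```

Every output element only reads from `a` and `b` and writes to its own slot in `c`, so no thread has to wait for another. That is why matrix multiplication, the core operation in neural networks, maps so well onto a GPU.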