Deep Learning has been around for quite a while. I can remember colleagues at a big Swiss bank implementing neural networks back in the 90s. Yes, I know that feels like centuries ago, when people were still using NeXT computers, the small, sleek black boxes that were a must-have for anybody in the programming field. They were already trying to predict foreign exchange rates, so pretty much what we are trying to do today. We were also talking about expert systems and Lisp, but that is a different story.

In recent years, thanks to the increase in CPU power, the availability of GPUs that allow for parallel computing and cheaper memory, not to mention the flexibility and scalability provided by the cloud, neural networks have become mainstream.

Neural networks try to replicate the human brain: they ingest multiple pieces of information and predict a result, be it recognizing an image, understanding language, interpreting the sentiment of a text or forecasting sales figures. A network is made of multiple layers of cells that are activated or not based on previous training cycles, and the number of layers grows with the complexity of the problem.
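As a minimal illustration of what a single layer of cells does (plain NumPy, not a full framework; the weights and inputs below are arbitrary example values), each cell computes a weighted sum of its inputs and passes it through an activation function that decides how strongly it fires:

```python
import numpy as np

def sigmoid(z):
    # squashes any value into (0, 1): the cell's activation strength
    return 1.0 / (1.0 + np.exp(-z))

# 3 inputs feeding a layer of 2 cells (weights are illustrative, not trained)
x = np.array([0.5, -1.2, 0.3])
W = np.array([[0.2, -0.4, 0.1],
              [0.7, 0.05, -0.3]])
b = np.array([0.1, -0.2])

activations = sigmoid(W @ x + b)  # one activation per cell in the layer
```

Training consists of adjusting the weights `W` and biases `b` across many such layers until the final layer's output matches the expected result.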

There are multiple variants of neural networks: artificial neural networks, convolutional neural networks and recurrent neural networks. These are the main types that enable deep learning. Other types are emerging as well, but let us focus on these three for now.

Artificial neural networks (ANNs) are the simplest form and can be used for classification and prediction, for example. Some people even try to use them to predict stock prices, the old get-rich dream of every data scientist. Their big advantage over other machine learning models is that they need far less feature engineering, because they detect the best features by themselves during training. Feature engineering means finding the best predictors in a data set and eliminating the features that have no impact on the result.
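A minimal sketch of such a network using the Keras API bundled with TensorFlow (which the post mentions later); the layer sizes and the synthetic data here are arbitrary assumptions, not a tuned setup:

```python
import numpy as np
from tensorflow import keras

# Synthetic binary classification data, purely illustrative
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)).astype("float32")
y = (X.sum(axis=1) > 0).astype("float32")

# A small fully connected network: the hidden layers learn their own
# feature combinations, which is why little manual feature engineering is needed
model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # probability of class 1
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```

After fitting, `model.predict` returns a probability per row that the example belongs to class 1.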

Convolutional neural networks are extremely well suited for image recognition: each layer in the network detects different features of the image using filters of different shapes (convolutions and pooling). They can also be used for natural language processing, detecting patterns in the language and predicting the best answer or the next word. Everybody has in mind the famous cats-and-dogs problem of separating cat images from dog images.
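A tiny convolutional network for that cats-and-dogs style of problem could be sketched like this in Keras (the image size and layer sizes are illustrative assumptions, not a tuned architecture):

```python
from tensorflow import keras

# A small CNN for 64x64 RGB images, e.g. cats vs dogs
model = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),
    keras.layers.Conv2D(16, kernel_size=3, activation="relu"),  # convolution: learnable filters
    keras.layers.MaxPooling2D(pool_size=2),                     # pooling: downsample feature maps
    keras.layers.Conv2D(32, kernel_size=3, activation="relu"),  # deeper layers see larger patterns
    keras.layers.MaxPooling2D(pool_size=2),
    keras.layers.Flatten(),
    keras.layers.Dense(1, activation="sigmoid"),  # cat or dog
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```

The early convolutions pick up edges and textures, while the later ones combine them into larger shapes, which is exactly the layered feature detection described above.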

Recurrent neural networks are a more advanced type of ANN with the ability to remember the context of an input. In natural language processing, for example, it is extremely important to understand the context of a word in order to predict the next one. They can also be used to forecast time series, since they keep the history of the series and propagate it further down the layers. There are different types of recurrent neural networks; some can be quite complicated and need a lot of fine tuning and architecture work.
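For the time series case, a common recurrent variant is the LSTM; a minimal sketch in Keras (window length and layer size are assumptions) looks like this:

```python
import numpy as np
from tensorflow import keras

# An LSTM that reads sequences of 30 time steps with 1 feature each
# (a univariate time series) and predicts the next value
model = keras.Sequential([
    keras.Input(shape=(30, 1)),
    keras.layers.LSTM(32),   # internal state carries context across the steps
    keras.layers.Dense(1),   # next-value prediction
])
model.compile(optimizer="adam", loss="mse")

# Untrained demo prediction: one output per input sequence
demo = np.zeros((4, 30, 1), dtype="float32")
preds = model.predict(demo, verbose=0)  # shape (4, 1)
```

The LSTM's internal state is what lets it remember earlier steps of the sequence, which a plain feed-forward network cannot do.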

There are some drawbacks to implementing neural networks, though. First, they need a lot of data to train on; this is not always available, and more standard algorithms might do a better job. Second, neural networks have a tendency to overfit: the trained model performs well on the training data, but when faced with real-life data its accuracy drops. There are a few techniques to overcome this, but they require a lot of retraining and a lot of CPU power; some can be applied automatically to find the best tuning parameters for the network. Predicting stock prices is a good example of where overfitting shows up.
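Two of the most common of those techniques can be sketched in Keras; the hyperparameter values below are assumptions to be tuned per problem:

```python
from tensorflow import keras

# Dropout randomly silences a fraction of cells during training, so the
# network cannot memorize any single feature combination (less overfitting)
model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dropout(0.3),  # drop 30% of activations each training step
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Early stopping halts training once validation loss stops improving,
# instead of letting the model keep fitting noise in the training data
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True
)
# pass callbacks=[early_stop] and a validation_split to model.fit(...)
```

Dropout is only active during training; at prediction time all cells participate, so no retraining is needed to use the regularized model.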

Nevertheless, Deep Learning is getting very popular, and libraries like TensorFlow allow anybody to run their own experiments on a decent laptop using R or Python, with amazing results. Cloud providers like Amazon, Google and Microsoft also offer deep learning services. Anybody can start playing around with neural networks, as it is not overly complicated and does not require a deep statistics or calculus background, which must be one of the reasons for the big hype around them.

The number of use cases is limitless, except for the amount of data available. The hype will come down at some point, but this is an area you need to understand to stay on top of the technology trend.

If you want to learn more, I can only recommend the excellent deep learning course by Andrew Ng on Coursera.
