
Deep Learning for Momentum Analysis


Are you a technical trader? Are you tired of relying on the myriad of technical indicators and algorithms that, most of the time, are no better than a coin toss? We have a solution for you.


Identify patterns in Price Action

Technical analysis aims to identify patterns in price action that may indicate an upward or downward bias in short-term price movement. There are a couple of reasons we believe this can be a useful addition to your arsenal for stock analysis.

  • Not everyone reacts to new information or financial results at the same time. There is a natural supply and demand balance for stocks, not entirely dependent on the underlying fundamentals, that contributes to short-term price movement. The patterns that form as a new piece of information is being absorbed by the market may be useful in predicting short-term price movements.
  • There is a group of market participants who trade solely based on technical patterns and may react the same way once certain patterns form. Being able to identify such patterns early on can allow you to take a position before the rest of the crowd reacts in a predictable way.

The goal of our Momentum model is to find patterns in recent stock price action and volume data that may indicate a bullish or bearish bias in short-term price movement. While most technical indicators pursue the same goal in one form or another, the underlying technology for identifying patterns in large data sets has come a long way in the past few years. The fundamental idea behind using machine learning algorithms here is to let the data determine whether patterns exist, rather than hand-crafting rule-based algorithms to find them.


Deep Learning

The term “Deep Learning” has become a bit of a buzzword recently and gets thrown around a lot. Let's first quickly go over what this actually means. The “Deep” in deep learning simply indicates the presence of multiple “hidden” layers in a neural network. The more hidden layers a network has, the more complex it becomes, which allows it to fit very complex functions of its inputs. So why has this become so popular only recently?

The challenge with building a very elaborate neural network is that training it becomes more and more computationally expensive as it grows. There is also a performance cost to consider when running the model on new data (inference), but the main challenge is training. The advent of tools for training neural networks on GPUs, with their high memory bandwidth, has allowed us to dramatically scale up the size and complexity of the networks we can train in a reasonable amount of time. Whereas a complex model trained on a large dataset might have required weeks to months of continuous running on a CPU cluster, training now takes a few minutes to a few hours on a state-of-the-art GPU cluster.
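To make the idea of “multiple hidden layers” concrete, here is a minimal sketch of a feed-forward network in NumPy. The layer sizes, initialization, and activation are illustrative assumptions, not the architecture of our actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Standard non-linearity applied between layers
    return np.maximum(0.0, x)

# Input layer of 10 features, three hidden layers, one output.
# "Deep" just means more than one hidden layer in this stack.
layer_sizes = [10, 32, 32, 32, 1]
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    # Each hidden layer is a matrix multiply followed by a non-linearity
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(x @ W + b)
    # Final layer is linear (e.g. a score or regression output)
    return x @ weights[-1] + biases[-1]

out = forward(rng.normal(size=(4, 10)))  # batch of 4 samples
print(out.shape)  # (4, 1): one output per sample
```

Training such a network means adjusting every one of those weights to fit the data, which is exactly the computation that GPUs accelerate so dramatically.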

However, building a very complex or “deep” network is not in itself always the solution. It is a bit of a double-edged sword, and we will go over why next.


A high variance model

A complex or “deep” neural network is capable of learning very complex functions of the input data. This tends to make it a high-variance model, meaning it has a tendency to overfit the training data: it performs very well on data within the training set but poorly on out-of-sample data. In the extreme case, this is similar to looking up answers in a database of training examples. While such a model will be very accurate on the training data, it has practically zero utility on any data not in the training set. The main challenge is to structure the network and normalize the data so that it can identify and learn genuinely generic patterns, without merely memorizing the precise price movements and volume information in our training sets.

This concept, while fundamental to any model, may be hard to grasp if you do not design models for a living. Let's try to illustrate it with a simple example. Say we have training data that includes the height of each individual and their sex (male/female). The goal is to build a model that predicts the sex of an individual given their height. Clearly this information is not sufficient to determine the answer with any degree of certainty, just as price action alone cannot predict near-term stock price movement. If we still try to build this model, we can base it on the assumption that the average male is typically taller than the average female. Say our model learns a decision boundary of 5'6" from the training data (i.e. anyone over 5'6" is classified as male, anyone below as female). While we are sure to have examples that violate this rule, if we run the model on out-of-sample data with demographics similar to our training data, there is a good likelihood that it performs reasonably well.

So what happens if we instead have a high-variance model that overfits our data? Let's say our training data includes a man who is 4'9" tall and a woman who is 5'11" tall. For a high-variance model that overfits the data, the decision boundary could look something like this:

  • If the height of the individual is 4'9", classify the individual as a man
  • If the height of the individual is 5'11", classify the individual as a woman
  • If neither of the first two cases is met, classify the individual as a man if taller than 5'6", and as a woman otherwise

Clearly, while the second model may have better accuracy on our training data, it is likely to underperform the first on out-of-sample data (assuming an individual who is 4'9" is more likely to be a woman, and so on). While it may seem silly that we would even consider the second model, remember that unlike rule-based algorithms, when training a DNN we do not control what the network learns: it simply finds the best fit for the input data. If we run a high-variance model capable of fitting very complex functions on our training data enough times, it will overfit the data just like our second model. This is one of the biggest challenges in building and training our deep neural network to find patterns in price action.
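The two height classifiers above can be rendered as a toy script. The heights are in inches (5'6" = 66, 4'9" = 57, 5'11" = 71) and all of the data is made up purely for illustration:

```python
def simple_model(height):
    # Model 1: a single decision boundary at 5'6" (66 inches)
    return "man" if height > 66 else "woman"

def overfit_model(height):
    # Model 2: memorizes the two unusual training examples,
    # then falls back to the same boundary
    if height == 57:   # the 4'9" man from the training set
        return "man"
    if height == 71:   # the 5'11" woman from the training set
        return "woman"
    return "man" if height > 66 else "woman"

def accuracy(model, data):
    return sum(model(h) == label for h, label in data) / len(data)

# The memorized exceptions make the overfit model perfect on training data...
train = [(57, "man"), (71, "woman"), (70, "man"), (63, "woman")]
# ...but on typical out-of-sample data those same exceptions hurt it.
test = [(57, "woman"), (71, "man"), (69, "man"), (62, "woman")]

print(accuracy(simple_model, train), accuracy(overfit_model, train))  # 0.5 1.0
print(accuracy(simple_model, test), accuracy(overfit_model, test))    # 1.0 0.5
```

The second model "wins" on the training set only by memorizing noise, which is exactly what a high-variance network will do if we let it.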


Our Momentum Model

We have trained our momentum model on twenty years' worth of data, from 1996 to 2016, normalized to allow us to look for patterns across hundreds of stocks. We also attempt to exclude large price movements driven by financial results and news events rather than by price-action patterns.
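One common way to normalize price data so that patterns are comparable across stocks trading at very different price levels is to convert each window to daily percentage returns and z-score them. This is an illustrative assumption about the kind of normalization involved, not our exact pipeline:

```python
import numpy as np

def normalize_window(prices):
    # Convert a window of prices to scale-free daily returns,
    # then z-score so windows from different stocks are comparable
    prices = np.asarray(prices, dtype=float)
    returns = np.diff(prices) / prices[:-1]
    return (returns - returns.mean()) / (returns.std() + 1e-9)

# Two stocks making the same proportional move at very different price levels
a = normalize_window([10, 10.5, 10.2, 10.8, 11.0])
b = normalize_window([100, 105, 102, 108, 110])
print(np.allclose(a, b))  # True: identical pattern after normalization
```

After a transform like this, a pattern learned from one stock's history can be matched against any other stock, which is what makes training across hundreds of names possible.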

Our model looks at price action data over the past couple of weeks and checks whether it fits a pattern that, historically, has been a precursor to bullish or bearish price movements over the next few days. One way to think about it is to compare it to Google’s Quick, Draw! doodles.

Clearly there is no single right answer in either model, and you are sure to find examples where similar patterns lead to different answers. However, given enough data, both models are able to learn high-level patterns that are typically indicative of a given answer. At the end of the day, remember: this is just a model. Don’t use it as your only basis for deciding whether to buy or sell a given stock; use it as an additional input to augment your own analysis.

Welcome to Fundamental Speculation! We hope you find this tool useful in your investment analysis process.