AlphaTrader
An implementation of a stock market trading bot that uses Deep Q Learning
Project Proposal
Scientific Papers
For my project in Applied Deep Learning I chose to focus on Deep Reinforcement Learning (DRL) in the financial market, specifically the stock market. The idea behind this proposal is to create a Deep Q Network (DQN) that can trade financial products of tech companies such as Google or Apple. This topic attracts a great deal of attention; there are dozens of scientific papers on sites such as arXiv.org covering it. There are therefore many directions in which this project might develop, but to begin with I will use a simple DQN in combination with the following four papers:
These papers were mainly used to get an idea of how to preprocess financial data, design training and testing datasets, and define a benchmark to evaluate the performance of the implemented agent.
Other approaches, which were not used for now but could be of future interest, are Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), with a focus on models with Long Short-Term Memory (LSTM).
CNNs
Predict Forex Trend via Convolutional Neural Networks, Conditional time series forecasting with convolutional neural networks, Using Deep Learning Neural Networks and Candlestick Chart Representation to Predict Stock Market
RNNs
Stock Prices Prediction using Deep Learning Models, Global Stock Market Prediction Based on Stock Chart Images Using Deep Q-Network, Financial series prediction using Attention LSTM
Another idea for the future is the inclusion of sentiment analysis in the model. Papers available on this topic are:
- Forex trading and Twitter: Spam, bots, and reputation manipulation => Research on the influence of Tweets on the market and whether to buy, hold or sell.
- Listening to Chaotic Whispers: A Deep Learning Framework for News-oriented Stock Trend Prediction => Mechanism to process recent news related to the stock market.
Another approach is provided by this paper, which tries to simulate the "whole stock market" in a multi-agent system (MAS), where each agent learns individually and trades on its own. The collective behaviour of the agents is then used to predict the market. This method is out of the project's scope at the moment due to limited processing power and time, but might be of interest in future work.
Topic
As already mentioned, this project will have a focus on Reinforcement Learning (RL), especially in the context of stock trading and the prediction of this market using a DQN.
Project Type
Concerning the project type, there are many applicable options. Types like Bring your own data, Bring your own method and Beat the stars could all apply, since the project can evolve in many directions in the future. For example, Bring your own data may be needed if future work includes sentiment analysis in the prediction. However, within the scope of this lecture the focus will lie solely on DRL with a DQN agent, which will at least result in Bring your own method.
Summary
- Description and Approach
The goal of the project is to predict different stocks from different companies, such as Google or Apple.
I will begin with standard DRL approaches listed on SpinningUp and their Baseline Implementation to get an overview and a general practical understanding of this field, as well as an insight into Keras or PyTorch. Then I will try to use different approaches from the papers mentioned earlier to predict the market with DRL.
After a first working model has been implemented, it will be used as a baseline for further hyperparameter tuning and model variations.
For general comparison I will use a third-party extension of the OpenAI Gym toolkit called AnyTrading, which is a testing and training environment to compare trading approaches.
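As a rough illustration of how such an environment is driven (a minimal sketch following the public gym-anytrading examples; the environment id, window size and frame bounds here are placeholders, not necessarily the settings used in this project):

```python
import gym
import gym_anytrading  # registers the 'stocks-v0' environment
import pandas as pd

df = pd.read_csv('data/AAPL_train.csv')  # Yahoo! Finance export
env = gym.make('stocks-v0', df=df, window_size=10, frame_bound=(10, len(df)))

observation = env.reset()
while True:
    action = env.action_space.sample()                   # random buy/sell action
    observation, reward, done, info = env.step(action)
    if done:
        print(info)                                      # includes total reward and profit
        break
```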
- Dataset
The datasets for training and testing will be acquired from Yahoo! Finance, focusing on tech companies like Google or Apple; however, any other stock data would work as well. For the pre-processing of this data, I will start by evaluating the agent on non-pre-processed data, followed by different scaling methods such as sigmoid, min-max or standard scaling.
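As an illustration of these scaling variants (a rough sketch assuming the closing prices come from a Yahoo! Finance CSV and scikit-learn is available; not the project's exact preprocessing code):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Closing prices as an (n, 1) array from a Yahoo! Finance export
prices = pd.read_csv('data/AAPL_train.csv')[['Close']].values

minmax_scaled = MinMaxScaler().fit_transform(prices)        # squashed into [0, 1]
standard_scaled = StandardScaler().fit_transform(prices)    # zero mean, unit variance
sigmoid_scaled = 1.0 / (1.0 + np.exp(-standard_scaled))     # sigmoid of the standardized values
```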
- Work-Breakdown Structure
Individual Task | Time estimate | Time used |
---|---|---|
research topic and first draft | 5h | 13h |
setting up the environment and acquiring the datasets | 3h | 7h |
designing and building an appropriate network | 15h | 22h |
fine-tuning and varying that network | 15h | 15h |
building an application to present the results | 6h | 13h |
writing the final report | 5h | 5h |
preparing the presentation of the project | 3h | 2h |
Implementation
Error Metric
- Error Metric
Every agent variation (network structure, hyperparameters and choice of scaling technique) is trained for 650 epochs on the training dataset (AAPL_train.csv). The different approaches can therefore be evaluated and compared using the average profit as well as the average reward over the last 50 epochs (600-650).
Reward is defined by the ability to correctly predict the direction of the stock price on the following day. For example, if the price falls and the agent bet on falling prices (SHORT), it receives a positive reward; if the price falls and the agent bet on rising prices (LONG), it receives a negative reward. In both cases the reward consists of the price difference.
Profit is defined by the price difference between two time steps at which the agent changed its opinion about the trend, switching from LONG to SHORT or the other way around. This definition implies a trade in which the agent, for example, sells all its LONG positions and buys as many SHORT positions as possible in order not to lose any money.
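The two definitions can be summarised in a small sketch (hypothetical helper functions for illustration, not the exact code of the customised environment):

```python
# position: +1 while the agent is LONG, -1 while it is SHORT
def step_reward(price_today: float, price_yesterday: float, position: int) -> float:
    """Positive if the agent bet on the realised direction, negative otherwise;
    the magnitude is the price difference between the two days."""
    return position * (price_today - price_yesterday)

def trade_profit(entry_price: float, exit_price: float, position: int) -> float:
    """Profit booked at the time step where the agent flips its position
    (LONG -> SHORT or SHORT -> LONG)."""
    return position * (exit_price - entry_price)
```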
These metrics are used to verify that the agent is actually making progress. Since this verification is only performed on the training dataset, it does not give an estimate of the real-life performance on unseen data. Thus, a test suite was implemented to compare models on unseen data by earned profit and reward on a given test set (AAPL_test.csv).
- Error Metric Target
First benchmarks of the implemented agent were quite misleading, resulting in an average profit of 0.477 and an average reward of 3.568. Thus, I set my target to reach at least an average profit of 1, which would mean that the agent is at least profitable on the training set. After many iterations of adjusting hyperparameters and changing the model, which still resulted in really bad and random performance, I took a closer look at the implementation of the used environment, AnyTrading. After a short inspection I was quite unsatisfied with the implementation and therefore defined my own calculations of reward and profit. This change finally gave me the impression that the agent is making progress and actually learning. As a consequence, earlier saved models and plots are not comparable to newer ones. After the change, the target of 1 was quite simple to achieve and is therefore not really representative.
- Error Metric Achievement
The following table displays the performance results of the last 7 agent variations, all of which performed better than the target of 1.
Average Profit | Average Reward |
---|---|
19.794 | 984.336 |
2.763 | 507.834 |
6.313 | 207.225 |
22.684 | 992.019 |
8.445 | 730.180 |
15.148 | 474.520 |
5.843 | 349.651 |
The following plot shows the average profit by episode and the average reward of the best model.
Since the evaluation of the agent on the training set is not that interesting and is only used to verify that the agent is actually learning something, I will provide some plots which show the performance of the model on unseen data.
Green dots are time steps where the agent decided to go LONG.
Red dots are time steps where the agent decided to go SHORT.
Plot of a model trained on AAPL, tested on GOOG
Plot of a model trained on GOOG, tested on GOOG
Plot of a model trained on GOOG, tested on AAPL
Changelog
Original Hyper Parameters
- Training per episode: 1
- Size of replay memory: 20,000
- Size of minibatch: 32
- Discount rate gamma: 0.95
- Exploration rate epsilon: 1.0
- Exploration rate epsilon min: 0.001
- Exploration rate decay: 0.995
- Learning rate: 0.001
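With these values the exploration schedule plays out roughly as follows (a sketch assuming epsilon decays once per episode; the actual agent may decay it per replay step instead):

```python
epsilon, epsilon_min, epsilon_decay = 1.0, 0.001, 0.995

for episode in range(650):
    # ... play one episode, store transitions, train on a minibatch ...
    if epsilon > epsilon_min:
        epsilon *= epsilon_decay

print(round(epsilon, 3))  # ~0.038 after 650 episodes, still above epsilon_min
```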
Original Model
# Snippet from the agent's model-building method; self.state_size and
# self.action_size are attributes of the agent, Sequential/Dense/Adam come from Keras.
model = Sequential()
model.add(Dense(64, input_dim=self.state_size, activation='relu'))
model.add(Dense(32, input_dim=self.state_size, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(self.action_size, activation='linear'))
model.compile(loss='mse',
              optimizer=Adam(lr=self.learning_rate))
Changes
- Varying optimizer
- Changing size of minibatch to 64
- Varying scaling methods from 0 to 1
- Change reward and profit calculation
- Early stop if profit < 0.5
- Early stop if profit < 0.8
- Varying epsilon and size of minibatch
- Training model 4 times per episode
- Adapting hyperparameters and model structure
Adapted Hyper Parameters
- Training per episode: 4
- Size of replay memory: 20,000
- Size of minibatch: 32
- Discount rate gamma: 0.95
- Exploration rate epsilon: 1.0
- Exploration rate epsilon min: 0.01
- Exploration rate decay: 0.995
- Learning rate: 0.0005
Adapted Model
model = Sequential()
model.add(Dense(64, input_dim=self.state_size, activation='relu'))
model.add(Dense(32, input_dim=self.state_size, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(self.action_size, activation='softmax'))
model.compile(loss='mse',
              optimizer=Adam(learning_rate=self.learning_rate))
- Varying amount of training of model per episode
- Varying dropout
- Changing size of minibatch to size of replay memory, training with 10% chance
- Varying scaling methods from 0.1 to 1
- Varying layers and activation functions of model
Setup Guide
To try your own datasets, download a training and a test split from Yahoo! Finance, preferably overlapping by 30 days, into data/
To install the needed dependencies run pip install -r requirements.txt
Afterwards you can train your own model by specifying the mode and the training data
python main.py -m train -d AAPL_train.csv
Or you can use existing models for evaluation by specifying the mode, the testing data and the model
python main.py -m test -d AAPL_test.csv -n model_18_17_06
Especially model_18_17_07 and model_18_21_52 perform quite well.
If you only want to run the backend for the web application, execute server.py
python server.py
Afterwards the backend should be accessible on http://localhost:5000