gym-anytrading
DayTrade
Hi.
Hello @AminHP, how are you? I've been studying since we last talked and researching more and more about this world of trading, and I'm heading toward day trading (intraday). For this project, would it be possible to use it for training over a specific time window during the day? I have the data, but I'm not sure how to use it to start these training runs and see how useful it could be for our world! Do you have any suggestion, or even an example, on how to use this project with data restricted to a period of the day? Something to light the way for me. Thank you, sir!
Hi, @dlemosmartins. Thanks, I hope you are doing well!
Yes, it is possible. You just need to put your daily data into a data frame and pass it to the env using the code below:
```python
x = 10
custom_env = gym.make(
    'forex-v0',
    df=my_daily_df,
    window_size=x,
    frame_bound=(x, len(my_daily_df)),
    unit_side='right'
)
```
Afternoon, sir! So, I understood what you did and adapted it for the stocks environment:

```python
x = 10
custom_env = gym.make('stocks-v0',
                      df=my_daily_df,
                      window_size=x,
                      frame_bound=(x, len(my_daily_df)))
```

and I was able to execute it.
Two problems come up:
The first is on my side: disk space and memory. I need to process roughly 150 million rows (from CSV to dataframe), with about 300 features in addition to the closing price, and my PC doesn't have enough memory! lol
The second: for stocks, how do I set it up so that each day starts at a given time, for example 9:30 am, and ends at 17:30, every single day?
In this project, would I have to build an extra layer to validate the days (their beginning and end) and then change the reset(self) method in 'trading_env.py'?
You can override some methods to reach your goal, but that takes a lot of time and can be hard. I was thinking about breaking the whole period into daily periods and considering each period as an episode.
I mean, you don't need to load all 150 million lines. Just one day's data is enough. Then you can select a period, for example [9:30, 17:30], and put it in the env. Now, train your model with this env and consider it as an episode. Repeat this operation for each day and use all days' data to improve your model.
It seems like both problems are solved!
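A minimal sketch of this daily-episode idea could look like the following, assuming `my_daily_df` has a `DatetimeIndex` and the OHLC columns gym-anytrading expects; the session times and window size below are placeholders:

```python
# Sketch only: split intraday data into one env (= one episode) per day.
# Assumes my_daily_df has a DatetimeIndex; times and window size are examples.
import gym
import gym_anytrading  # registers 'stocks-v0'

x = 10  # window size

def make_daily_envs(df, start='09:30', end='17:30'):
    session = df.between_time(start, end)            # keep only trading hours
    for _, day_df in session.groupby(session.index.date):
        day_df = day_df.reset_index(drop=True)
        if len(day_df) <= x:                         # skip days with too few bars
            continue
        yield gym.make('stocks-v0',
                       df=day_df,
                       window_size=x,
                       frame_bound=(x, len(day_df)))

# Train the same agent on each daily episode, one day after another.
for daily_env in make_daily_envs(my_daily_df):
    observation = daily_env.reset()
    # ... run your agent/algorithm on this single-day episode ...
```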
I think so, that gets right to the heart of my question!
Thank you very much for your help and your attention, @AminHP !
I think we'll talk soon!
Hello, my dear! Look at me here again. =)
Continuing our conversation: I managed to adapt my import code, moved it into the data-extraction step, and then produced my CSV. To explain a bit more, the first point is that the hour field is in whole hours and runs from 9 to 17, so I could use the approach I had mentioned. Analyzing a run, I got this output:
"info: {'total_reward': 1590.0, 'total_profit': 1.0134883102463903, 'position': 0} "
Then, looking at the points on the chart, I saw that at some moments it opens more than one sale or more than one purchase. For those of us trying to trade intraday, it would be more interesting to enter only once and hold until the exit of that first entry, for example.
Made a purchase, it can only sell next; made a sale, it can only buy next. That would be configured in the step method, right? And if so, have you already thought of a way to enforce just one entry operation and one exit operation in this code?
See you soon!
Hi, again. I hope you are doing fine!
About the second picture, the actual trade only happens when the position changes. So, having like 100 buy actions in a row doesn't make 100 trades.
About your final question, you can change the reward function so that it penalizes short-duration trades and makes the agent trade only once in a while.
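For example, a hedged sketch of that reward-function change could look like this. It subclasses StocksEnv and relies on gym-anytrading's internal attributes (`_position`, `_last_trade_tick`, `_current_tick`); the threshold and penalty values are arbitrary and only for illustration:

```python
# Sketch: penalize trades that close too quickly, so the agent learns to
# hold positions longer. Threshold and penalty are illustrative only.
from gym_anytrading.envs import StocksEnv, Actions, Positions

class PatientStocksEnv(StocksEnv):
    def _calculate_reward(self, action):
        reward = super()._calculate_reward(action)
        trade = ((action == Actions.Buy.value and self._position == Positions.Short) or
                 (action == Actions.Sell.value and self._position == Positions.Long))
        if trade:
            holding_time = self._current_tick - self._last_trade_tick
            if holding_time < 10:      # "too short" in ticks; tune to your data
                reward -= 1.0          # discourage rapid in-and-out trades
        return reward
```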
I was thinking about breaking the whole period into daily periods and considering each period as an episode.
Hello @AminHP, I think this is a great idea for day / intraday trading. I tried to do this, but the result did not converge even after many loops. I am not good at coding. Did I do something wrong?
```python
import gym
import gym_anytrading
import numpy as np
from gym_anytrading.datasets import FOREX_EURUSD_1H_ASK
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

env_maker = lambda: gym.make('forex-v0', df=FOREX_EURUSD_1H_ASK, frame_bound=(1000, 1048), window_size=8)
env = DummyVecEnv([env_maker])
model = PPO2("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=2000)

env = env_maker()
for episode in range(5000):
    observation = env.reset()
    while True:
        observation = observation[np.newaxis, ...]
        action, _states = model.predict(observation)
        observation, reward, done, info = env.step(action)
        # env.render()
        if done:
            print(episode, "_info:", info)
            break
```
Sorry for the layout; I can't post long code properly there.
Hi @0trade.
It seems like you are training your model on 48 hours of data and then testing it 5000 times on the same env. I think this is wrong. You should train your agent on 5000 different episodes, then test it on another env.
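Something along these lines might work; the frame bounds and timestep count below are placeholders, not values from this thread. Train on a long slice with many more timesteps, then evaluate once on a separate, unseen slice:

```python
# Sketch: train on one part of the data, test on an unseen part.
train_env = DummyVecEnv([lambda: gym.make('forex-v0', df=FOREX_EURUSD_1H_ASK,
                                          frame_bound=(50, 2000), window_size=8)])
model = PPO2("MlpPolicy", train_env, verbose=0)
model.learn(total_timesteps=200000)   # far more experience than 48 hours

test_env = gym.make('forex-v0', df=FOREX_EURUSD_1H_ASK,
                    frame_bound=(2000, 2500), window_size=8)
observation = test_env.reset()
while True:
    action, _states = model.predict(observation[np.newaxis, ...])
    observation, reward, done, info = test_env.step(action)
    if done:
        print("test info:", info)
        break
```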
Yes, you're right. Repeatedly training on the same time period will overfit, and the model cannot generalize.
But at first I want to see the agent make a stable profit (even if it overfits), and then I will continue working with envs for different days. Unfortunately, no matter how much I increase the time period or the number of cycles, the result still does not converge.
@0trade, @AminHP
What about a DQN agent? I'm having a hard time and not getting anywhere. Do you have any ideas or an implementation I can follow? Even the start/end time part, from 9 to 17, I still haven't managed. I'm lost.
I have this code:

```python
import gym
import gym_anytrading
from gym_anytrading.envs import TradingEnv, StocksEnv, Actions, Positions
from gym_anytrading.datasets import STOCKS_GOOGL
import matplotlib.pyplot as plt
import numpy as np
from collections import deque
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from keras import backend as K
import random
import tensorflow as tf


class DQNAgent:
    def __init__(self, state_size, action_size, shape):
        self.state_size = state_size
        self.action_size = action_size
        self.memory = deque(maxlen=2000)
        self.gamma = 0.95    # discount rate
        self.epsilon = 1.0   # exploration rate
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.99
        self.learning_rate = 0.001
        self._shape = shape
        self.model = self._build_model()
        self.target_model = self._build_model()
        self.update_target_model()

    def _huber_loss(self, y_true, y_pred, clip_delta=1.0):
        error = y_true - y_pred
        cond = K.abs(error) <= clip_delta
        squared_loss = 0.5 * K.square(error)
        quadratic_loss = 0.5 * K.square(clip_delta) + clip_delta * (K.abs(error) - clip_delta)
        return K.mean(tf.where(cond, squared_loss, quadratic_loss))

    def _build_model(self):
        model = Sequential()
        model.add(Dense(24, input_shape=(1, self.state_size), activation='relu'))
        model.add(Dense(24, activation='relu'))
        model.add(Dense(self.action_size, activation='linear'))
        model.compile(loss=self._huber_loss,
                      optimizer=Adam(lr=self.learning_rate))
        print(model.summary())
        return model

    def update_target_model(self):
        self.target_model.set_weights(self.model.get_weights())

    def memorize(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def act(self, state):
        if np.random.rand() <= self.epsilon:
            return random.randrange(self.action_size)
        act_values = self.model.predict(state)
        return np.argmax(act_values[0])

    def replay(self, batch_size):
        minibatch = random.sample(self.memory, batch_size)
        for state, action, reward, next_state, done in minibatch:
            state = np.reshape(state, [1, self.state_size])
            target = self.model.predict(state)
            if done:
                target[0][action] = reward
            else:
                next_state = np.reshape(next_state, [1, self.state_size])
                t = self.target_model.predict(next_state)[0]
                target[0][action] = reward + self.gamma * np.amax(t)
            self.model.fit(state, target, epochs=1, verbose=0)
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay

    def load(self, name):
        self.model.load_weights(name)

    def save(self, name):
        self.model.save_weights(name)


env = gym.make('stocks-v0', frame_bound=(9, len(STOCKS_GOOGL)), window_size=1)
EPISODES = 4000

state_size = env.observation_space.shape[1]
action_size = env.action_space.n
agent = DQNAgent(state_size, action_size, env.shape)

done = False
batch_size = 32

for e in range(EPISODES):
    state = env.reset()
    state = np.reshape(state, [env.window_size, env.shape[1]])
    for time in range(500):
        # action = agent.act(state)
        action = env.action_space.sample()
        next_state, reward, done, _ = env.step(action)
        reward = reward if not done else -10
        next_state = np.reshape(next_state, [env.window_size, env.shape[1]])
        agent.memorize(state, action, reward, next_state, done)
        state = next_state
        if done:
            print("info:", _)
            break
        if len(agent.memory) > batch_size:
            agent.replay(batch_size)
    env.render()
```
and when I run this code, I get this feedback ...
```
Model was constructed with shape (None, 1, 173) for input Tensor("dense_input:0", shape=(None, 1, 173), dtype=float32), but it was called on an input with incompatible shape (None, 173).

Model: "sequential"
Layer (type)      Output Shape    Param #
dense (Dense)     (None, 1, 24)   4176
dense_1 (Dense)   (None, 1, 24)   600
dense_2 (Dense)   (None, 1, 2)    50
Total params: 4,826   Trainable params: 4,826   Non-trainable params: 0

Model: "sequential_1"
Layer (type)      Output Shape    Param #
dense_3 (Dense)   (None, 1, 24)   4176
dense_4 (Dense)   (None, 1, 24)   600
dense_5 (Dense)   (None, 1, 2)    50
Total params: 4,826   Trainable params: 4,826   Non-trainable params: 0

info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
W0803 23:59:08.353774 8688 functional.py:587] Model was constructed with shape (None, 1, 173) for input Tensor("dense_3_input:0", shape=(None, 1, 173), dtype=float32), but it was called on an input with incompatible shape (None, 173).
W0803 23:59:08.435745 8688 functional.py:587] Model was constructed with shape (None, 1, 173) for input Tensor("dense_input:0", shape=(None, 1, 173), dtype=float32), but it was called on an input with incompatible shape (None, 173).
W0803 23:59:08.688743 8688 functional.py:587] Model was constructed with shape (None, 1, 173) for input Tensor("dense_input:0", shape=(None, 1, 173), dtype=float32), but it was called on an input with incompatible shape (None, 173).
info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
info: {'total_reward': 0.0, 'total_profit': 0.9844628868832217, 'position': 1}
info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
info: {'total_reward': 0.0, 'total_profit': 0.9846724897802759, 'position': 1}
info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
```
Hi, @dlemosmartins
Before writing everything from scratch, I suggest using an RL library like stable_baselines. I have already put an example here in the new release of gym-anytrading that might be useful.
@0trade you can also use this example in order to train and test on the same env to overfit.
@AminHP Hello sir, is everything all right? So ... I tried to install it on my notebook here and it didn't work out, but that's fine. I managed to get the DQN to make trades and everything, with some Buy, Sell and Hold rules, but my problem now is trying to move to LSTM so I can use my video card. It is only using 10% of its capacity with dense layers only. With LSTM I'm still trying to understand why I can't reshape the input correctly. Check the log:
For my model, I left it like this:
For the replay part, I left it like this:
It only runs fast with a single epoch, but the result was rough ...
But now my challenge is to do it with LSTM, or CuDNNLSTM, in this case to use the GPU.
For the LSTM network, you don't need to reshape the state. Pass it to the model without reshaping.
But when it gets to the replay it throws an error: expected ndim = 3, found ndim = 2.
That's where I even got scolded by my wife (lol) for staying up until dawn, sometimes, trying to get it working with LSTM.
Then I kind of gave up on it for the moment and started working with dense layers only.
I know that the "code works" now.
The input_shape for the LSTM network must be something like this: (batch_size, window, n_features). So, if you want to pass only one sample to the network, you should use the code below:
action = model.predict(np.array([state]))[0]
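As a hedged illustration of that shape, an LSTM model whose input matches the env's observation ((window_size, n_features) per sample) could look like this; the layer sizes and loss are arbitrary choices, not from this thread:

```python
# Sketch: LSTM expects (batch, window, features), so pass observations as-is
# and wrap a single state in a batch of size 1 when predicting.
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.optimizers import Adam

def build_lstm_model(window_size, n_features, action_size, learning_rate=0.001):
    model = Sequential()
    model.add(LSTM(32, input_shape=(window_size, n_features)))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(action_size, activation='linear'))
    model.compile(loss='mse', optimizer=Adam(lr=learning_rate))
    return model

# Single-sample prediction, as suggested above:
# action = np.argmax(model.predict(np.array([state]))[0])
```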
aoooooo now yes, sir! Thanks for your help ... LSTM (actually using "tf.compat.v1.keras.layers.CuDNNLSTM") now works! uhuuuuuu
But GPU utilization still never goes above 10%. Can you believe it? After all the time I spent configuring it and installing and updating every driver just right, it still didn't help!
But the best part is that LSTM (CuDNNLSTM) is active! Let's see how AOT_NN will do!
5 hours to run the first episode. My Lord.
In the replay method, you can pass the whole minibatch to the model.predict or model.fit methods, not just a single state.
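A hedged sketch of that batched replay, reusing the agent attributes from the code above (an illustration, not a drop-in fix):

```python
# Sketch: one predict/fit call per minibatch instead of one per sample,
# which keeps the GPU much busier.
def replay(self, batch_size):
    minibatch = random.sample(self.memory, batch_size)
    states = np.array([m[0] for m in minibatch])        # (batch, window, n_features)
    next_states = np.array([m[3] for m in minibatch])
    targets = self.model.predict(states)
    next_q = self.target_model.predict(next_states)
    for i, (state, action, reward, next_state, done) in enumerate(minibatch):
        if done:
            targets[i][action] = reward
        else:
            targets[i][action] = reward + self.gamma * np.amax(next_q[i])
    self.model.fit(states, targets, epochs=1, verbose=0)  # single batched update
    if self.epsilon > self.epsilon_min:
        self.epsilon *= self.epsilon_decay
```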
Following the suggestion ... let's see how this "pseudo brain" does now.
I just want to get to the point of doing the reverse communication with the system that will place the trades, then test the NN in a backtest, then on a demo account and then, who knows, one day, in production!
And I have to tell you it's just as slow ... even after updating the video card, drivers, TensorFlow ... everything ... GPU usage went from 10% down to 5% lol
Even for the part of the start and end time, from 9 to 17, I still haven't managed.
@dlemosmartins Training by time slot is bothering me too. I don't know if the URL below can help with this: https://stackoverflow.com/questions/45141079/pandas-read-csv-dataframe-rows-from-specific-date-and-time-range
If you discover anything new, please let me know.
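For reference, the gist of that linked answer is something like this; the column name, file name, and dates are made up for the example:

```python
# Sketch: load the CSV with a datetime index, then slice one session by
# date and time range. 'datetime' is an assumed column name.
import pandas as pd

df = pd.read_csv('my_data.csv', parse_dates=['datetime'], index_col='datetime')
one_session = df.loc['2020-08-03 09:30':'2020-08-03 17:30']
```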
@AminHP
you can also use this example in order to train and test on the same env to overfit.
Maybe my nested-loop code was wrong; I'll try the example's "while" loop with a large "total_timesteps" and hope it converges. Thank you very much.
@0trade Hey there, my friend. Thank you for the link. It was more or less what I did: I converted the dates to confirm the rows were within the same date period. If a different day shows up, I finish the trade and then start another sequence, without completely ending the while loop. In my case, silly me, I left the DF as Day | Hour | plus the indicators I use ... but keeping the time in the DF index broke all the logic I had intended; still, the rough solution I put together seems to have "worked". I adjusted the step_rewards part for entry and exit operations. I also added a holding position, so that it doesn't always open operations, in this case by using holding too. But what gets me the most here is the processing time, my lord of mercy. lol
@AminHP @0trade Gentlemen, I was thinking over coffee and a cigarette ... what about the network's memory? How will we handle it if we put this into production? Should all the memory collected during training be available in production, or would "save_model" already take care of that for us? If not, would Redis, MongoDB or another database that is easy to access, easy to configure and fast at serving data be an option?
@dlemosmartins
Glad to see your code worked. I'm just a trader who writes bad code, so I can't tell you much about the quant side, but I'd follow AminHP's suggestion: stable_baselines.
You can easily save and export models with it.
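For example, with stable_baselines, saving and restoring a trained model is roughly this (the file name is arbitrary):

```python
# Sketch: persist the trained policy; the replay memory used during training
# is not needed at inference time.
model.save("ppo2_daytrade")            # writes ppo2_daytrade.zip
loaded_model = PPO2.load("ppo2_daytrade")
action, _states = loaded_model.predict(observation)
```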
@0trade So we are in the same boat ... the bad code is on my side kkkkkk
Yes, I was also happy when it ran the first time, but the time it takes still discourages me. Now I'm thinking about how to add a stop loss and maybe even a trailing stop, but I haven't figured out how to do that yet.
In the platform I use, MT5, it's easy to set this up, but here I don't know how I could do it for a numeric prediction, and I've even thought about having more than one agent for this. But I don't know yet ...
Do you have any suggestions, @AminHP @0trade?
As for stable-baselines, I'm going to try to get it onto my PC ... and see if I can install it, but with Anaconda it wasn't working very well.
@dlemosmartins,
[Stable Baselines](https://github.com/hill-a/stable-baselines) works fine in Anaconda; I use Anaconda too.
For live trading I think OANDA is your first choice; they have a REST API that can handle your trading. Perhaps Backtrader is another way to go live.
Here in Brazil I can't use OANDA; unfortunately there is no such integration here. We have so much technology here, yet everything is locked to a single company that distributes the stock market feed. I will check out that stable-baselines link.
Why don't you try to use `pip` instead of `anaconda`?