deep-trading-agent
training issue
Hi! I have a question about training.
After 16 hours of training, I still get an average reward of 0. I'd appreciate it if you could explain what might be wrong. Maybe it's a problem with the default setup parameters?
25%|███▊ | 6374997/25000000 [16:18:15<48:41:33, 106.25it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000135, avg_q: -0.001807, avg_ep_r: 0.0000, max_ep_r: 0.0911, min_ep_r: -0.0984, # game: 5000
26%|███▊ | 6399993/25000000 [16:22:11<48:18:45, 106.94it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000135, avg_q: -0.001469, avg_ep_r: 0.0000, max_ep_r: 0.0985, min_ep_r: -0.0641, # game: 5000
26%|███▊ | 6424989/25000000 [16:26:07<48:06:53, 107.24it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000138, avg_q: -0.001775, avg_ep_r: 0.0001, max_ep_r: 0.1445, min_ep_r: -0.0460, # game: 5000
26%|███▊ | 6449993/25000000 [16:30:03<48:41:25, 105.83it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000134, avg_q: -0.001525, avg_ep_r: -0.0000, max_ep_r: 0.0223, min_ep_r: -0.0371, # game: 5000
26%|███▉ | 6477033/25000000 [16:34:16<47:10:48, 109.06it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000138, avg_q: -0.002763, avg_ep_r: -0.0000, max_ep_r: 0.0302, min_ep_r: -0.0762, # game: 5000
26%|███▉ | 6499197/25000000 [16:37:41<47:10:50, 108.92it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000142, avg_q: -0.003163, avg_ep_r: 0.0000, max_ep_r: 0.0352, min_ep_r: -0.0225, # game: 5000
26%|███▉ | 6526765/25000000 [16:41:56<47:30:54, 108.00it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000135, avg_q: -0.003114, avg_ep_r: -0.0000, max_ep_r: 0.0253, min_ep_r: -0.1445, # game: 5000
26%|███▉ | 6551381/25000000 [16:45:43<47:47:03, 107.25it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000131, avg_q: -0.002506, avg_ep_r: 0.0000, max_ep_r: 0.0643, min_ep_r: -0.0199, # game: 5000
26%|███▉ | 6577145/25000000 [16:49:41<47:26:52, 107.85it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000137, avg_q: -0.001795, avg_ep_r: -0.0000, max_ep_r: 0.0300, min_ep_r: -0.1185, # game: 5000
26%|███▉ | 6599989/25000000 [16:53:14<46:38:00, 109.60it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000137, avg_q: -0.002334, avg_ep_r: -0.0000, max_ep_r: 0.0495, min_ep_r: -0.1122, # game: 5000
I believe you are using code from the main branch, in which the model contains an earlier implementation of both the batch normalization and the dropout layers. There was an issue with that earlier implementation which I corrected in the dev branch. I have recently merged the dev and master branches.
Please use the latest code from the main branch and re-run the training process.
If you wish to remove dropout from the model, set keep_prob for each layer to 1.0 in the configuration file.
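For reference, here is a minimal TF 1.x sketch (not the repository's actual code; the config key names in deep-trading-agent may differ) showing why keep_prob = 1.0 disables dropout: every unit is kept and the scaling factor is 1/1.0, so the layer output passes through unchanged.

```python
import tensorflow as tf  # TF 1.x style API assumed here

# Hypothetical layer: with keep_prob = 1.0, dropout is effectively a no-op.
x = tf.placeholder(tf.float32, shape=[None, 128])
keep_prob = tf.placeholder_with_default(1.0, shape=[])  # 1.0 => dropout disabled
hidden = tf.layers.dense(x, 64, activation=tf.nn.relu)
hidden = tf.nn.dropout(hidden, keep_prob=keep_prob)  # identity when keep_prob == 1.0
```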
Thanks, I just checked out the latest master branch and ran the code with the default config. The problem still exists: after one day of training it learns nothing :(
47%|██████▌ | 11724993/25000000 [22:58:13<26:00:25, 141.79it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000079, avg_q: 0.027890, avg_ep_r: -0.0001, max_ep_r: 0.2953, min_ep_r: -0.2274, # game: 5000
47%|██████▌ | 11749993/25000000 [23:01:09<25:57:28, 141.79it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000077, avg_q: 0.027767, avg_ep_r: -0.0002, max_ep_r: 0.2545, min_ep_r: -0.2279, # game: 5000
47%|██████▌ | 11774985/25000000 [23:04:05<25:54:32, 141.79it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000088, avg_q: 0.026840, avg_ep_r: -0.0001, max_ep_r: 0.3711, min_ep_r: -0.6593, # game: 5000
47%|██████▌ | 11799997/25000000 [23:07:01<25:51:34, 141.79it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000079, avg_q: 0.026633, avg_ep_r: 0.0000, max_ep_r: 0.2962, min_ep_r: -0.2784, # game: 5000
Hi @philipshurpik, can you please share the TensorBoard log file if you have generated one? That will help me analyse the training process. I am also working on fixing this issue right now.
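(The "TensorBoard log file" here is the events file that TF 1.x writes via a summary writer. Purely as an illustration of the general pattern, and not the repo's actual logging code, something like the following produces such a file:)

```python
import tensorflow as tf  # TF 1.x

# Generic TF 1.x pattern for producing a TensorBoard events file.
loss_ph = tf.placeholder(tf.float32, shape=[], name="loss")
avg_q_ph = tf.placeholder(tf.float32, shape=[], name="average_q")
summary_op = tf.summary.merge([
    tf.summary.scalar("loss", loss_ph),
    tf.summary.scalar("average_q", avg_q_ph),
])

writer = tf.summary.FileWriter("logs/train")  # directory containing the events file to share
with tf.Session() as sess:
    for step, (loss, avg_q) in enumerate([(0.5, 0.10), (0.4, 0.05)]):  # dummy values
        summary = sess.run(summary_op, {loss_ph: loss, avg_q_ph: avg_q})
        writer.add_summary(summary, step)
writer.close()
```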
It's probably related to the capacity of the model. You could also try running the model without dropout by setting keep_prob to 1.0 for all the layers.
I haven't taken a deep look at the training scheme, but intuitively something seems wrong when I graph the loss and Q-value curves below.
Loss minimization looks good, which perhaps implies that the networks are learning, but the fact that the Q-value is also being driven down seems alarming:
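In case it helps others reproduce these plots, here is one way to pull scalar curves out of a TensorBoard event directory with the EventAccumulator API; the tag names below ("loss", "btc_sim.average.q") are guesses based on this thread, so list the available tags first.

```python
# Sketch: extracting scalar curves from a TensorBoard event directory for plotting.
import matplotlib.pyplot as plt
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

ea = EventAccumulator("logs/train")  # path to the directory holding the events file
ea.Reload()
print(ea.Tags()["scalars"])  # inspect which scalar tags the run actually logged

for tag in ["loss", "btc_sim.average.q"]:  # assumed tag names
    events = ea.Scalars(tag)
    steps = [e.step for e in events]
    values = [e.value for e in events]
    plt.plot(steps, values, label=tag)

plt.xlabel("training step")
plt.legend()
plt.show()
```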
@joseph-zhong I have been making corrections to the code; the latest changes are reflected in the dev branch, where I have integrated this repo with my gym-cryptotrading environment for ease of use.
With these bug fixes I am also aiming to resolve this specific issue, but currently the btc_sim.average.q curve that I am getting is quite different from the one you have shown (your curve may come from an earlier commit), though it still converges to 0 towards the end of training.
Any help on this matter is greatly appreciated.