deep-trading-agent
training issue
Hi! I have a question about training.
After 16 hours of training, I still get an average reward of 0. I'd appreciate it if you could explain what might be wrong. Maybe it's a problem with the default setup parameters?
25%|███▊ | 6374997/25000000 [16:18:15<48:41:33, 106.25it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000135, avg_q: -0.001807, avg_ep_r: 0.0000, max_ep_r: 0.0911, min_ep_r: -0.0984, # game: 5000
26%|███▊ | 6399993/25000000 [16:22:11<48:18:45, 106.94it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000135, avg_q: -0.001469, avg_ep_r: 0.0000, max_ep_r: 0.0985, min_ep_r: -0.0641, # game: 5000
26%|███▊ | 6424989/25000000 [16:26:07<48:06:53, 107.24it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000138, avg_q: -0.001775, avg_ep_r: 0.0001, max_ep_r: 0.1445, min_ep_r: -0.0460, # game: 5000
26%|███▊ | 6449993/25000000 [16:30:03<48:41:25, 105.83it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000134, avg_q: -0.001525, avg_ep_r: -0.0000, max_ep_r: 0.0223, min_ep_r: -0.0371, # game: 5000
26%|███▉ | 6477033/25000000 [16:34:16<47:10:48, 109.06it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000138, avg_q: -0.002763, avg_ep_r: -0.0000, max_ep_r: 0.0302, min_ep_r: -0.0762, # game: 5000
26%|███▉ | 6499197/25000000 [16:37:41<47:10:50, 108.92it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000142, avg_q: -0.003163, avg_ep_r: 0.0000, max_ep_r: 0.0352, min_ep_r: -0.0225, # game: 5000
26%|███▉ | 6526765/25000000 [16:41:56<47:30:54, 108.00it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000135, avg_q: -0.003114, avg_ep_r: -0.0000, max_ep_r: 0.0253, min_ep_r: -0.1445, # game: 5000
26%|███▉ | 6551381/25000000 [16:45:43<47:47:03, 107.25it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000131, avg_q: -0.002506, avg_ep_r: 0.0000, max_ep_r: 0.0643, min_ep_r: -0.0199, # game: 5000
26%|███▉ | 6577145/25000000 [16:49:41<47:26:52, 107.85it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000137, avg_q: -0.001795, avg_ep_r: -0.0000, max_ep_r: 0.0300, min_ep_r: -0.1185, # game: 5000
26%|███▉ | 6599989/25000000 [16:53:14<46:38:00, 109.60it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000137, avg_q: -0.002334, avg_ep_r: -0.0000, max_ep_r: 0.0495, min_ep_r: -0.1122, # game: 5000
I believe you are using code from the main branch, in which the model contains an earlier implementation of both the batch normalization and the dropout layers. There was an issue with that earlier implementation which I corrected in the dev branch. I have recently merged the dev and master branches.
Please use the latest code from the main branch and re-run the training process.
If you wish to remove dropout from the model, set keep_prob for each layer to 1.0 in the configuration file.
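For reference, here is a minimal TF 1.x sketch (not the repository's actual code; the config key names in deep-trading-agent may differ) showing why keep_prob = 1.0 disables dropout: every unit is kept and the scaling factor is 1/1.0, so the layer output passes through unchanged.

```python
import tensorflow as tf  # TF 1.x style API assumed here

# Hypothetical layer: with keep_prob = 1.0, dropout is effectively a no-op.
x = tf.placeholder(tf.float32, shape=[None, 128])
keep_prob = tf.placeholder_with_default(1.0, shape=[])  # 1.0 => dropout disabled
hidden = tf.layers.dense(x, 64, activation=tf.nn.relu)
hidden = tf.nn.dropout(hidden, keep_prob=keep_prob)  # identity when keep_prob == 1.0
```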
Thanks, I just checked out the latest master branch and ran the code with the default config. The problem still exists: after one day of training it learns nothing :(
47%|██████▌ | 11724993/25000000 [22:58:13<26:00:25, 141.79it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000079, avg_q: 0.027890, avg_ep_r: -0.0001, max_ep_r: 0.2953, min_ep_r: -0.2274, # game: 5000
47%|██████▌ | 11749993/25000000 [23:01:09<25:57:28, 141.79it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000077, avg_q: 0.027767, avg_ep_r: -0.0002, max_ep_r: 0.2545, min_ep_r: -0.2279, # game: 5000
47%|██████▌ | 11774985/25000000 [23:04:05<25:54:32, 141.79it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000088, avg_q: 0.026840, avg_ep_r: -0.0001, max_ep_r: 0.3711, min_ep_r: -0.6593, # game: 5000
47%|██████▌ | 11799997/25000000 [23:07:01<25:51:34, 141.79it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000079, avg_q: 0.026633, avg_ep_r: 0.0000, max_ep_r: 0.2962, min_ep_r: -0.2784, # game: 5000
Hi @philipshurpik, can you please share the TensorBoard log file if you have generated one? That will help me analyse the training process. I am also working on fixing this issue right now.
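(The "TensorBoard log file" here is the events file that TF 1.x writes via a summary writer. Purely as an illustration of the general pattern, and not the repo's actual logging code, something like the following produces such a file:)

```python
import tensorflow as tf  # TF 1.x

# Generic TF 1.x pattern for producing a TensorBoard events file.
loss_ph = tf.placeholder(tf.float32, shape=[], name="loss")
avg_q_ph = tf.placeholder(tf.float32, shape=[], name="average_q")
summary_op = tf.summary.merge([
    tf.summary.scalar("loss", loss_ph),
    tf.summary.scalar("average_q", avg_q_ph),
])

writer = tf.summary.FileWriter("logs/train")  # directory containing the events file to share
with tf.Session() as sess:
    for step, (loss, avg_q) in enumerate([(0.5, 0.10), (0.4, 0.05)]):  # dummy values
        summary = sess.run(summary_op, {loss_ph: loss, avg_q_ph: avg_q})
        writer.add_summary(summary, step)
writer.close()
```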
It's probably related to the capacity of the model. You could also try running the model without dropout by setting keep_prob to 1.0 for all the layers.
I haven't taken a deep look at the training scheme, but intuitively something seems wrong when I graph the loss and Q-value curves below.
Loss minimization looks good, which perhaps implies that the networks are learning, but the fact that the Q-value is also being driven down seems alarming:
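In case it helps others reproduce these plots, here is one way to pull scalar curves out of a TensorBoard event directory with the EventAccumulator API; the tag names below ("loss", "btc_sim.average.q") are guesses based on this thread, so list the available tags first.

```python
# Sketch: extracting scalar curves from a TensorBoard event directory for plotting.
import matplotlib.pyplot as plt
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

ea = EventAccumulator("logs/train")  # path to the directory holding the events file
ea.Reload()
print(ea.Tags()["scalars"])  # inspect which scalar tags the run actually logged

for tag in ["loss", "btc_sim.average.q"]:  # assumed tag names
    events = ea.Scalars(tag)
    steps = [e.step for e in events]
    values = [e.value for e in events]
    plt.plot(steps, values, label=tag)

plt.xlabel("training step")
plt.legend()
plt.show()
```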
@joseph-zhong I have been making corrections to the code; the latest changes are reflected in the dev branch, where I have integrated this repo with my gym-cryptotrading environment for ease of use.
With these bug fixes I am also aiming to resolve this specific issue, but currently the btc_sim.average.q curve that I am getting is quite different from the one you have shown (your curve may come from an earlier commit), though it still converges to 0 towards the end of training.
Any help on this matter is greatly appreciated.