
training issue

Open philipshurpik opened this issue 6 years ago • 5 comments

Hi! I have a question about training.

After 16 hours of training, I still get an average reward of 0. I'd be happy if you could explain what might be wrong. Maybe it's a problem with the default setup parameters?

25%|███▊ | 6374997/25000000 [16:18:15<48:41:33, 106.25it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000135, avg_q: -0.001807, avg_ep_r: 0.0000, max_ep_r: 0.0911, min_ep_r: -0.0984, # game: 5000
26%|███▊ | 6399993/25000000 [16:22:11<48:18:45, 106.94it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000135, avg_q: -0.001469, avg_ep_r: 0.0000, max_ep_r: 0.0985, min_ep_r: -0.0641, # game: 5000
26%|███▊ | 6424989/25000000 [16:26:07<48:06:53, 107.24it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000138, avg_q: -0.001775, avg_ep_r: 0.0001, max_ep_r: 0.1445, min_ep_r: -0.0460, # game: 5000
26%|███▉ | 6449993/25000000 [16:30:03<48:41:25, 105.83it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000134, avg_q: -0.001525, avg_ep_r: -0.0000, max_ep_r: 0.0223, min_ep_r: -0.0371, # game: 5000
26%|███▉ | 6477033/25000000 [16:34:16<47:10:48, 109.06it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000138, avg_q: -0.002763, avg_ep_r: -0.0000, max_ep_r: 0.0302, min_ep_r: -0.0762, # game: 5000
26%|███▉ | 6499197/25000000 [16:37:41<47:10:50, 108.92it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000142, avg_q: -0.003163, avg_ep_r: 0.0000, max_ep_r: 0.0352, min_ep_r: -0.0225, # game: 5000
26%|███▉ | 6526765/25000000 [16:41:56<47:30:54, 108.00it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000135, avg_q: -0.003114, avg_ep_r: -0.0000, max_ep_r: 0.0253, min_ep_r: -0.1445, # game: 5000
26%|███▉ | 6551381/25000000 [16:45:43<47:47:03, 107.25it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000131, avg_q: -0.002506, avg_ep_r: 0.0000, max_ep_r: 0.0643, min_ep_r: -0.0199, # game: 5000
26%|███▉ | 6577145/25000000 [16:49:41<47:26:52, 107.85it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000137, avg_q: -0.001795, avg_ep_r: -0.0000, max_ep_r: 0.0300, min_ep_r: -0.1185, # game: 5000
26%|███▉ | 6599989/25000000 [16:53:14<46:38:00, 109.60it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000137, avg_q: -0.002334, avg_ep_r: -0.0000, max_ep_r: 0.0495, min_ep_r: -0.1122, # game: 5000

philipshurpik avatar Mar 29 '18 02:03 philipshurpik

I believe you are using code from the main branch, in which the model contains an earlier implementation of both the batch normalization and dropout layers. There was an issue with that earlier implementation, which I corrected in the dev branch, and I have recently merged the dev and master branches. Please pull the latest code from the main branch and re-run the training process. If you wish to remove dropout from the model, set keep_prob for each layer to 1.0 in the configuration file, as sketched below.
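For context, a minimal sketch of what keep_prob controls (TF 1.x style, matching this repo's era; the function and variable names are illustrative, not the repo's actual ones): with keep_prob = 1.0, tf.nn.dropout keeps every activation and rescales by 1/1.0, so the dropout layer degenerates to an identity op.

```python
# Illustrative only: how keep_prob controls dropout in TF 1.x.
# keep_prob = 1.0 keeps every activation, effectively disabling dropout.
import tensorflow as tf

def dense_with_dropout(x, units, keep_prob):
    h = tf.layers.dense(x, units, activation=tf.nn.relu)
    return tf.nn.dropout(h, keep_prob=keep_prob)

# keep_prob = 1.0  -> no dropout (what is suggested above)
# keep_prob = 0.5  -> half the activations dropped at train time
```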

samre12 avatar Mar 29 '18 15:03 samre12

Thanks, I just checked out the latest master branch and ran the code with the default config. The problem still exists: after one day of training it learns nothing :(

 47%|██████▌       | 11724993/25000000 [22:58:13<26:00:25, 141.79it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000079, avg_q: 0.027890, avg_ep_r: -0.0001, max_ep_r: 0.2953, min_ep_r: -0.2274, # game: 5000
 47%|██████▌       | 11749993/25000000 [23:01:09<25:57:28, 141.79it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000077, avg_q: 0.027767, avg_ep_r: -0.0002, max_ep_r: 0.2545, min_ep_r: -0.2279, # game: 5000
 47%|██████▌       | 11774985/25000000 [23:04:05<25:54:32, 141.79it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000088, avg_q: 0.026840, avg_ep_r: -0.0001, max_ep_r: 0.3711, min_ep_r: -0.6593, # game: 5000
 47%|██████▌       | 11799997/25000000 [23:07:01<25:51:34, 141.79it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000079, avg_q: 0.026633, avg_ep_r: 0.0000, max_ep_r: 0.2962, min_ep_r: -0.2784, # game: 5000

philipshurpik avatar Apr 03 '18 09:04 philipshurpik

Hi @philipshurpik, could you please share the TensorBoard log file if you have generated one? That will help me analyse the training process. I am actually working on fixing this issue right now; it's probably related to the capacity of the model. You could also try running the model without dropout by setting keep_prob to 1 for all the layers.
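In case it helps, a minimal sketch (TF 1.x summary API, matching the repo's era; the tag names, log directory, and the training_stats placeholder are assumptions, not the repo's actual code) of writing the scalar summaries that produce a shareable TensorBoard event file:

```python
# Minimal sketch of producing a TensorBoard event file with the TF 1.x
# summary API. Tag names and the log directory are illustrative.
import tensorflow as tf

training_stats = [(0.0000, -0.0018), (0.0001, -0.0031)]  # (avg_r, avg_q) per log step

writer = tf.summary.FileWriter('logs/run1')
for step, (avg_r, avg_q) in enumerate(training_stats):
    summary = tf.Summary(value=[
        tf.Summary.Value(tag='average/reward', simple_value=avg_r),
        tf.Summary.Value(tag='average/q', simple_value=avg_q),
    ])
    writer.add_summary(summary, global_step=step)
writer.close()
# View or share with: tensorboard --logdir logs
```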

samre12 avatar Apr 05 '18 04:04 samre12

I haven't taken a deep look at the training scheme, but intuitively something seems wrong when I graph the loss and Q-value below.

Loss minimization looks fine, which perhaps implies that the networks are learning, but the fact that the Q-value is also being minimized seems alarming:

[image: training curves showing the loss decreasing alongside the average Q-value]
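One way to read this (a hedged diagnostic sketch, not the repo's code; q_network and states stand in for your own model and a batch of environment states): with per-step rewards averaging near zero, a Q-network that collapses to predicting 0 everywhere can drive the TD loss down without learning any policy, so a falling loss together with Q-values pinned at 0 is consistent with that trivial solution.

```python
# Hypothetical diagnostic: track the average max-Q over a fixed batch of
# held-out states. If it sits near 0 while the TD loss keeps falling,
# the network may have collapsed to the trivial "predict zero return"
# solution. `q_network.predict` is a stand-in for your own forward pass.
import numpy as np

def average_max_q(q_network, states):
    q_values = q_network.predict(states)      # shape: (batch, n_actions)
    return float(np.mean(np.max(q_values, axis=1)))
```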

joseph-zhong avatar May 08 '18 00:05 joseph-zhong

@joseph-zhong I have been making corrections to the code; the latest changes are reflected in the dev branch, where I have integrated this repo with my gym-cryptotrading environment for ease of use. Along with these bug fixes, I am also targeting this specific issue, but the btc_sim.average.q curve I am currently getting is quite different from the one you have shown (yours might come from an earlier commit); it still converges to 0 towards the end of training. Any help on this matter is greatly appreciated.
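For anyone trying the dev branch, a rough sketch of the gym-style loop the integration moves to (the environment id is an assumption based on the gym-cryptotrading README; the rest is the standard gym API of the time):

```python
# Rough sketch of gym-style interaction with the trading environment.
# The environment id 'RealizedPnLEnv-v0' is an assumption taken from
# the gym-cryptotrading README, not verified against the dev branch.
import gym
import gym_cryptotrading  # noqa: F401  (importing registers the environments)

env = gym.make('RealizedPnLEnv-v0')
state = env.reset()
for _ in range(100):
    state, reward, done, info = env.step(env.action_space.sample())
    if done:
        state = env.reset()
```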

samre12 avatar May 08 '18 07:05 samre12