fferreres
I saw the same on A3C after 1.9 million steps, and attributed it to the fact that I only used 4 learners (2 are hyper threads, only...
1) I don't know how to add the graph the right way to review with TensorBoard, since at `tf.train.SummaryWriter(..., sess.graph)` time the session isn't created yet (but much later). If I...
> Depending on how conv2d combines channel filters, maybe history is getting reduced to one state in the first convolutional layer. Maybe depthwise_conv2d is relevant or, alternatively, flattening the stack...
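To make the channel question concrete, here is a toy sketch in plain Python (not TensorFlow). It assumes the 4 history frames are stacked as 4 input channels and looks at a single pixel with a 1x1 kernel. The function names and weights are made up for illustration. A standard 2D convolution sums over all input channels for each output filter, while a depthwise convolution keeps each channel separate:

```python
def conv_1x1(pixel, filters):
    """Standard convolution at one pixel (hypothetical helper).

    pixel:   list of C channel values (here: one value per history frame)
    filters: list of K weight lists, each of length C
    Each output filter is a weighted sum over ALL input channels.
    """
    return [sum(w * x for w, x in zip(weights, pixel)) for weights in filters]


def depthwise_1x1(pixel, weights):
    """Depthwise convolution at one pixel (hypothetical helper).

    One weight per channel and no summing across channels, so the
    output keeps one value per history frame.
    """
    return [w * x for w, x in zip(weights, pixel)]


# Two different histories: the object shows up only in the oldest frame,
# or only in the newest frame.
history_a = [1.0, 0.0, 0.0, 0.0]
history_b = [0.0, 0.0, 0.0, 1.0]

# A single filter with equal weights cannot tell the two histories apart.
one_filter = [[1.0, 1.0, 1.0, 1.0]]
same = conv_1x1(history_a, one_filter) == conv_1x1(history_b, one_filter)

# A second filter with different per-channel weights separates them again.
two_filters = [[1.0, 1.0, 1.0, 1.0], [-1.0, 0.0, 0.0, 1.0]]
out_a = conv_1x1(history_a, two_filters)
out_b = conv_1x1(history_b, two_filters)

# Depthwise keeps the per-frame values as they are (weights of 1.0 here).
dw_a = depthwise_1x1(history_a, [1.0, 1.0, 1.0, 1.0])
```

So each individual filter does collapse the history into one number, but since every filter has its own weight per channel, a layer with several filters can still encode the order of the frames. Whether that is enough in practice for this network is a separate question.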
Is something like this possible? If the action is not taken, set the target value to be equal to the current Q prediction for that action. For the (one) action taken,...
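A rough sketch of that idea in plain Python, for a single transition. The part for the taken action is an assumption on my side: I use the usual one-step Q-learning target, `reward + gamma * max(q_next)`. The name `build_targets` and all the numbers are hypothetical:

```python
def build_targets(q_pred, action, reward, q_next, gamma=0.99, done=False):
    """Build a target vector with one entry per action (hypothetical helper).

    Untaken actions keep the current prediction as target, so their error
    is zero. The taken action gets the one-step Q-learning target, which
    is an assumption here and not something taken from the repo.
    """
    targets = list(q_pred)  # copy, so untaken actions keep target == prediction
    bootstrap = 0.0 if done else gamma * max(q_next)
    targets[action] = reward + bootstrap
    return targets


q_pred = [0.5, 1.2, -0.3]  # current Q predictions for 3 actions
q_next = [0.0, 2.0, 1.0]   # Q predictions for the next state

targets = build_targets(q_pred, action=1, reward=1.0, q_next=q_next, gamma=0.9)

# Per-action error: only the taken action (index 1) is non-zero.
errors = [t - q for t, q in zip(targets, q_pred)]
```

With a squared-error loss over the whole vector, only the taken action would then contribute a gradient, which should be equivalent to masking the loss with a one-hot of the action.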
Denny, > Really good point and may be the key here. This could make a huge difference. I looked at the paper again and I actually can't find details on...
@dennybritz, yeah, I have things mixed up. Too many things to try to understand too quickly (new to Python, NumPy, TensorFlow, vim, machine learning, RL, and refreshing statistics/prob 101) :-)
Here are a few threads discussing low scores vs. the paper. They can be a source of troubleshooting ideas: https://github.com/tambetm/simple_dqn/issues/32 https://groups.google.com/forum/m/#!topic/deep-q-learning/JV384mQcylo https://docs.google.com/spreadsheets/d/1nJIvkLOFk_zEfejS-Z-pHn5OGg-wDYJ8GI5Xn2wsKIk/htmlview I re-read all of DQN.py (not the notebook) and...
Something entirely plausible is that we didn't try "hard enough". Look, for example, at Breakout on this page: https://github.com/Jabberwockyll/deep_rl_ale/wiki It remains close to 0 even at 4 million steps, then...
I found base_coder and the modified coders and their code intractable. But the "help" coder has been helpful for a minimalist understanding of it, based on io's def cmd_help(self,...
Fixed with the later commits you did. Thank you!