Marc G. Bellemare
Hi Ted, good catch. I flagged that issue myself before the release and determined that these parameters were correct. I don't have the immediate evidence handy, but I believe I...
That line is used to compute the gradient norm (this is 'decay=0.95'). A few lines below you have [the momentum term](https://github.com/deepmind/dqn/blob/master/dqn/NeuralQLearner.lua#L276), which is 0 (although it's not clearly stated as...
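As a rough illustration of the two settings mentioned above, here is a minimal NumPy sketch of a centered-RMSProp-style step with `decay=0.95` and momentum 0. The function name, the learning rate and `eps=0.01` are assumptions for illustration; this is not a transcription of `NeuralQLearner.lua`.

```python
import numpy as np


def centered_rmsprop_step(params, grad, g, g2, velocity,
                          lr=0.00025, decay=0.95, momentum=0.0, eps=0.01):
    """Hypothetical centered-RMSProp step; returns new (params, g, g2, velocity)."""
    # Running averages of the gradient and the squared gradient ("decay=0.95").
    g = decay * g + (1.0 - decay) * grad
    g2 = decay * g2 + (1.0 - decay) * grad * grad
    # Centered RMS estimate used to normalize the step: sqrt(E[g^2] - E[g]^2 + eps).
    rms = np.sqrt(g2 - g * g + eps)
    # With momentum=0.0 the previous velocity contributes nothing.
    velocity = momentum * velocity - lr * grad / rms
    return params + velocity, g, g2, velocity


# One step from zero-initialized statistics.
params = np.zeros(3)
g = g2 = velocity = np.zeros(3)
params, g, g2, velocity = centered_rmsprop_step(
    params, np.array([1.0, -2.0, 0.5]), g, g2, velocity)
```

Setting `momentum` to anything other than 0 would carry part of the previous step forward; with 0 each update depends only on the current gradient and the running statistics.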
I'm guessing some tests are failing because they use *nix-style paths (/tmp/...), but the code itself should work. The only things you may need to look out for are...
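If a test hardcodes a POSIX path such as `/tmp/...`, one portable alternative is to let the standard library choose the temporary directory. The helper name and the `checkpoints` subdirectory below are hypothetical; this only sketches the idea.

```python
import os
import tempfile


def make_test_dir(prefix="test_"):
    """Hypothetical helper: create a fresh directory under the platform's temp dir."""
    # mkdtemp() picks the OS-appropriate location (e.g. /tmp on Linux, the
    # TEMP directory on Windows) and returns a newly created, uniquely named directory.
    return tempfile.mkdtemp(prefix=prefix)


# Build paths with os.path.join instead of concatenating "/" by hand.
base_dir = make_test_dir()
checkpoint_dir = os.path.join(base_dir, "checkpoints")
os.makedirs(checkpoint_dir, exist_ok=True)
```

`tempfile.TemporaryDirectory()` is a context-manager alternative that also deletes the directory when the block exits.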
Yes, although at the moment you will need to modify some Atari-specific parameters (convolutional network, observation shape, etc.). I believe most of that code is in place, but stay...
Yes, that's right. Look around the open/closed issues here; I believe other people have generated similar code. Good luck!
Fair point; in hindsight, minimal actions were a mistake from day one. IIRC it was more difficult to use the full action set when operating via Gym. It would also...
Hi, this is coming from https://github.com/google/dopamine/blob/master/dopamine/replay_memory/circular_replay_buffer.py#L140. The issue is that you are trying to use a replay memory with too small a capacity. What did you set this parameter to?...
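To see why a very small capacity fails, here is a simplified, hypothetical circular buffer: if each sampled transition needs `stack_size` frames of history plus `update_horizon` future steps, a buffer smaller than that can never produce one. The class name and the exact condition are assumptions for illustration, not Dopamine's implementation; the linked line holds the actual check.

```python
class TinyCircularBuffer:
    """Hypothetical, simplified circular replay buffer (not Dopamine's code)."""

    def __init__(self, replay_capacity, stack_size=4, update_horizon=1):
        # A sampled transition needs stack_size frames of history plus
        # update_horizon future steps, so a smaller buffer can never serve one.
        if replay_capacity < stack_size + update_horizon:
            raise ValueError(
                f"replay_capacity={replay_capacity} is too small for "
                f"stack_size={stack_size} and update_horizon={update_horizon}")
        self.capacity = replay_capacity
        self.storage = [None] * replay_capacity
        self.add_count = 0

    def add(self, transition):
        # Once full, overwrite the oldest slot (circular indexing).
        self.storage[self.add_count % self.capacity] = transition
        self.add_count += 1


buffer = TinyCircularBuffer(replay_capacity=1000)
```

With these assumed defaults, `TinyCircularBuffer(replay_capacity=3)` raises `ValueError` because 3 < 4 + 1.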
That does seem a little better. Our baseline results report 1) training scores, 2) with sticky actions enabled. Are you using 2)?
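With sticky actions, the environment ignores the agent's chosen action with some probability (0.25 is a common setting) and repeats the previously executed one instead. The wrapper below is a hypothetical sketch of that idea around a plain `step` callable, not the ALE's or Dopamine's implementation; starting from action 0 is an assumption.

```python
import random


class StickyActions:
    """Hypothetical sketch: with probability p, repeat the previously executed action."""

    def __init__(self, step_fn, p=0.25, seed=None):
        self.step_fn = step_fn      # callable(action) -> whatever the environment returns
        self.p = p
        self.rng = random.Random(seed)
        self.prev_action = 0        # assumption: action 0 (e.g. NOOP) before the first step

    def step(self, action):
        # Replace the agent's choice with the last executed action with probability p.
        if self.rng.random() < self.p:
            action = self.prev_action
        self.prev_action = action
        return self.step_fn(action)


env = StickyActions(lambda a: a, p=0.25, seed=0)
executed = [env.step(a) for a in [1, 2, 3, 4]]
```

In a real environment `prev_action` would also need resetting at the start of each episode.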
Strange. Maybe the code is getting better as it ages? :) Is the x-axis on your TensorBoard plot in million agent steps, or million frames (x4 steps)? The numbers would match...
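Assuming a frame skip of 4 (the "x4" above), each agent step consumes four emulator frames, so the two axis conventions differ by exactly that factor. A minimal conversion sketch, with the frame skip as an assumed parameter:

```python
FRAME_SKIP = 4  # assumption: 4 emulator frames per agent step


def frames_to_agent_steps(frames, frame_skip=FRAME_SKIP):
    """Convert emulator frames to agent steps."""
    return frames // frame_skip


def agent_steps_to_frames(steps, frame_skip=FRAME_SKIP):
    """Convert agent steps to emulator frames."""
    return steps * frame_skip


# 1 million agent steps correspond to 4 million frames.
million_steps_in_frames = agent_steps_to_frames(1_000_000)
```

A curve plotted against frames therefore looks stretched 4x along the x-axis relative to the same run plotted against agent steps.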
Echoing @psc-g's earlier comment, I agree `GymProcessing` needs some streamlining. This is on our plate, but if you have a solution ready for `GymProcessing` in particular, a PR...