dopamine
Value of Epsilon Decay Period
In the TF version of DQN, the value of epsilon_decay_period is set to 1M steps (see here), and for Rainbow, the value is set to 250k steps (see here).
However, the Rainbow paper says that for DQN they anneal epsilon over the first 4M frames (i.e. 1M steps, as done in Dopamine above). Importantly, without Noisy Nets (which is the case for TF Rainbow), they anneal over the first 250K frames, not steps. With the standard frame skip of 4, that would be 62,500 steps.
Is there a discrepancy here (Rainbow should anneal within 62.5k steps and not 250k steps), or am I misunderstanding something? Or perhaps it really doesn't matter? Thank you for your time.
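To make the arithmetic concrete, here is a minimal sketch of the frames-to-steps conversion and of a linear epsilon schedule. The helper names (frames_to_agent_steps, linear_epsilon), the final epsilon of 0.01, and the absence of warmup steps are assumptions for illustration, not Dopamine's actual code.

```python
def frames_to_agent_steps(frames: int, frame_skip: int = 4) -> int:
    """Convert emulator frames to agent steps (one action per `frame_skip` frames)."""
    return frames // frame_skip


def linear_epsilon(step: int, decay_period: int, final_epsilon: float = 0.01) -> float:
    """Linearly anneal epsilon from 1.0 to `final_epsilon` over `decay_period` steps.

    Hypothetical helper: no warmup, and final_epsilon=0.01 is an assumed value.
    """
    fraction_left = max(0.0, 1.0 - step / decay_period)
    return final_epsilon + (1.0 - final_epsilon) * fraction_left


# Decay periods from the paper, expressed in agent steps.
paper_dqn = frames_to_agent_steps(4_000_000)    # 4M frames
paper_rainbow = frames_to_agent_steps(250_000)  # 250K frames

# Epsilon at agent step 62,500 under the two candidate decay periods.
eps_if_62k = linear_epsilon(62_500, decay_period=paper_rainbow)
eps_if_250k = linear_epsilon(62_500, decay_period=250_000)
```

Under these assumptions, a 62,500-step decay has already reached its final epsilon at the point where a 250k-step decay is only a quarter of the way through its schedule.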
Screenshot of page 4 of Rainbow paper
Also, the JAX Full Rainbow agent uses Noisy Nets, and when Noisy Nets are used, epsilon-greedy is disabled (as in the paper snippet above, as well as in some other implementations like Kaixhin Rainbow, here and here). However, I still see epsilon_train set to 0.01 in JAX Full Rainbow (here), and if noisy is true, the identity_epsilon function is called, which just returns the epsilon value unchanged (rather than using 0).
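A minimal sketch of the behaviour described above, next to what disabling epsilon-greedy would look like. The four-argument signature and the zero_epsilon helper are assumptions for illustration; this is not copied from the Dopamine source.

```python
def identity_epsilon(decay_period, step, warmup_steps, epsilon):
    """Ignore the schedule arguments and return `epsilon` unchanged."""
    del decay_period, step, warmup_steps  # unused
    return epsilon


def zero_epsilon(decay_period, step, warmup_steps, epsilon):
    """Hypothetical alternative: disable epsilon-greedy entirely."""
    del decay_period, step, warmup_steps, epsilon  # unused
    return 0.0


epsilon_train = 0.01

# With the identity function, the agent still acts randomly 1% of the time,
# even though Noisy Nets are already providing exploration.
with_identity = identity_epsilon(250_000, 1_000_000, 20_000, epsilon_train)

# With a zero schedule, exploration would come from Noisy Nets alone.
with_zero = zero_epsilon(250_000, 1_000_000, 20_000, epsilon_train)
```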
Thank you for pointing this out! This has been fixed here: https://github.com/google/dopamine/commit/ed92c57bd547db68d63aabee383d4c55756a6a0f
Thanks! As for
Is there a discrepancy here (Rainbow should anneal within 62k steps and not 250k steps), or am I misunderstanding something (or perhaps it really doesn't matter)?
Should the epsilon_decay_period value for TF Rainbow (which does not use Noisy Nets) be 250k frames as in the Rainbow paper (which makes it 62,500 steps with frame_skip=4), or 250k steps as in the current implementation? Or perhaps it does not matter? I have rarely seen a value as low as 62,500 steps for epsilon decay; for example, RLlib also uses 200k for its DQN variant, and epsilon-greedy exploration is off when using Noisy Nets.