dopamine [Question] Value of tf.train.RMSPropOptimizer.momentum in dqn_nature.gin

First, thanks for sharing dopamine with all of us. It definitely improves productivity!

In the Mnih et al. (2015) Nature paper on DQN, Extended Data Table 1 lists the gradient momentum as 0.95. However, in dqn_nature.gin the value is set to 0.0:

tf.train.RMSPropOptimizer.learning_rate = 0.00025
tf.train.RMSPropOptimizer.decay = 0.95
tf.train.RMSPropOptimizer.momentum = 0.0
tf.train.RMSPropOptimizer.epsilon = 0.00001
tf.train.RMSPropOptimizer.centered = True

I believe this particular configuration file is supposed to offer the same hyperparameter settings as the Nature paper. Or perhaps I'm not interpreting the parameters correctly. Anyway, should it be 0.0 or 0.95?

Thanks much!

--Ted

Oct 25 '18 20:10 tlwillke

Hi Ted,

Good catch. I flagged that issue myself before the release, and determined that these parameters were correct. I don't have the immediate evidence handy, but I believe I took those parameters from the open-source DQN code in Lua.

Oct 25 '18 21:10 mgbellemare

Thanks for the quick response, Marc. Yes, it looks like the momentum was set to 0.95 in the Lua code. The Lua setting matches the Nature paper and disagrees with the setting of 0.0 in dqn_nature.gin. Shall I submit a PR?

Oct 25 '18 21:10 tlwillke

That line is used to compute the gradient norm, so the 0.95 there corresponds to 'decay = 0.95'. The momentum term comes a few lines below, and it is 0: it's not clearly stated as such, but that's what the mul(0) does.
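
To make the decay/momentum distinction concrete, here is a minimal scalar sketch of a centered RMSProp step, written from the update rule given in the TensorFlow documentation for tf.train.RMSPropOptimizer. It is not the Dopamine or Lua DQN code; the function name centered_rmsprop_step and the (mean_grad, mean_square, mom) state tuple are made up for illustration. The defaults are the values from dqn_nature.gin.

```python
import math


def centered_rmsprop_step(grad, state, learning_rate=0.00025, decay=0.95,
                          momentum=0.0, epsilon=0.00001):
    """One centered RMSProp update for a single scalar parameter.

    `state` is a hypothetical (mean_grad, mean_square, mom) tuple.
    Returns (delta, new_state), where delta is added to the parameter.
    """
    mean_grad, mean_square, mom = state

    # `decay` controls the running averages of the gradient and of its square.
    mean_grad = decay * mean_grad + (1.0 - decay) * grad
    mean_square = decay * mean_square + (1.0 - decay) * grad * grad

    # Centered variant: normalize by an estimate of the gradient variance.
    denom = math.sqrt(mean_square - mean_grad * mean_grad + epsilon)

    # `momentum` scales the previous step. With momentum = 0.0 the old step
    # is discarded, which plays the same role as the mul(0) mentioned above.
    mom = momentum * mom + learning_rate * grad / denom

    return -mom, (mean_grad, mean_square, mom)
```

Calling it twice from the same state, once with momentum=0.0 and once with momentum=0.95, gives identical running averages; only the carried-over step differs. So the 0.95 in the running averages and a momentum of 0.95 are separate knobs.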

Oct 26 '18 01:10 mgbellemare
