dopamine
[Question] Result better than baseline
I ran some tests with Dopamine. When I trained on the game 'Qbert', the result seemed better than the one presented in the baseline.
I used the config located at dopamine/agents/rainbow/configs/rainbow.gin and only changed the environment name to Qbert.

That does seem a little better. Our baseline results report 1) training scores, 2) with sticky actions. Are you using 2)?
I used the config located at dopamine/agents/rainbow/configs/rainbow.gin.
So the configuration appears to use sticky actions.
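One way to confirm the sticky-actions setting rather than infer it is to scan the gin file for the relevant binding. The following is only a minimal sketch: `find_bindings` is a hypothetical helper, and the sample text uses made-up binding names (`example_env.*`), since the exact names in rainbow.gin depend on the Dopamine version.

```python
import re


def find_bindings(gin_text, parameter):
    """Return {binding_name: value_string} for bindings whose name ends in `parameter`.

    Hypothetical helper for illustration; it does a plain text scan and does
    not use the gin library. Comment stripping is naive (splits on '#').
    """
    pattern = re.compile(
        r"^\s*([\w./@]+\." + re.escape(parameter) + r")\s*=\s*(.+?)\s*$"
    )
    found = {}
    for line in gin_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        match = pattern.match(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found


# Illustrative text only: these binding names are assumptions,
# not copied from dopamine/agents/rainbow/configs/rainbow.gin.
SAMPLE = """
# environment settings
example_env.game_name = 'Qbert'
example_env.sticky_actions = True
"""

print(find_bindings(SAMPLE, "sticky_actions"))
```

To check a real run, read the config file into a string and pass it in place of `SAMPLE`.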
Strange. Maybe the code is getting better as it ages? :)
Is the x-axis on your TensorBoard in millions of agent steps, or millions of frames (4x the steps)? The numbers would match the former.
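For reference, the steps-versus-frames distinction can be sketched as below. The frame skip of 4 comes from the "x4 steps" remark above; the 250,000 training steps per iteration is an assumption about the config and should be checked against the gin file actually used.

```python
FRAME_SKIP = 4                 # each agent step covers 4 emulator frames ("x4 steps")
STEPS_PER_ITERATION = 250_000  # assumption: training steps per iteration in the config


def steps_to_frames(agent_steps, frame_skip=FRAME_SKIP):
    """Convert a count of agent steps to a count of emulator frames."""
    return agent_steps * frame_skip


def iteration_to_millions(iteration, steps_per_iteration=STEPS_PER_ITERATION):
    """Return (million agent steps, million frames) reached after `iteration` iterations."""
    steps = iteration * steps_per_iteration
    return steps / 1e6, steps_to_frames(steps) / 1e6


steps_m, frames_m = iteration_to_millions(100)
print(f"iteration 100: {steps_m} M agent steps, {frames_m} M frames")
```

Under these assumptions, the same point on a curve sits at a 4x larger x value when the axis is in frames, so two plots can look shifted even when the scores agree.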
I didn't change any training code or TensorBoard summary code, so I think it matches the baseline. BTW, I ran the training for Pong before, and the score didn't reach 15 as the baseline showed. I didn't know the reason, so I ran Rainbow on the game Qbert.

@Seraphli, were you using a GPU to run the code, and how long did it take to get the above results?
@satyakesav 1d13h for the first 100 iterations on Qbert
@Seraphli, Ahh. I see. What GPU did you train it on?
@satyakesav 1080ti
That is really fast. Normally, for DQN, one iteration takes 40 minutes on a V100.
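To put the two timings on the same scale, here is a rough conversion of the reported wall-clock time into minutes per iteration. It uses only the figures quoted in this thread, and it compares Rainbow on a 1080 Ti with DQN on a V100, so it is not a like-for-like benchmark.

```python
def minutes_per_iteration(days, hours, iterations):
    """Average wall-clock minutes per iteration for a run of the given length."""
    total_minutes = (days * 24 + hours) * 60
    return total_minutes / iterations


# 1d13h for the first 100 iterations on Qbert (Rainbow, 1080 Ti), as reported above.
qbert = minutes_per_iteration(days=1, hours=13, iterations=100)
dqn_v100 = 40  # minutes per iteration quoted above for DQN on a V100

print(f"Rainbow on Qbert: {qbert:.1f} min/iteration")
print(f"Quoted DQN figure is {dqn_v100 / qbert:.1f}x longer per iteration")
```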