
Unable to reproduce Pong results with a local single-GPU run and paper hyper-params

Open Antymon opened this issue 4 years ago • 6 comments

Hi, I have made minimal changes to run a Pong experiment on a local machine with one visible V100 GPU and 80 CPUs: https://github.com/google-research/seed_rl/compare/master...Antymon:exp/original_seed_gcp_like. The hyper-parameter defaults were overridden with the ones from gcp/train_atari.sh, except for the number of actors (256, with 10 of them used for evaluation). I ran the experiment for 0.54e9 frames, intending simply to see an evident improvement of the episode reward over the minimum of -21. Unfortunately, that did not happen within the computational budget (as reported in the logs on my branch; see the attached screenshot), whereas your CSV file suggests that some improvement should already be noticeable (excerpt below, followed by a small sketch for reading it):

...
Pong,SEED_R2D2,0,259025600.0,-20.17
Pong,SEED_R2D2,0,278963200.0,-19.541999093381687
Pong,SEED_R2D2,0,280024000.0,-19.508585675430645
Pong,SEED_R2D2,0,289027200.0,-19.225
Pong,SEED_R2D2,0,293569600.0,-19.029020588235294
Pong,SEED_R2D2,0,311820800.0,-18.241582352941176
Pong,SEED_R2D2,0,324985600.0,-17.67359411764706
Pong,SEED_R2D2,0,335267200.0,-17.23
Pong,SEED_R2D2,0,341740800.0,-17.270653732602277
Pong,SEED_R2D2,0,381289600.0,-17.519017292281738
Pong,SEED_R2D2,0,386675200.0,-17.552838464782795
Pong,SEED_R2D2,0,399758400.0,-17.635
Pong,SEED_R2D2,0,432996800.0,-17.430631034482758
Pong,SEED_R2D2,0,472219200.0,-17.18946896551724
Pong,SEED_R2D2,0,478638400.0,-17.15
Pong,SEED_R2D2,0,482174400.0,-17.080805186972256
Pong,SEED_R2D2,0,496182400.0,-16.806687273823883
Pong,SEED_R2D2,0,568833600.0,-15.385
....
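For reference, here is a minimal sketch for reading such a results file and locating where Pong first starts to improve. The column names and the file name results.csv are assumptions (the excerpt above has no header), so adjust them to the actual CSV:

```python
import pandas as pd

# Column names are assumptions for the header-less excerpt above;
# "results.csv" is a placeholder file name.
cols = ["game", "agent", "seed", "frames", "episode_return"]
df = pd.read_csv("results.csv", names=cols)

pong = df[(df.game == "Pong") & (df.agent == "SEED_R2D2")]

# First point where the return rises clearly above the minimum score of -21.
improved = pong[pong.episode_return > -20]
if improved.empty:
    print("No improvement above -20 in this excerpt")
else:
    first = improved.iloc[0]
    print(f"First return above -20 at {first.frames:.0f} frames "
          f"(return {first.episode_return:.2f})")
```

On the excerpt above this points at roughly 2.8e8 frames, i.e., well within my 0.54e9-frame budget.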

Therefore, I decided to open this reproducibility issue.

My questions would be:

  1. Can you spot any obvious mistake that might have caused the discrepancy?
  2. Have you ever run your local, single-GPU setup for anything other than the startup demo?

Thanks

EDIT: I spun up another run with 1 billion frames; still no obvious learning curve.

Antymon · Dec 05 '20

Hello Szymon,

I don't see any obvious mistake in your setup. It does seem that you are using the right hyperparameters, and that the only difference is indeed the number of actors (which should not cause such a difference in training).

We have run training for some Atari games on GPU, but I am not sure we used the local script. Let me first re-run the Pong training internally, to see whether the code broke, and otherwise check for discrepancies with the local script.
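In the meantime, here is a rough sketch of one way to surface hyper-parameter discrepancies by diffing the --flag=value pairs in the two launch scripts. The local script path is a placeholder, and the regex assumes the flags appear literally in that form:

```python
import re

# Extract "--name=value" pairs from a launch script. Assumes flags appear
# literally as --name=value; values built dynamically will be missed.
FLAG_RE = re.compile(r"--(\w+)=(\S+)")

def read_flags(path):
    with open(path) as f:
        return dict(FLAG_RE.findall(f.read()))

gcp_flags = read_flags("gcp/train_atari.sh")
local_flags = read_flags("local_train_atari.sh")  # placeholder path for the local script

# Print every flag whose value differs (or that is set in only one script).
for name in sorted(set(gcp_flags) | set(local_flags)):
    if gcp_flags.get(name) != local_flags.get(name):
        print(f"{name}: gcp={gcp_flags.get(name)!r} local={local_flags.get(name)!r}")
```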

RaphaelMarinier · Dec 07 '20

Hey @RaphaelMarinier, have you had a chance to do the GPU run you mentioned? Many thanks!

Antymon · Dec 14 '20

Hey @RaphaelMarinier, I have observed the same issue; any update on a solution?

Hi @Antymon, have you solved this problem?

bingykang · Feb 24 '21

> Hi @Antymon, have you solved this problem?

Nope.

Antymon · Feb 24 '21

> Hi @Antymon, have you solved this problem?
>
> Nope.

Have you tried other envs?

bingykang · Feb 24 '21

> Hi @Antymon, have you solved this problem?
>
> Nope.
>
> Have you tried other envs?

No, I haven't.

Antymon · Mar 04 '21