batch_rl: Difference between checkpoint 49 and checkpoint 50

Are the 50 checkpoints indexed 0...49 or 1...50?

The following games are missing gs://atari-replay-datasets/dqn/${g}/1/replay_logs/FILE.50.gz:

$ ./check_c50.sh
Carnival missing ckpt50
Centipede missing ckpt50
IceHockey missing ckpt50
StarGunner missing ckpt50
VideoPinball missing ckpt50
YarsRevenge missing ckpt50

check_c50.sh:

games='AirRaid Alien Amidar Assault Asterix Asteroids Atlantis BankHeist BattleZone BeamRider Berzerk Bowling Boxing Breakout Carnival Centipede ChopperCommand CrazyClimber DemonAttack DoubleDunk ElevatorAction Enduro FishingDerby Freeway Frostbite Gopher Gravitar Hero IceHockey Jamesbond JourneyEscape Kangaroo Krull KungFuMaster MontezumaRevenge MsPacman NameThisGame Phoenix Pitfall Pong Pooyan PrivateEye Qbert Riverraid RoadRunner Robotank Seaquest Skiing Solaris SpaceInvaders StarGunner Tennis TimePilot Tutankham UpNDown Venture VideoPinball WizardOfWor YarsRevenge Zaxxon'

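# games is a plain space-separated string, so rely on word splitting here.
for g in ${games}; do
  # List the replay log files for run 1 of this game.
  output=$(gsutil ls "gs://atari-replay-datasets/dqn/${g}/1/replay_logs/")
  # echo -n "${g} "
  # echo "${output}" | wc -l
  # Match the ".50.gz" suffix exactly; a bare "grep 50" would also hit any other "50" in the listing.
  if ! echo "${output}" | grep -q '\.50\.gz$' ; then echo "${g} missing ckpt50" ; fi
done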
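
To see whether a game's files run 0...49 or 1...50, it may help to list the checkpoint indices that are actually present instead of testing for 50 alone. A minimal sketch, assuming every file name ends in `.<index>.gz` as in FILE.50.gz above; `ckpt_indices` is a hypothetical helper, not part of batch_rl:

```shell
# Hypothetical helper: read a file listing on stdin and print the sorted,
# de-duplicated checkpoint indices found in it.
# Assumes each name ends in ".<index>.gz" (as in FILE.50.gz).
ckpt_indices() {
  sed -n 's/.*\.\([0-9][0-9]*\)\.gz$/\1/p' | sort -n | uniq
}
```

Piping a listing through it, e.g. `gsutil ls gs://atari-replay-datasets/dqn/${g}/1/replay_logs/ | ckpt_indices | sed -n '1p;$p'`, would print the smallest and largest index for that game.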

Feb 16 '23 20:02 etaoxing

Additional context:

DT uses 0...49, while SGI uses 1...50.

Feb 16 '23 20:02 etaoxing

To be safe, I'd use 0..49. I think checkpoint 50 does not exist for all the games, as it might be a Dopamine artifact. Checkpoint 0 stores the first 1M steps (= 4M frames), checkpoint 1 stores the next 1M steps (4M frames), and so on.

That said, if you are doing an apples-to-apples comparison to SGI, doing what they did may make more sense.
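
The index-to-step arithmetic can be sketched as follows. This assumes every checkpoint holds exactly 1M agent steps and that each step spans 4 frames, which are the numbers quoted in this thread and not something verified against the dataset; `ckpt_range` is a hypothetical helper name:

```shell
# Assumptions taken from the thread, not verified against the files themselves.
STEPS_PER_CKPT=1000000
FRAMES_PER_STEP=4

# Hypothetical helper: print the step and frame ranges covered by checkpoint $1.
ckpt_range() {
  local k=$1
  local first_step=$(( k * STEPS_PER_CKPT ))
  local last_step=$(( (k + 1) * STEPS_PER_CKPT - 1 ))
  local first_frame=$(( first_step * FRAMES_PER_STEP ))
  local last_frame=$(( (last_step + 1) * FRAMES_PER_STEP - 1 ))
  echo "steps ${first_step}..${last_step} frames ${first_frame}..${last_frame}"
}
```

Under these assumptions, indices 0..49 cover 50M steps (200M frames) in total.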

Feb 20 '23 18:02 agarwl

Another option (which I also use these days) is the TFDS version of the dataset, which I believe also uses 0..49: https://colab.sandbox.google.com/github/google-research/rlds/blob/main/rlds/examples/tfds_rlu_atari.ipynb

For some starter code, please see the supplementary material for the ICLR'23 paper on scaled Q-learning.

Feb 20 '23 18:02 agarwl

Out of curiosity, @agarwl, is the 50th checkpoint the last 1M steps for those games for which it exists?

Jan 18 '24 23:01 kaustubhsridhar

I think I'd still just use buffer 49 (unless all the experiments are using buffer 50).

Jan 18 '24 23:01 agarwl