
Adapt PPO algorithm from CleanRL to OpenSpiel. Adapt Gym Atari environment to OpenSpiel


@lanctot @ssokota: @gregdeon and I have adapted @vwxyzjn's CleanRL PPO implementation to work with OpenSpiel. To test the implementation, we have also wrapped Gym's Atari games as an OpenSpiel game.

Notes

  • We included a SyncVectorEnv class modelled after Gym's class of the same name, but built on OpenSpiel RL environments (see the sketch after this list). Running multiple environments in parallel is a big deal for performance.
  • This only works for single-player games right now. We plan to get a multiplayer version running, but there are some challenges. Namely, when using a SyncVectorEnv, the turn order of players across the environments can quickly get out of sync, and that is not trivial to resolve. The games @gregdeon and I are studying are simultaneous-move games with no player elimination; this class of games can never go out of sync (the same agent is always acting in every environment). That is all we were planning to support, but perhaps we are overlooking a simpler way to support arbitrary turn orders in combination with vector environments.
  • Atari will only work if gym, the Atari ROMs, and Stable Baselines are all installed.
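
For illustration, the idea behind the SyncVectorEnv is roughly the following (a simplified sketch, not the exact class added in this PR):

```python
# Simplified sketch of a SyncVectorEnv over OpenSpiel RL environments
# (illustrative only; not the exact implementation in this PR).
from open_spiel.python import rl_environment


class SyncVectorEnv:
  """Steps a batch of single-player OpenSpiel environments in lockstep."""

  def __init__(self, envs):
    self.envs = envs

  def reset(self):
    return [env.reset() for env in self.envs]

  def step(self, actions):
    # One action per environment (single-player games).
    time_steps = [env.step([a]) for a, env in zip(actions, self.envs)]
    # Auto-reset finished episodes so every environment always has a live
    # state; the real class also has to surface terminal rewards first.
    return [env.reset() if ts.last() else ts
            for ts, env in zip(time_steps, self.envs)]


# Hypothetical usage with 8 copies of catch:
# envs = SyncVectorEnv(
#     [rl_environment.Environment("catch") for _ in range(8)])
```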

Evaluation

  • Catch will converge to a score of 1.0 every episode fairly quickly (71_680 steps)
  • Performance of three seeds of Breakout on TensorBoard (heavily smoothed) after 10_000_000 iterations (~8 wall-clock hours on our setup). You can compare to this example; we don't match it exactly, but we seem to be in the right ballpark.

Example commands:

python ppo_example.py --game-name catch   # runs catch
python ppo_example.py                     # runs breakout

newmanne avatar Jul 23 '22 01:07 newmanne

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

google-cla[bot] avatar Jul 23 '22 01:07 google-cla[bot]

Awesome!

First thing: something went wrong with the CLA. Can you look into the message above and follow the steps, and let me know when you have done so? I will then rerun it.

lanctot avatar Jul 23 '22 03:07 lanctot

Would be nice to have a non-Atari test that solves a really basic single-player game. See the pytorch DQN test for how to construct one using the gambit EFG format: https://github.com/deepmind/open_spiel/blob/b5e0bf6495bd8baf7c6011c18fa4fd403e21385d/open_spiel/python/pytorch/dqn_pytorch_test.py#L39

Note: when you add a python test you also need to add it to python/CMakeLists.txt.
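
Roughly, such a test could look something like the following (illustrative only; the EFG data mirrors the linked DQN test, and the agent construction and assertion would need to be adapted to the PPO API in this PR):

```python
# Rough sketch of a non-Atari PPO test (illustrative only; mirror the
# structure of dqn_pytorch_test.py and adapt to the PPO API in this PR).
from absl.testing import absltest
from open_spiel.python import rl_environment
import pyspiel

# A trivial single-player game in gambit EFG format: action "R" pays 1.0,
# action "L" pays -1.0, so a trained agent should learn to pick "R".
SIMPLE_EFG_DATA = """
  EFG 2 R "Simple single-agent problem" { "Player 1" } ""
  p "ROOT" 1 1 "ROOT" { "L" "R" } 0
    t "L" 1 "Outcome L" { -1.0 }
    t "R" 2 "Outcome R" { 1.0 }
"""


class PPOTest(absltest.TestCase):

  def test_simple_game(self):
    game = pyspiel.load_efg_game(SIMPLE_EFG_DATA)
    env = rl_environment.Environment(game=game)
    # Build the PPO agent here, train for a small number of steps, then
    # assert that the greedy policy achieves the optimal return (1.0).


if __name__ == "__main__":
  absltest.main()
```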

lanctot avatar Jul 23 '22 03:07 lanctot

> Awesome!
>
> First thing: something went wrong with the CLA. Can you look into the message above and follow the steps, and let me know when you have done so? I will then rerun it.

It seems that one of the commits is marked as an AWS user that has not signed the CLA (see the attached screenshot).

Unfortunately I can't import the PR until this is resolved.

I think probably the easiest thing is to make a fresh branch off master, copy over all the files (individually, not the .git subdirectories), and open a fresh PR with those files. That way, it should show up as a single commit from the main author.

lanctot avatar Jul 23 '22 13:07 lanctot

Awesome, thanks. Can you add PPO to docs/algorithms.md and Atari to docs/games.md?

lanctot avatar Jul 27 '22 10:07 lanctot

Thanks @vwxyzjn for taking a look!

Apologies @newmanne for the delay, I was on vacation for a large part of July. I'm back now and will ask one of the team to take a quick look as well.

lanctot avatar Aug 04 '22 12:08 lanctot

Hi, also a quick comment: would it be possible to add more benchmarks? I am quite interested in the performance of the agent in the environments that OpenSpiel provides. E.g., see our ppo_atari.py docs for how we usually do it.


We certainly don't need to do it in this PR, but having more benchmarks would be quite nice.

Ideally, it would be great if you could contribute the tracked experiments to the Open RL Benchmark, which makes everything about the experiments super transparent. It leverages wandb (a proprietary service), however, so no worries if this is difficult.
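
For reference, the tracking boils down to roughly the following (a simplified sketch; the run and project names are placeholders, and the exact wiring in ppo_example.py may differ):

```python
# Rough sketch of CleanRL-style experiment tracking (illustrative;
# flag wiring in ppo_example.py may differ).
import wandb
from torch.utils.tensorboard import SummaryWriter

run_name = "breakout__ppo__seed1"  # hypothetical run name
wandb.init(
    project="open_spiel_ppo",   # hypothetical project name
    name=run_name,
    sync_tensorboard=True,      # mirror SummaryWriter scalars to wandb
)
writer = SummaryWriter(f"runs/{run_name}")

# Anything written to the SummaryWriter during training is then also
# uploaded to wandb, e.g.:
writer.add_scalar("charts/episodic_return", 1.0, 0)
```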

vwxyzjn avatar Aug 04 '22 23:08 vwxyzjn

Hmmm, test failed in Colored Trails (due to observing a utility greater than the maximum) -- that's probably a bug on our end, I'll look into it today.

lanctot avatar Aug 08 '22 10:08 lanctot

> Hmmm, test failed in Colored Trails (due to observing a utility greater than the maximum) -- that's probably a bug on our end, I'll look into it today.

Confirmed bug on our side, fix is lines 186-187 in colored_trails.h here: https://github.com/deepmind/open_spiel/pull/900/files.

lanctot avatar Aug 08 '22 11:08 lanctot

Thanks guys. I've started the tests, but they will probably fail. There was a change to GitHub's configuration that broke our tests today. I finally got it fixed and updated master a few minutes ago. You'll probably need to pull changes from master for the tests to pass (which would be a good thing anyway, since this PR's branch was started a while back and has some large changes).

lanctot avatar Nov 07 '22 21:11 lanctot

Hi guys, can you go through the comments on the GitHub PR thread and either reply or mark them as resolved?

@newmanne @gregdeon

lanctot avatar Nov 15 '22 11:11 lanctot

@lanctot @gregdeon I think everything should be resolved now

newmanne avatar Nov 15 '22 23:11 newmanne