
Add PPO + Transformer-XL

Open MarcoMeter opened this issue 10 months ago • 7 comments

Description

Implementation of PPO with Transformer-XL as episodic memory. Based on this repo and paper.

Types of changes

  • [ ] Bug fix
  • [ ] New feature
  • [x] New algorithm
  • [ ] Documentation

Checklist:

  • [x] I've read the CONTRIBUTION guide (required).
  • [ ] I have ensured pre-commit run --all-files passes (required).
  • [ ] I have updated the tests accordingly (if applicable).
  • [ ] I have updated the documentation and previewed the changes via mkdocs serve.
    • [ ] I have explained note-worthy implementation details.
    • [ ] I have explained the logged metrics.
    • [ ] I have added links to the original paper and related papers.

If you need to run benchmark experiments for performance-impacting changes:

  • [x] I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team.
  • [ ] I have used the benchmark utility to submit the tracked experiments to the openrlbenchmark/cleanrl W&B project, optionally with --capture_video.
  • [ ] I have performed RLops with python -m openrlbenchmark.rlops.
    • For new feature or bug fix:
      • [ ] I have used the RLops utility to understand the performance impact of the changes and confirmed there is no regression.
    • For new algorithm:
      • [ ] I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • [ ] I have added the learning curves generated by the python -m openrlbenchmark.rlops utility to the documentation.
    • [ ] I have added links to the tracked experiments in W&B, generated by python -m openrlbenchmark.rlops ....your_args... --report, to the documentation.

MarcoMeter avatar Apr 22 '24 08:04 MarcoMeter


pre-commit

pre-commit fails because of two "obsolete" imports, memory_gym and PoMEnv, which are flagged as unused. Without those imports, the environments are not registered inside gymnasium.
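
The reason such an import cannot simply be deleted is that registration happens as a side effect of importing the module. A minimal sketch of that pattern, using two hypothetical throwaway modules (`toy_registry` and `toy_memory_env`, written to a temporary directory) in place of gymnasium and memory_gym:

```python
import importlib
import pathlib
import sys
import tempfile

# Write two hypothetical modules to a temp dir: a registry and an env package
# that registers itself when imported (stand-ins for gymnasium / memory_gym).
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "toy_registry.py").write_text(
    "ENVS = {}\n\n"
    "def register(env_id, entry_point):\n"
    "    ENVS[env_id] = entry_point\n"
)
(tmp / "toy_memory_env.py").write_text(
    "from toy_registry import register\n\n"
    "register('ToyMemory-v0', 'toy_memory_env:ToyMemoryEnv')\n"
)
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()

import toy_registry

registered_before = "ToyMemory-v0" in toy_registry.ENVS

# The name is never used below, so a linter reports it as unused,
# but the import itself is what performs the registration.
import toy_memory_env  # noqa

registered_after = "ToyMemory-v0" in toy_registry.ENVS
print(registered_before, registered_after)  # prints "False True"
```

The `# noqa` comment marks the import as intentional so that the unused-import check leaves it in place.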

enjoy.py

I added a script to load a trained model and then watch an episode.
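
For readers unfamiliar with this kind of script, the episode loop usually has the shape sketched below. This is not the code of enjoy.py: `ToyCueEnv` and `RememberingPolicy` are hypothetical stand-ins (the environment only borrows the Gymnasium `reset`/`step` signatures, and the policy replaces a loaded model), chosen so the sketch runs without any dependencies.

```python
import random


class ToyCueEnv:
    """Hypothetical env: the cue is shown only in the first observation."""

    def __init__(self, length=5, seed=0):
        self.length = length
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.cue = self.rng.choice([0, 1])
        return self.cue, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.length
        # Reward is given only at the end, for repeating the initial cue.
        reward = 1.0 if terminated and action == self.cue else 0.0
        return -1, reward, terminated, False, {}  # -1: cue is hidden


class RememberingPolicy:
    """Stand-in for a loaded model; stores the first observation as memory."""

    def __init__(self):
        self.memory = None

    def act(self, obs):
        if self.memory is None:
            self.memory = obs
        return self.memory


def run_episode(env, policy):
    """Roll out one episode and return the summed reward."""
    obs, _ = env.reset()
    done, total = False, 0.0
    while not done:
        action = policy.act(obs)
        obs, reward, terminated, truncated, _ = env.step(action)
        total += reward
        done = terminated or truncated
    return total


episode_return = run_episode(ToyCueEnv(), RememberingPolicy())
print(episode_return)  # prints "1.0"
```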

ProofofMemory-v0 and MiniGrid-MemoryS9-v0

These environments require memory and converge pretty fast, which is why I included them initially. MemoryGym environments take more time and resources (especially GPU memory, due to the cached hidden states of Transformer-XL).
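
The GPU-memory remark can be made concrete with a back-of-the-envelope estimate. The sketch below assumes one cached hidden state per environment, time step, and layer, stored as float32; the function name, its parameters, and the example sizes are hypothetical and not taken from ppo_trxl.py.

```python
def trxl_cache_bytes(num_envs, episode_len, num_layers, hidden_dim, bytes_per_value=4):
    """Bytes needed to cache one hidden state per env, step, and layer."""
    return num_envs * episode_len * num_layers * hidden_dim * bytes_per_value


# Illustrative sizes only (not the hyperparameters used in this PR).
cache_mib = trxl_cache_bytes(32, 512, 3, 384) / 2**20
print(f"{cache_mib:.0f} MiB")  # prints "72 MiB"
```

Under this assumption the cache grows linearly with episode length, which is why long-episode environments are much heavier than short ones.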

TODO

I still have to run the benchmarks and write the documentation. Besides that, the single-file implementation is basically done. I tried to stay close to ppo_atari_lstm.py.

MarcoMeter avatar Apr 22 '24 08:04 MarcoMeter

Hey! This looks pretty impressive! Just curious, what is the state of this PR?

roger-creus avatar Jul 27 '24 00:07 roger-creus

Hi @roger-creus, the benchmarks just completed. The next step is to prepare the reports and then write the docs.

MarcoMeter avatar Jul 27 '24 08:07 MarcoMeter

Nice! Looking forward to the results

roger-creus avatar Jul 27 '24 11:07 roger-creus

It reproduces the results of my paper: https://arxiv.org/abs/2309.17207

and this is the original implementation: https://github.com/MarcoMeter/neroRL

MarcoMeter avatar Jul 27 '24 11:07 MarcoMeter

I'm curious about how it performs in other environments (e.g. Atari?)

roger-creus avatar Jul 30 '24 18:07 roger-creus

IMHO, here are the remaining TODOs of this PR:

  • [x] Upload trained models to HuggingFace
  • [x] Download and run these models using /cleanrl/ppo_trxl/enjoy.py
  • [x] Rename blocks to layers (e.g. trxl_num_layers or TransformerLayer(nn.Module))
  • [x] pre-commit still needs to pass
    • It fails due to the "unused" import of memory_gym and PoMEnv
    • If memory_gym is not imported, the environments are not registered
    • Suggestions on this @vwxyzjn ?
    • Solution: #noqa
  • [ ] Keep or remove the Proof of Memory environment (cleanrl/ppo_trxl/pom_env.py)?
    • As an alternative, Minigrid-Memory can be used as a much smaller training problem than memory-gym

@roger-creus I don't have results on Atari.

MarcoMeter avatar Sep 09 '24 09:09 MarcoMeter

Keep or remove the Proof of Memory environment (cleanrl/ppo_trxl/pom_env.py)?

Feel free to keep it.

Do you know why the wandb chart looks like this?

image

vwxyzjn avatar Sep 16 '24 14:09 vwxyzjn

Do you know why the wandb chart looks like this? image

What are you referring to? This is how I created the report:

@echo off
python -m openrlbenchmark.rlops ^
    --filters "?we=openrlbenchmark&wpn=cleanRL&ceik=env_id&cen=exp_name&metric=episode/r_mean" ^
    "ppo_trxl?cl=PPO-TrXL" ^
    --env-ids MortarMayhem-Grid-v0 MortarMayhem-v0 Endless-MortarMayhem-v0 MysteryPath-Grid-v0 MysteryPath-v0 Endless-MysteryPath-v0 SearingSpotlights-v0 Endless-SearingSpotlights-v0 ^
    --no-check-empty-runs ^
    --pc.ncols 3 ^
    --pc.ncols-legend 3 ^
    --rliable ^
    --rc.score_normalization_method maxmin ^
    --rc.normalized_score_threshold 1.0 ^
    --rc.sample_efficiency_plots ^
    --rc.sample_efficiency_and_walltime_efficiency_method Median ^
    --rc.performance_profile_plots ^
    --rc.aggregate_metrics_plots ^
    --rc.sample_efficiency_num_bootstrap_reps 10 ^
    --rc.performance_profile_num_bootstrap_reps 10 ^
    --rc.interval_estimates_num_bootstrap_reps 10 ^
    --output-filename memgym/compare ^
    --scan-history ^
    --report
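
For context on the `--rc.score_normalization_method maxmin` flag: min-max normalization rescales scores to the range [0, 1]. The sketch below shows only the generic formula; which minimum and maximum openrlbenchmark actually uses (for example per environment across all runs) is not stated in this thread, and the function name is hypothetical.

```python
def min_max_normalize(scores):
    """Rescale scores linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: avoid dividing by zero.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


normalized = min_max_normalize([2.0, 4.0, 6.0])
print(normalized)  # prints "[0.0, 0.5, 1.0]"
```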

Thanks for your feedback =)

MarcoMeter avatar Sep 16 '24 16:09 MarcoMeter

Oh, I meant that the error bar (shaded region) is very large for some reason, but it’s fine. I have added you to the list of contributors. Feel free to merge after CI passes.

vwxyzjn avatar Sep 16 '24 17:09 vwxyzjn

It seems that other reports have this as well, like: https://wandb.ai/openrlbenchmark/cleanrl/reports/CleanRL-PPG-vs-PPO-results--VmlldzoyMDY2NzQ5

MarcoMeter avatar Sep 16 '24 17:09 MarcoMeter

I did some refinements:

  • Added hyperparameters to the docs for training MiniGrid-Memory-S9-v0 and ProofOfMemory-v0
  • Added pre-trained models to Hugging Face for these envs
  • ProofOfMemory-v0 can be adequately rendered now
  • Added link to ppo_trxl.py in README.md

My last step before merging is to make sure that poetry and the dependencies blend well.

MarcoMeter avatar Sep 17 '24 14:09 MarcoMeter

My last step before merging is to make sure that poetry and the dependencies blend well.

Done.

MarcoMeter avatar Sep 18 '24 04:09 MarcoMeter