cleanrl
Add PPO + Transformer-XL
Description
Implementation of PPO with Transformer-XL as episodic memory. Based on the neroRL repo (https://github.com/MarcoMeter/neroRL) and the accompanying paper (https://arxiv.org/abs/2309.17207).
Types of changes
- [ ] Bug fix
- [ ] New feature
- [x] New algorithm
- [ ] Documentation
Checklist:
- [x] I've read the CONTRIBUTION guide (required).
- [ ] I have ensured `pre-commit run --all-files` passes (required).
- [ ] I have updated the tests accordingly (if applicable).
- [ ] I have updated the documentation and previewed the changes via `mkdocs serve`.
- [ ] I have explained note-worthy implementation details.
- [ ] I have explained the logged metrics.
- [ ] I have added links to the original paper and related papers.

If you need to run benchmark experiments for a performance-impacting change:
- [x] I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team.
- [ ] I have used the benchmark utility to submit the tracked experiments to the openrlbenchmark/cleanrl W&B project, optionally with `--capture_video`.
- [ ] I have performed RLops with `python -m openrlbenchmark.rlops`.
- For new feature or bug fix:
  - [ ] I have used the RLops utility to understand the performance impact of the changes and confirmed there is no regression.
- For new algorithm:
  - [ ] I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
  - [ ] I have added the learning curves generated by the `python -m openrlbenchmark.rlops` utility to the documentation.
  - [ ] I have added links to the tracked experiments in W&B, generated by `python -m openrlbenchmark.rlops ....your_args... --report`, to the documentation.
pre-commit
pre-commit fails because of two seemingly "obsolete" (unused) imports: memory_gym and PoMEnv. Without those imports, the environments are not registered with gymnasium.
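A minimal sketch of this import-for-side-effect pattern and the `# noqa` workaround that was settled on below (the exact module paths and the F401 rule code are assumptions for illustration, not necessarily what the PR ships):

```python
# These imports exist only for their side effect: at import time the modules
# call gymnasium's register(...), making their env ids resolvable.
# "# noqa: F401" tells the linter the apparently unused import is intentional.
import memory_gym  # noqa: F401  -- registers the MemoryGym environments
from pom_env import PoMEnv  # noqa: F401  -- registers ProofofMemory-v0 (assumed path)

import gymnasium as gym

# Works only because the imports above already registered the env id.
env = gym.make("ProofofMemory-v0")
```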
enjoy.py
I added a script to load a trained model and then watch an episode.
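A hedged sketch of what such a load-and-watch loop does; this is not the actual enjoy.py interface, and the checkpoint path and `get_action()` helper are illustrative assumptions:

```python
import gymnasium as gym
import minigrid  # noqa: F401  -- importing registers the MiniGrid env ids
import torch

env = gym.make("MiniGrid-MemoryS9-v0", render_mode="human")
agent = torch.load("agent.pt", map_location="cpu")  # hypothetical checkpoint file
agent.eval()

obs, _ = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    with torch.no_grad():
        # Hypothetical helper; the real TrXL agent also threads its
        # cached memory through successive steps.
        action = agent.get_action(obs)
    obs, reward, terminated, truncated, _ = env.step(action)
env.close()
```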
ProofofMemory-v0 and MiniGrid-MemoryS9-v0
These environments require memory and converge pretty fast, which is why I included them initially. MemoryGym environments take more time and resources (especially GPU memory, due to Transformer-XL's cached hidden states).
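To make the resource point concrete, a back-of-envelope sketch; all sizes below are assumptions for illustration, not the PR's actual defaults:

```python
# TrXL's episodic memory caches one hidden state per
# (environment, past timestep, layer), so the footprint grows multiplicatively.
num_envs, memory_len = 32, 512        # parallel envs x cached timesteps (assumed)
trxl_num_layers, trxl_dim = 3, 384    # transformer depth x embedding size (assumed)

floats = num_envs * memory_len * trxl_num_layers * trxl_dim
print(f"{floats * 4 / 2**20:.0f} MiB of float32 activations")  # -> 72 MiB
```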
TODO
I still have to run the benchmarks and write documentation. Besides that, the single-file implementation is basically done. I tried to stay close to ppo_atari_lstm.py.
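For orientation, a sketch of the main structural difference from ppo_atari_lstm.py (the shapes are assumptions, not the PR's exact code):

```python
import torch

num_envs, hidden = 8, 128
memory_len, trxl_num_layers, trxl_dim = 256, 3, 128

# ppo_atari_lstm.py carries a single recurrent state from step to step:
h = torch.zeros(1, num_envs, hidden)  # LSTM hidden state
c = torch.zeros(1, num_envs, hidden)  # LSTM cell state

# A TrXL agent instead attends over a sliding window of cached activations,
# one slice per layer and past timestep:
memory = torch.zeros(num_envs, memory_len, trxl_num_layers, trxl_dim)
```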
Hey! This looks pretty impressive! Just curious, what is the state of this PR?
Hi @roger-creus, the benchmarks just completed. So the next step is to prepare the reports and then write the docs.
Nice! Looking forward to the results
It reproduces the results of my paper: https://arxiv.org/abs/2309.17207
and this is the original implementation: https://github.com/MarcoMeter/neroRL
I'm curious about how it performs in other environments (e.g. atari?)
IMHO, here are the remaining TODOs of this PR:
- [x] Upload trained models to HuggingFace
- [x] Download and run these models using /cleanrl/ppo_trxl/enjoy.py
- [x] Rename `blocks` to layers (e.g. `trxl_num_layers` or `TransformerLayer(nn.Module)`)
- [x] pre-commit still needs to pass
  - It fails due to the "unused" imports of memory_gym and PoMEnv
  - If memory_gym is not imported, the environments are not registered
  - Suggestions on this, @vwxyzjn?
  - Solution: `# noqa`
- [ ] Keep or remove the Proof of Memory environment (`cleanrl/ppo_trxl/pom_env.py`)?
  - As an alternative, MiniGrid-Memory can be used as a much smaller training problem compared to memory-gym
@roger-creus I don't have results on Atari.
> Keep or remove the Proof of Memory environment (cleanrl/ppo_trxl/pom_env.py)?
Feel free to keep it.
Do you know why the wandb chart looks like this?
> Do you know why the wandb chart looks like this?
What are you referring to? This is how I created the report:
```bat
@echo off
python -m openrlbenchmark.rlops ^
--filters "?we=openrlbenchmark&wpn=cleanRL&ceik=env_id&cen=exp_name&metric=episode/r_mean" ^
"ppo_trxl?cl=PPO-TrXL" ^
--env-ids MortarMayhem-Grid-v0 MortarMayhem-v0 Endless-MortarMayhem-v0 MysteryPath-Grid-v0 MysteryPath-v0 Endless-MysteryPath-v0 SearingSpotlights-v0 Endless-SearingSpotlights-v0 ^
--no-check-empty-runs ^
--pc.ncols 3 ^
--pc.ncols-legend 3 ^
--rliable ^
--rc.score_normalization_method maxmin ^
--rc.normalized_score_threshold 1.0 ^
--rc.sample_efficiency_plots ^
--rc.sample_efficiency_and_walltime_efficiency_method Median ^
--rc.performance_profile_plots ^
--rc.aggregate_metrics_plots ^
--rc.sample_efficiency_num_bootstrap_reps 10 ^
--rc.performance_profile_num_bootstrap_reps 10 ^
--rc.interval_estimates_num_bootstrap_reps 10 ^
--output-filename memgym/compare ^
--scan-history ^
--report
```
Thanks for your feedback =)
Oh I meant the error bar (shadow region) is very large for some reason, but it’s fine. I have added you to the list of contributors. Feel free to merge after CI passes.
It seems that other reports have this as well, like: https://wandb.ai/openrlbenchmark/cleanrl/reports/CleanRL-PPG-vs-PPO-results--VmlldzoyMDY2NzQ5
I did some refinements:
- Added hyperparameters to the docs for training MiniGrid-MemoryS9-v0 and ProofofMemory-v0
- Added pre-trained models to huggingface for these envs
- ProofofMemory-v0 can now be rendered properly
- Added link to ppo_trxl.py in README.md
My last step before merging is to make sure that poetry and the dependencies blend well.
> My last step before merging is to make sure that poetry and the dependencies blend well.
Done.