cleanrl
Add rnd_ppo.py documentation and refactor
Description
Closes #127
Types of changes
- [ ] Bug fix
- [ ] New feature
- [ ] New algorithm
- [x] Documentation
Checklist:
- [x] I've read the CONTRIBUTION guide (required).
- [x] I have ensured `pre-commit run --all-files` passes (required).
- [x] I have updated the documentation and previewed the changes via `mkdocs serve`.
- [ ] I have updated the tests accordingly (if applicable).
If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.
- [x] I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
- [ ] I have tracked applicable experiments in openrlbenchmark/cleanrl with `--capture-video` flag toggled on (required).
- [ ] I have added additional documentation and previewed the changes via `mkdocs serve`.
- [x] I have explained note-worthy implementation details.
- [ ] I have explained the logged metrics.
- [x] I have added links to the original paper and related papers (if applicable).
- [ ] I have added links to the PR related to the algorithm.
- [ ] I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
- [ ] I have added the learning curves (in PNG format with `width=500` and `height=300`).
- [ ] I have added links to the tracked experiments.
- [ ] I have updated the tests accordingly (if applicable).
This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.
🔍 Inspect: https://vercel.com/vwxyzjn/cleanrl/FVfp6xKi7pTtnaXPFTa7dhPRKnqL
✅ Preview: https://cleanrl-git-rnd-doc-vwxyzjn.vercel.app
Name | Status | Preview | Updated |
---|---|---|---|
cleanrl | ✅ Ready (Inspect) | Visit Preview | Aug 25, 2022 at 3:56AM (UTC) |
Hey @yooceii, would you mind reverting the formatting changes? They make it harder to review and to identify the code that specifically relates to RND.

Formatting changes should be done together in a larger PR, #167...
Refactor Check (compatible with the performance in the tracked experiment)
I compared the SPS (steps per second) performance of the latest refactor against the old script used in the tracked experiment. I named the old one `old_ppo_rnd_envpool.py` and ran both it and the latest script without initializing the observation normalization parameters.
The following screenshot confirms the refactor does not result in a performance difference (and that we did the refactor correctly, making it faster without impacting sample efficiency).
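For readers who want to reproduce this kind of check without a screenshot, here is a minimal sketch of how an SPS comparison between two scripts could be computed. The helper names `measure_sps` and `relative_speedup` are hypothetical and are not part of CleanRL; the sketch assumes each script's training loop can be driven through a callable that advances one environment step.

```python
import time


def measure_sps(step_fn, total_steps, clock=time.perf_counter):
    """Call step_fn total_steps times and return steps per second (SPS).

    step_fn is a hypothetical stand-in for one environment/training step.
    """
    start = clock()
    for _ in range(total_steps):
        step_fn()
    elapsed = clock() - start
    # Guard against a zero elapsed time on very fast or mocked clocks.
    return total_steps / max(elapsed, 1e-9)


def relative_speedup(new_sps, old_sps):
    """Fractional SPS change of the new script relative to the old one."""
    return (new_sps - old_sps) / old_sps
```

A positive `relative_speedup` would indicate the refactored script is faster. Sample efficiency still has to be checked separately, for example by comparing episodic returns at the same number of environment steps.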
