cleanrl
Adding TRPO
Description
TRPO (Trust Region Policy Optimization) is a representative policy gradient algorithm in reinforcement learning. Although it is rarely used in practice today, its ideas and mathematical principles are still worth studying. I have not yet seen a single-file implementation of TRPO, so this PR adds one to help beginners understand the algorithm.
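For context, the core idea can be summarized by the constrained surrogate objective from the original TRPO paper. The notation below (old policy parameters $\theta_{\text{old}}$, advantage estimate $\hat{A}$, trust-region radius $\delta$) is the standard textbook form and is an assumption for illustration, not something taken from the code in this PR:

$$
\begin{aligned}
\max_{\theta} \quad & \mathbb{E}_{s,a \sim \pi_{\theta_{\text{old}}}} \left[ \frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)} \, \hat{A}(s, a) \right] \\
\text{subject to} \quad & \mathbb{E}_{s \sim \pi_{\theta_{\text{old}}}} \left[ D_{\mathrm{KL}}\!\left( \pi_{\theta_{\text{old}}}(\cdot \mid s) \,\|\, \pi_{\theta}(\cdot \mid s) \right) \right] \le \delta
\end{aligned}
$$

Implementations typically solve this approximately with a conjugate-gradient step followed by a backtracking line search that enforces the KL constraint.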
Types of changes
- [ ] Bug fix
- [ ] New feature
- [x] New algorithm
- [ ] Documentation
Checklist:
- [x] I've read the CONTRIBUTION guide (required).
- [x] I have ensured `pre-commit run --all-files` passes (required).
- [ ] I have updated the tests accordingly (if applicable).
- [ ] I have updated the documentation and previewed the changes via `mkdocs serve`.
    - [ ] I have explained note-worthy implementation details.
    - [ ] I have explained the logged metrics.
    - [ ] I have added links to the original paper and related papers.
If you need to run benchmark experiments for performance-impacting changes:
- [ ] I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team.
- [ ] I have used the benchmark utility to submit the tracked experiments to the openrlbenchmark/cleanrl W&B project, optionally with `--capture_video`.
- [ ] I have performed RLops with `python -m openrlbenchmark.rlops`.
    - For new feature or bug fix:
        - [ ] I have used the RLops utility to understand the performance impact of the changes and confirmed there is no regression.
    - For new algorithm:
        - [ ] I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    - [ ] I have added the learning curves generated by the `python -m openrlbenchmark.rlops` utility to the documentation.
    - [ ] I have added links to the tracked experiments in W&B, generated by `python -m openrlbenchmark.rlops ....your_args... --report`, to the documentation.
The latest updates on your projects. Learn more about Vercel for Git ↗︎
Name | Status | Preview | Comments | Updated (UTC) |
---|---|---|---|---|
cleanrl | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Dec 6, 2023 1:06pm |
Hi this is some cool stuff! Feel free to run some benchmarks with mujoco to see how it performs.