
Adding TRPO

Open Jackory opened this issue 1 year ago • 2 comments

Description

TRPO is a representative policy-gradient algorithm in reinforcement learning. Although it is rarely used in practice today, its ideas and mathematical principles are still worth studying. I haven't yet seen a single-file implementation of TRPO, so I'm contributing one to help beginners understand it.
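For readers new to TRPO, the step that sets it apart from vanilla policy gradient is the constrained update: solve F x = g with conjugate gradient (F is the Fisher information matrix of the policy, g the policy gradient), then scale x so that the quadratic KL estimate 0.5 * sᵀ F s equals the trust-region size δ (`max_kl` below). The sketch below is a minimal NumPy illustration under stated assumptions, not the code in this PR: `conjugate_gradient`, `trpo_step`, and their parameters are hypothetical names, and the Fisher-vector product `fvp` is passed in as a plain function. A real implementation would compute `fvp` with autograd (a Hessian-vector product of the mean KL divergence) and follow the step with a backtracking line search on the surrogate objective, which is omitted here.

```python
import numpy as np


def conjugate_gradient(fvp, b, iters=10, tol=1e-10):
    """Approximately solve F x = b using only products fvp(v) = F v.

    Hypothetical helper: F is never formed explicitly, which is why TRPO
    uses CG instead of inverting the Fisher matrix.
    """
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    r = b.copy()        # residual b - F x (x starts at zero)
    p = r.copy()        # current search direction
    rr = r @ r
    for _ in range(iters):
        Fp = fvp(p)
        alpha = rr / (p @ Fp)      # exact step length along p
        x += alpha * p
        r -= alpha * Fp
        rr_new = r @ r
        if rr_new < tol:           # residual small enough: stop early
            break
        p = r + (rr_new / rr) * p  # next F-conjugate direction
        rr = rr_new
    return x


def trpo_step(fvp, g, max_kl=0.01, cg_iters=10):
    """Natural-gradient step s scaled so that 0.5 * s^T F s == max_kl.

    Hypothetical helper; a full TRPO update would additionally backtrack
    along s until the surrogate improves and the true KL stays <= max_kl.
    """
    x = conjugate_gradient(fvp, g, iters=cg_iters)  # x ~= F^{-1} g
    xFx = x @ fvp(x)                                # quadratic form x^T F x
    beta = np.sqrt(2.0 * max_kl / xFx)              # trust-region scaling
    return beta * x


if __name__ == "__main__":
    # Toy symmetric positive-definite "Fisher" matrix and gradient.
    F = np.array([[4.0, 1.0], [1.0, 3.0]])
    g = np.array([1.0, 2.0])
    s = trpo_step(lambda v: F @ v, g, max_kl=0.01)
    print(s, 0.5 * s @ F @ s)  # second value should be ~0.01
```

The scaling follows from the quadratic approximation: with s = βx, 0.5 · β² · xᵀFx = δ gives β = sqrt(2δ / xᵀFx). Using CG with Fisher-vector products keeps memory linear in the number of parameters, which is the practical reason TRPO avoids forming F.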

Types of changes

  • [ ] Bug fix
  • [ ] New feature
  • [x] New algorithm
  • [ ] Documentation

Checklist:

  • [x] I've read the CONTRIBUTION guide (required).
  • [x] I have ensured pre-commit run --all-files passes (required).
  • [ ] I have updated the tests accordingly (if applicable).
  • [ ] I have updated the documentation and previewed the changes via mkdocs serve.
    • [ ] I have explained note-worthy implementation details.
    • [ ] I have explained the logged metrics.
    • [ ] I have added links to the original paper and related papers.

If you need to run benchmark experiments for a performance-impacting change:

  • [ ] I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team.
  • [ ] I have used the benchmark utility to submit the tracked experiments to the openrlbenchmark/cleanrl W&B project, optionally with --capture_video.
  • [ ] I have performed RLops with python -m openrlbenchmark.rlops.
    • For new feature or bug fix:
      • [ ] I have used the RLops utility to understand the performance impact of the changes and confirmed there is no regression.
    • For new algorithm:
      • [ ] I have created a table comparing my results against those from reputable sources (e.g., the original paper or other reference implementations).
    • [ ] I have added the learning curves generated by the python -m openrlbenchmark.rlops utility to the documentation.
    • [ ] I have added links to the tracked experiments in W&B, generated by python -m openrlbenchmark.rlops ....your_args... --report, to the documentation.

Jackory, Nov 30 '23 03:11

The latest updates on your projects. Learn more about Vercel for Git.

Name | Status | Preview | Updated (UTC)
cleanrl | ✅ Ready | Visit Preview | Dec 6, 2023 1:06pm

vercel[bot], Nov 30 '23 03:11

Hi, this is some cool stuff! Feel free to run some benchmarks on MuJoCo to see how it performs.

vwxyzjn, Dec 18 '23 15:12