cleanrl
cleanrl copied to clipboard
ppo with timeout handling
Description
Closes #198
Types of changes
- [ ] Bug fix
- [x] New feature
- [ ] New algorithm
- [ ] Documentation
Checklist:
- [x] I've read the CONTRIBUTION guide (required).
- [x] I have ensured
pre-commit run --all-files
passes (required). - [ ] I have updated the documentation and previewed the changes via
mkdocs serve
. - [ ] I have updated the tests accordingly (if applicable).
If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.
- [ ] I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
- [ ] I have tracked applicable experiments in openrlbenchmark/cleanrl with
--capture-video
flag toggled on (required). - [ ] I have added additional documentation and previewed the changes via
mkdocs serve
.- [ ] I have explained note-worthy implementation details.
- [ ] I have explained the logged metrics.
- [ ] I have added links to the original paper and related papers (if applicable).
- [ ] I have added links to the PR related to the algorithm.
- [ ] I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
- [ ] I have added the learning curves (in PNG format with
width=500
andheight=300
). - [ ] I have added links to the tracked experiments.
- [ ] I have updated the tests accordingly (if applicable).
The latest updates on your projects. Learn more about Vercel for Git ↗︎
Name | Status | Preview | Updated |
---|---|---|---|
cleanrl | ✅ Ready (Inspect) | Visit Preview | Jun 19, 2022 at 8:21PM (UTC) |
I also did an experiment on 3 seeds, 2M steps on HalfCheetah-v3 for comparison, graphs seems ok to me for now wandb graphs: https://wandb.ai/howuhh/cleanrlPPO
@Howuhh whats your Wandb username? Could you add me on discord Costa#2021?
@vwxyzjn username is the same: howuhh (link to wandb pofile above), also sent the request on discord
Also I am a bit skeptical about separate file for this, maybe I should add it to the base ppo but with flag to disable it (on the other hand, base ppo implementation will be more complicated)
@Howuhh thanks for preparing this PR. My main thought is prototyping after https://github.com/openai/gym/pull/2752 is merged could save us a lot of refactoring and benchmarking effort.
While it's great that we are looking into this early, I still think we should hold off merging this PR until upstream changes in gym are final (see https://github.com/openai/gym/pull/2752) which introduces a new stepping API that involves truncation at the core signature
obs, rew, terminated, truncated, info = env.step(action)
which will result in a complete refactoring on our side.
Also I am a bit skeptical about separate file for this, maybe I should add it to the base ppo but with flag to disable it (on the other hand, base ppo implementation will be more complicated)
I think it's worth it to add this feature to all the PPO variants we have because it is technically more correct. I am ok with the change as long as we have sufficient benchmark and evidence to show there is no significant lapse in performance.
@vwxyzjn I agree that it will save us a lot of work if these changes actually happen. However, right now I don't see a consensus on the new API tho
@Howuhh I happen to be involved with this PR ;) to my knowledge, I think this PR will be merged in the next 2-3 months. So let's probably wait a bit if this is not an immediate use case for you.
@vwxyzjn it's ok, let's wait for new API
@vwxyzjn Hi! Seems like the API change is in the latest gym release. Should I try to update this PR to the new API now?
Yes that would be great! Thank you!
@Howuhh fyi, @jkterry1 says this
Q: Is the API design finalized in gym? Can we start transitioning to the latest gym in CleanRL?
A: The API changes will be turned on by default and aren't yet
But otherwise there are currently no changes planned to the base APi (edited)
There will continue to be changes to environments, wrappers will be replaced (this is looking like it's going to take forever), the vector API is being redone at unknown, and there will continue to be random small breaking changes
But otherwise ya, that was the big one
Hey @Howuhh thanks for the PR. There is a new PR at #311