cleanrl icon indicating copy to clipboard operation
cleanrl copied to clipboard

ppo with timeout handling

Open Howuhh opened this issue 2 years ago • 12 comments

Description

Closes #198

Types of changes

  • [ ] Bug fix
  • [x] New feature
  • [ ] New algorithm
  • [ ] Documentation

Checklist:

  • [x] I've read the CONTRIBUTION guide (required).
  • [x] I have ensured pre-commit run --all-files passes (required).
  • [ ] I have updated the documentation and previewed the changes via mkdocs serve.
  • [ ] I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.

  • [ ] I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
  • [ ] I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
  • [ ] I have added additional documentation and previewed the changes via mkdocs serve.
    • [ ] I have explained note-worthy implementation details.
    • [ ] I have explained the logged metrics.
    • [ ] I have added links to the original paper and related papers (if applicable).
    • [ ] I have added links to the PR related to the algorithm.
    • [ ] I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • [ ] I have added the learning curves (in PNG format with width=500 and height=300).
    • [ ] I have added links to the tracked experiments.
  • [ ] I have updated the tests accordingly (if applicable).

Howuhh avatar Jun 19 '22 20:06 Howuhh

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated
cleanrl ✅ Ready (Inspect) Visit Preview Jun 19, 2022 at 8:21PM (UTC)

vercel[bot] avatar Jun 19 '22 20:06 vercel[bot]

I also did an experiment on 3 seeds, 2M steps on HalfCheetah-v3 for comparison, graphs seems ok to me for now wandb graphs: https://wandb.ai/howuhh/cleanrlPPO W B Chart 19 06 2022, 23 23 13

Howuhh avatar Jun 19 '22 20:06 Howuhh

@Howuhh whats your Wandb username? Could you add me on discord Costa#2021?

vwxyzjn avatar Jun 19 '22 20:06 vwxyzjn

@vwxyzjn username is the same: howuhh (link to wandb pofile above), also sent the request on discord

Howuhh avatar Jun 19 '22 20:06 Howuhh

Also I am a bit skeptical about separate file for this, maybe I should add it to the base ppo but with flag to disable it (on the other hand, base ppo implementation will be more complicated)

Howuhh avatar Jun 20 '22 10:06 Howuhh

@Howuhh thanks for preparing this PR. My main thought is prototyping after https://github.com/openai/gym/pull/2752 is merged could save us a lot of refactoring and benchmarking effort.

While it's great that we are looking into this early, I still think we should hold off merging this PR until upstream changes in gym are final (see https://github.com/openai/gym/pull/2752) which introduces a new stepping API that involves truncation at the core signature

obs, rew, terminated, truncated, info = env.step(action)

which will result in a complete refactoring on our side.

Also I am a bit skeptical about separate file for this, maybe I should add it to the base ppo but with flag to disable it (on the other hand, base ppo implementation will be more complicated)

I think it's worth it to add this feature to all the PPO variants we have because it is technically more correct. I am ok with the change as long as we have sufficient benchmark and evidence to show there is no significant lapse in performance.

vwxyzjn avatar Jun 20 '22 14:06 vwxyzjn

@vwxyzjn I agree that it will save us a lot of work if these changes actually happen. However, right now I don't see a consensus on the new API tho

Howuhh avatar Jun 20 '22 14:06 Howuhh

@Howuhh I happen to be involved with this PR ;) to my knowledge, I think this PR will be merged in the next 2-3 months. So let's probably wait a bit if this is not an immediate use case for you.

vwxyzjn avatar Jun 20 '22 15:06 vwxyzjn

@vwxyzjn it's ok, let's wait for new API

Howuhh avatar Jun 20 '22 16:06 Howuhh

@vwxyzjn Hi! Seems like the API change is in the latest gym release. Should I try to update this PR to the new API now?

Howuhh avatar Jul 15 '22 12:07 Howuhh

Yes that would be great! Thank you!

vwxyzjn avatar Jul 15 '22 12:07 vwxyzjn

@Howuhh fyi, @jkterry1 says this

Q: Is the API design finalized in gym? Can we start transitioning to the latest gym in CleanRL?

A: The API changes will be turned on by default and aren't yet
But otherwise there are currently no changes planned to the base APi (edited)
There will continue to be changes to environments, wrappers will be replaced (this is looking like it's going to take forever), the vector API is being redone at unknown, and there will continue to be random small breaking changes
But otherwise ya, that was the big one

vwxyzjn avatar Jul 15 '22 13:07 vwxyzjn

Hey @Howuhh thanks for the PR. There is a new PR at #311

vwxyzjn avatar Dec 16 '22 14:12 vwxyzjn