[DRAFT][DON'T MERGE] Enable learning from multiple priors
Proposed change(s)
This PR adds a feature inspired by the approach used by DeepMind in this paper. By specifying multiple pretrained policies as "priors" in the YAML file (i.e. giving the directory of each run ID), the code loads those policies and uses them as regularization priors for the learning policy. This has proven effective in Dodgeball, where training with two priors (shooting and flag-getting) produces a skilled policy in roughly 1/3 the time. See the ELO plot below.
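For illustration only, here is a rough sketch of what the trainer config might look like; the `priors` section and the `prior_strength` hyperparameter are hypothetical names for this sketch, not the final schema:

```yaml
behaviors:
  DodgeballAgent:
    trainer_type: poca
    # Hypothetical section: each entry points at the results directory of a
    # previously trained run whose policy should act as a regularization prior.
    priors:
      - results/shooting_run/DodgeballAgent
      - results/flag_getting_run/DodgeballAgent
    hyperparameters:
      # Hypothetical weight on the prior-regularization term in the loss.
      prior_strength: 0.5
```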

This PR also contains a version of WallJump that can be broken down into subtasks.
TODO
Ideally, for this to become a full feature, we'd want to add these components:
- Add to PPO and SAC (currently only in POCA)
- Allow checkpoints with different network architectures (e.g. different num_layers) to be handled properly (I don't see how we could handle a different obs space or action space, though).
- Solve the entropy issue (entropy seems to increase when more than one prior is specified, probably because the policy is trying to become multimodal; see the sketch after this list).
- Documentation and tests
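For reference, a minimal PyTorch-style sketch of what a multi-prior KL regularization term could look like; the function and variable names are hypothetical and this is not the actual ml-agents implementation. Averaging KL terms toward several distinct priors pulls the learned policy toward a mixture of their behaviors, which is one plausible reason the measured entropy climbs when more than one prior is used:

```python
from typing import List
import torch
from torch.distributions import Categorical, kl_divergence


def prior_regularization_loss(
    policy_logits: torch.Tensor,            # [batch, num_actions] from the learning policy
    prior_logits_list: List[torch.Tensor],  # one [batch, num_actions] tensor per frozen prior
) -> torch.Tensor:
    """Average KL(policy || prior) over all priors (hypothetical helper)."""
    policy_dist = Categorical(logits=policy_logits)
    kl_terms = []
    for prior_logits in prior_logits_list:
        # Priors are frozen checkpoints, so gradients should not flow into them.
        prior_dist = Categorical(logits=prior_logits.detach())
        kl_terms.append(kl_divergence(policy_dist, prior_dist))
    # Mean over priors and over the batch.
    return torch.stack(kl_terms).mean()


# Usage sketch (names hypothetical):
# total_loss = poca_policy_loss + prior_strength * prior_regularization_loss(
#     policy_logits, [shooting_prior_logits, flag_getting_prior_logits]
# )
```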
Types of change(s)
- [ ] Bug fix
- [x] New feature
- [ ] Code refactor
- [ ] Breaking change
- [ ] Documentation update
- [ ] Other (please describe)
Checklist
- [ ] Added tests that prove my fix is effective or that my feature works
- [ ] Updated the changelog (if applicable)
- [ ] Updated the documentation (if applicable)
- [ ] Updated the migration guide (if applicable)
Other comments