cherry
PPO dual clip
Description
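I added a new option to the PPO loss that enables dual-clip PPO (https://arxiv.org/pdf/1912.09729.pdf). This option matters in complex environments (MOBA games, StarCraft, and multi-agent settings) because trajectories can be sampled from several different policies. In that setting the probability ratio can become very large, and when the advantage is negative the standard clipped objective is unbounded below; the dual clip adds a second bound for that case.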
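For reviewers unfamiliar with the technique, here is a minimal pure-Python sketch of the dual-clip objective as I understand it from the paper. It is not the code in this PR: the function name `dual_clip_ppo_loss`, the parameter names `clip` and `dual_clip`, and their default values are assumptions chosen for illustration only.

```python
import math


def dual_clip_ppo_loss(new_log_probs, old_log_probs, advantages,
                       clip=0.2, dual_clip=3.0):
    """Illustrative dual-clip PPO loss over plain Python lists.

    All names and defaults here are hypothetical, not the library's API.
    `dual_clip` is the extra bound c from the paper and must exceed 1.
    """
    if dual_clip <= 1.0:
        raise ValueError("dual_clip must be greater than 1")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Standard PPO clipping of the ratio to [1 - clip, 1 + clip].
        clipped_ratio = min(max(ratio, 1.0 - clip), 1.0 + clip)
        objective = min(ratio * adv, clipped_ratio * adv)
        # Dual clip: for negative advantages, bound the objective from
        # below by dual_clip * adv so that a huge ratio cannot dominate.
        if adv < 0.0:
            objective = max(objective, dual_clip * adv)
        total += objective

    # The objective is maximized, so the loss is its negated mean.
    return -total / len(advantages)
```

With a ratio of about 10 and an advantage of -1, the standard clipped objective would be about -10, while the dual clip with `dual_clip=3.0` limits it to -3. For non-negative advantages the result matches the standard clipped PPO loss.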
Contribution Checklist
If your contribution modifies code in the core library (not docs, tests, or examples), please fill the following checklist.
- [x] My contribution modifies code in the main library.
- [ ] My modifications are tested.
- [x] My modifications are documented.