Hybrid PPO
Implementation of Hybrid PPO
Description
Closes #202 (Description will follow)
Context
- [x] I have raised an issue to propose this change (required) #202
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation (update in the documentation)
Checklist:
- [x] I've read the CONTRIBUTION guide (required)
- [ ] The functionality/performance matches that of the source (required for new training algorithms or training-related features).
- [ ] I have updated the tests accordingly (required for a bug fix or a new feature).
- [ ] I have included an example of using the feature (required for new features).
- [ ] I have included baseline results (required for new training algorithms or training-related features).
- [ ] I have updated the documentation accordingly.
- [ ] I have updated the changelog accordingly (required).
- [ ] I have reformatted the code using `make format` (required)
- [ ] I have checked the codestyle using `make check-codestyle` and `make lint` (required)
- [ ] I have ensured `make pytest` and `make type` both pass. (required)
Note: we are using a maximum length of 127 characters per line
Related https://github.com/adysonmaia/sb3-plus (I rediscovered it recently)
Hi @araffin, I have a little problem here:
Our `HybridPPO` class requires an action space made of a `gym.spaces.Tuple` containing a `gym.spaces.MultiDiscrete` and a `gym.spaces.Box`. In short pseudocode: `spaces.Tuple[spaces.MultiDiscrete, spaces.Box]`.
`HybridPPO` inherits from sb3's `PPO`, which passes `supported_action_spaces` to its super-class (`OnPolicyAlgorithm`) in a hard-coded way. Unfortunately `spaces.Tuple` is not among the supported spaces, which causes an error when creating a `HybridPPO` object.
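For concreteness, here is a minimal sketch of such an action space (the sizes, bounds, and dtype are made up for illustration; I'm using `gymnasium`, swap for `gym` on older SB3 versions):

```python
import numpy as np
from gymnasium import spaces

# Hybrid action space: a tuple of a discrete part and a continuous part.
# The concrete sizes and bounds below are illustrative only.
hybrid_action_space = spaces.Tuple(
    (
        spaces.MultiDiscrete([3, 2]),  # e.g. two discrete action heads
        spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32),  # continuous parameters
    )
)

print(hybrid_action_space.sample())
# e.g. (array([2, 0]), array([ 0.41, -0.73], dtype=float32))
# An env with this action space is currently rejected by PPO's
# supported_action_spaces check.
```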
Alternatives
Now, we have a couple of alternatives:
- Add `supported_action_spaces` as an optional parameter of `PPO`'s constructor (sketched below);
- Make `HybridPPO` inherit from `OnPolicyAlgorithm` instead of `PPO`, so we can pass `supported_action_spaces` as an argument.
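For illustration, a rough sketch of what alternative 1 could look like (hypothetical code, not the actual SB3 implementation; the default tuple mirrors the spaces `PPO` currently allows, and the remaining hyperparameters are simply forwarded via `**kwargs`):

```python
from gymnasium import spaces
from stable_baselines3.common.on_policy_algorithm import OnPolicyAlgorithm


# Hypothetical: what PPO's constructor could look like with the new optional argument.
class PPO(OnPolicyAlgorithm):
    def __init__(self, policy, env, supported_action_spaces=None, **kwargs):
        super().__init__(
            policy,
            env,
            # Fall back to the currently hard-coded spaces when the caller
            # does not override them.
            supported_action_spaces=supported_action_spaces
            or (spaces.Box, spaces.Discrete, spaces.MultiDiscrete, spaces.MultiBinary),
            **kwargs,
        )


# HybridPPO could then stay a thin subclass of PPO:
class HybridPPO(PPO):
    def __init__(self, policy, env, **kwargs):
        super().__init__(policy, env, supported_action_spaces=(spaces.Tuple,), **kwargs)
```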
My opinion
Alternative 1 is better.
It adds flexibility without overhead (the parameter is optional). Regardless of how the code in this PR evolves, this change to PPO's parameters would be barely noticeable to sb3's users.
Furthermore, making `HybridPPO` inherit from `OnPolicyAlgorithm` (alternative 2) would require re-implementing, with only minor changes, a lot of the code that already exists in `PPO` (essentially duplicating it).
Action items
If you want to proceed with alternative 1, I'd be happy to open a PR in sb3 for it (including creating an issue to discuss the change beforehand, if necessary).
Let me know what you think so we can be unblocked 😄
> Make HybridPPO inherit from OnPolicyAlgorithm

This is fine, I don't think there should be too much duplicated code because of that, no? (that's what we do already for the other variants of PPO)
Ok, done in the latest commit. You were right, it wasn't so much extra code in the end. Thanks for the quick reply 😄
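For readers following along, a rough sketch of what alternative 2 looks like structurally (this is not the actual commit; argument handling and the update step are heavily simplified, and it assumes `OnPolicyAlgorithm`'s `supported_action_spaces` keyword):

```python
from gymnasium import spaces
from stable_baselines3.common.on_policy_algorithm import OnPolicyAlgorithm


class HybridPPO(OnPolicyAlgorithm):
    def __init__(self, policy, env, **kwargs):
        super().__init__(
            policy,
            env,
            # Declare the hybrid Tuple action space as supported so the
            # base-class action-space check accepts it.
            supported_action_spaces=(spaces.Tuple,),
            **kwargs,
        )

    def train(self) -> None:
        # PPO-style clipped-surrogate update over the rollout buffer, adapted
        # to the hybrid (discrete + continuous) action distribution (omitted).
        ...
```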