Hybrid PPO
Implementation of Hybrid PPO
Description
Closes #202 (Description will follow)
Context
- [x] I have raised an issue to propose this change (required) #202
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation (update in the documentation)
Checklist:
- [x] I've read the CONTRIBUTION guide (required)
- [ ] The functionality/performance matches that of the source (required for new training algorithms or training-related features).
- [ ] I have updated the tests accordingly (required for a bug fix or a new feature).
- [ ] I have included an example of using the feature (required for new features).
- [ ] I have included baseline results (required for new training algorithms or training-related features).
- [ ] I have updated the documentation accordingly.
- [ ] I have updated the changelog accordingly (required).
- [ ] I have reformatted the code using `make format` (required)
- [ ] I have checked the codestyle using `make check-codestyle` and `make lint` (required)
- [ ] I have ensured `make pytest` and `make type` both pass. (required)
Note: we are using a maximum length of 127 characters per line
Related https://github.com/adysonmaia/sb3-plus (I rediscovered it recently)
Hi @araffin, I have a little problem here:
Our `HybridPPO` class requires an action space made of a `gym.spaces.Tuple` containing a `gym.spaces.MultiDiscrete` and a `gym.spaces.Box`. In short pseudocode: `spaces.Tuple[spaces.MultiDiscrete, spaces.Box]`.
`HybridPPO` inherits from sb3's `PPO`, which passes `supported_action_spaces` to its super-class (`OnPolicyAlgorithm`) in a hard-coded way. Unfortunately `spaces.Tuple` is not among the supported spaces, which causes an error when creating a `HybridPPO` object.
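For concreteness, here is a minimal sketch of such an action space (the sizes, bounds, and dtype are made up for illustration; I'm using `gymnasium`, swap for `gym` on older SB3 versions):

```python
import numpy as np
from gymnasium import spaces

# Hybrid action space: a tuple of a discrete part and a continuous part.
# The concrete sizes and bounds below are illustrative only.
hybrid_action_space = spaces.Tuple(
    (
        spaces.MultiDiscrete([3, 2]),  # e.g. two discrete action heads
        spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32),  # continuous parameters
    )
)

print(hybrid_action_space.sample())
# e.g. (array([2, 0]), array([ 0.41, -0.73], dtype=float32))
# An env with this action space is currently rejected by PPO's
# supported_action_spaces check.
```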
Alternatives
Now, we have a couple of alternatives:
- Add `supported_action_spaces` as an optional parameter of `PPO`'s constructor (sketched below);
- Make `HybridPPO` inherit from `OnPolicyAlgorithm` instead of `PPO`, so we can pass `supported_action_spaces` as an argument.
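For illustration, a rough sketch of what alternative 1 could look like (hypothetical code, not the actual SB3 implementation; the default tuple mirrors the spaces `PPO` currently allows, and the remaining hyperparameters are simply forwarded via `**kwargs`):

```python
from gymnasium import spaces
from stable_baselines3.common.on_policy_algorithm import OnPolicyAlgorithm


# Hypothetical: what PPO's constructor could look like with the new optional argument.
class PPO(OnPolicyAlgorithm):
    def __init__(self, policy, env, supported_action_spaces=None, **kwargs):
        super().__init__(
            policy,
            env,
            # Fall back to the currently hard-coded spaces when the caller
            # does not override them.
            supported_action_spaces=supported_action_spaces
            or (spaces.Box, spaces.Discrete, spaces.MultiDiscrete, spaces.MultiBinary),
            **kwargs,
        )


# HybridPPO could then stay a thin subclass of PPO:
class HybridPPO(PPO):
    def __init__(self, policy, env, **kwargs):
        super().__init__(policy, env, supported_action_spaces=(spaces.Tuple,), **kwargs)
```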
My opinion
Alternative 1 is better.
It adds flexibility without overhead (the parameter is optional). Regardless of how the code in this PR evolves, this change to PPO's parameters would be barely noticeable to sb3's users.
Furthermore, making `HybridPPO` inherit from `OnPolicyAlgorithm` (alternative 2) would require re-implementing, with only minor changes, a lot of the code that already exists in `PPO` (essentially duplicating it).
Action items
If you want to proceed with alternative 1, I'd be happy to open a PR in sb3 for it (including creating an issue to discuss the change beforehand, if necessary).
Let me know what you think so we can be unblocked 😄
> Make HybridPPO inherit from OnPolicyAlgorithm

This is fine, I don't think there should be too much duplicated code because of that, no? (that's what we do already for the other variants of PPO)
Ok, done in the latest commit. You were right, it wasn't so much extra code in the end. Thanks for the quick reply 😄
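For readers following along, a rough sketch of what alternative 2 looks like structurally (this is not the actual commit; argument handling and the update step are heavily simplified, and it assumes `OnPolicyAlgorithm`'s `supported_action_spaces` keyword):

```python
from gymnasium import spaces
from stable_baselines3.common.on_policy_algorithm import OnPolicyAlgorithm


class HybridPPO(OnPolicyAlgorithm):
    def __init__(self, policy, env, **kwargs):
        super().__init__(
            policy,
            env,
            # Declare the hybrid Tuple action space as supported so the
            # base-class action-space check accepts it.
            supported_action_spaces=(spaces.Tuple,),
            **kwargs,
        )

    def train(self) -> None:
        # PPO-style clipped-surrogate update over the rollout buffer, adapted
        # to the hybrid (discrete + continuous) action distribution (omitted).
        ...
```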