stable-baselines3 [Question] Multi Output Policy Support?

Question

Are multi-output policies supported yet? I see that dictionary observations are supported per the docs, however I do not see anything about multi-output policies...

Additional context

I am wanting to make a wrapper around PySC2 now that dictionary observations are supported, however multiple output policy support is still required.

Checklist

[X ] I have read the documentation (required)
[X ] I have checked that there is no similar issue in the repo (required)

Aug 02 '21 14:08 H-Park

This is a feature I think would nicely complement dictionary observations. In the past we talked with @araffin about this, and the biggest issues are 1) what the correct implementation is and 2) what to do about support for off-policy algorithms (a very different implementation). I think A2C and PPO could support multiple, independent action spaces, and this should work well.

@araffin Comments? Should this be a contrib thing if DQN/SAC/TD3 implementation is not trivial or doable? At least on A2C/PPO side, independent action spaces is a common way to approach this.

Aug 02 '21 16:08 Miffyli
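To make the "independent action spaces" idea concrete, here is a minimal stdlib-only sketch (not code from SB3 or from this thread). It assumes each action head is a categorical distribution given by a list of logits. Independence means the joint log-probability that A2C/PPO need is the sum of the per-head log-probabilities, and the joint entropy is the sum of the per-head entropies. As far as I know, this is also how SB3's `MultiCategoricalDistribution` treats `MultiDiscrete` actions. The function names are illustrative only.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over the logits of one action head."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def joint_log_prob(heads: Sequence[Sequence[float]], actions: Sequence[int]) -> float:
    """Log-probability of a joint action under independent categorical heads.

    Independence: p(a_1, ..., a_k) = p(a_1) * ... * p(a_k),
    so the log-probability is the sum of the per-head log-probabilities.
    """
    return sum(log_softmax(logits)[a] for logits, a in zip(heads, actions))


def joint_entropy(heads: Sequence[Sequence[float]]) -> float:
    """Entropy of independent heads: the sum of per-head entropies (-sum p log p)."""
    total = 0.0
    for logits in heads:
        log_p = log_softmax(logits)
        total -= sum(math.exp(lp) * lp for lp in log_p)
    return total


# Usage: two independent heads, e.g. a 3-way choice and a 4-way choice
heads = [[1.0, 0.5, -0.5], [0.0, 0.0, 0.0, 0.0]]
print(joint_log_prob(heads, [0, 2]))
print(joint_entropy(heads))
```

Because the heads do not condition on each other, adding a new action component only adds one more term to each sum, which is why this variant is comparatively easy to support in on-policy algorithms.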

I am wanting to make a wrapper around PySC2 now that dictionary observations are supported, however multiple output policy support is still required.

what type of multi output policy is required? (discrete/continuous or other?)

@araffin Comments? Should this be a contrib thing if DQN/SAC/TD3 implementation is not trivial or doable? At least on A2C/PPO side, independent action spaces is a common way to approach this.

I haven't much more comments than in https://github.com/DLR-RM/stable-baselines3/issues/349#issuecomment-800198204

At least on A2C/PPO side, independent action spaces is a common way to approach this.

ah, do you have some reference for that?

Aug 07 '21 16:08 araffin

ah, do you have some reference for that?

Not a solid one right now, but at least this paper suggests starting with independent spaces before investigating whether adding dependencies would help. The latter would be very task-specific and hard to support in SB3, while independent spaces would be comparatively easy.

Aug 07 '21 23:08 Miffyli

what type of multi output policy is required? (discrete/continuous or other?)

PySC2 docs say it's a Discrete and a Box (for the x, y of a move).

Now that I think about it, this can be done with a MultiDiscrete output space with PPO.

But this feature would be really awesome!

Aug 08 '21 16:08 H-Park
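A rough sketch of the MultiDiscrete workaround, assuming the action is encoded as `[function_id, x, y]` on a square screen grid. `N_FUNCTIONS`, `SCREEN_SIZE`, `decode_action` and `box_to_bin` are hypothetical names and values chosen for illustration, not part of PySC2's API; a real wrapper would still have to map `function_id` to PySC2's own function identifiers.

```python
from typing import Sequence, Tuple

# Hypothetical sizes (assumptions, not taken from PySC2):
N_FUNCTIONS = 10  # number of selectable action functions
SCREEN_SIZE = 64  # square screen grid, SCREEN_SIZE x SCREEN_SIZE cells

# Per-component sizes a MultiDiscrete space would use: [function, x, y]
ACTION_NVEC = [N_FUNCTIONS, SCREEN_SIZE, SCREEN_SIZE]


def decode_action(action: Sequence[int]) -> Tuple[int, Tuple[int, int]]:
    """Turn a MultiDiscrete sample [function_id, x, y] into (function_id, (x, y))."""
    function_id, x, y = (int(a) for a in action)
    for value, n in zip((function_id, x, y), ACTION_NVEC):
        if not 0 <= value < n:
            raise ValueError(f"action component {value} out of range [0, {n})")
    return function_id, (x, y)


def box_to_bin(coord: float, low: float, high: float, n_bins: int) -> int:
    """Map a continuous coordinate in [low, high] to one of n_bins integer bins."""
    frac = (coord - low) / (high - low)
    # Clamp so that coord == high falls into the last bin instead of overflowing
    return min(n_bins - 1, max(0, int(frac * n_bins)))


# Usage: decode one sampled action and discretize a continuous coordinate
print(decode_action([3, 10, 20]))
print(box_to_bin(0.5, 0.0, 1.0, SCREEN_SIZE))
```

With gymnasium, the matching action space would be `gymnasium.spaces.MultiDiscrete(ACTION_NVEC)`. The trade-off is that the three components are sampled independently, so the policy cannot condition the coordinates on the chosen function; functions that take no screen argument would simply ignore `x` and `y`.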

It seems that @adysonmaia implemented PPO with dict action space support here: https://github.com/adysonmaia/sb3-plus/blob/main/sb3_plus/mimo_ppo/ppo.py#L24

May 22 '23 09:05 araffin

It seems that @adysonmaia implemented PPO with dict action space support here: https://github.com/adysonmaia/sb3-plus/blob/main/sb3_plus/mimo_ppo/ppo.py#L24

Hi, I just started an implementation of PPO supporting dict action space for independent actions. At the moment, there isn't any documentation or validation tests yet. However, an "official" support of this feature in either SB3 or SB3-Contrib projects would be really interesting.

May 22 '23 12:05 adysonmaia
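For a dict action space with independent actions, the same summing idea extends to mixed discrete and continuous keys. This is a minimal sketch under stated assumptions, not the sb3-plus implementation: it assumes a hypothetical layout `{"function": Discrete(N_FUNCTIONS), "move": Box(shape=(2,))}`, a flat network output holding `N_FUNCTIONS` logits followed by 2 Gaussian means, and a separate state-independent log-std (similar in spirit to SB3's default for `Box` actions).

```python
import math
from typing import Dict, List, Mapping, Sequence

# Hypothetical layout (assumption): Dict action space with keys
# "function" -> Discrete(N_FUNCTIONS) and "move" -> Box(shape=(MOVE_DIM,))
N_FUNCTIONS = 3
MOVE_DIM = 2


def split_output(flat: Sequence[float]) -> Dict[str, List[float]]:
    """Slice one flat network output into per-key distribution parameters."""
    if len(flat) != N_FUNCTIONS + MOVE_DIM:
        raise ValueError("unexpected policy output size")
    return {"function": list(flat[:N_FUNCTIONS]), "move": list(flat[N_FUNCTIONS:])}


def categorical_log_prob(logits: Sequence[float], a: int) -> float:
    """log(softmax(logits)[a]), computed in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[a] - log_z


def diag_gaussian_log_prob(
    mean: Sequence[float], log_std: Sequence[float], x: Sequence[float]
) -> float:
    """Log-density of x under a Gaussian with diagonal covariance."""
    total = 0.0
    for mu, ls, xi in zip(mean, log_std, x):
        z = (xi - mu) / math.exp(ls)
        total += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
    return total


def dict_log_prob(flat: Sequence[float], log_std: Sequence[float], action: Mapping) -> float:
    """Joint log-prob of a dict action: the keys are independent, so log-probs add."""
    params = split_output(flat)
    return categorical_log_prob(params["function"], action["function"]) + diag_gaussian_log_prob(
        params["move"], log_std, action["move"]
    )


# Usage: evaluate one sampled dict action under the current policy output
flat_output = [0.2, -0.1, 0.4, 0.5, -0.5]
log_std = [0.0, 0.0]
print(dict_log_prob(flat_output, log_std, {"function": 2, "move": [0.3, -0.7]}))
```

In PPO, this joint log-prob would be computed under both the old and new policy, and the probability ratio is `exp(new_log_prob - old_log_prob)`, exactly as for a single action space.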

@adysonmaia are you planning on adding this feature to sb3-contrib or publishing sb3-plus to install with pip? I am very interested in this, so please tell me whether it could be soon or not. Thanks in advance

Jul 04 '23 11:07 EloyAnguiano

Hi @EloyAnguiano, I intend to push the sb3-plus project as a pip repository when its code is more stable and tested. For now, it's possible to install it via pip using the GitHub url. For example: pip install git+https://github.com/adysonmaia/sb3-plus#egg=sb3-plus

Aug 16 '23 18:08 adysonmaia
