stable-baselines3

[Question] Multi Output Policy Support?

Open H-Park opened this issue 3 years ago • 8 comments

Question

Are multi-output policies supported yet? I see that dictionary observations are supported per the docs, but I do not see anything about multi-output policies...

Additional context

I want to make a wrapper around PySC2 now that dictionary observations are supported; however, multi-output policy support is still required.

Checklist

  • [x] I have read the documentation (required)
  • [x] I have checked that there is no similar issue in the repo (required)

H-Park, Aug 02 '21 14:08

This is a feature I think would nicely complement dictionary observations. In the past, @araffin and I talked about this, and the biggest issues are 1) what the correct implementation is and 2) how to support off-policy algorithms (a very different implementation). I think A2C and PPO could support multiple independent action spaces, and this should work well.

@araffin Comments? Should this be a contrib thing if a DQN/SAC/TD3 implementation is non-trivial or not doable? At least on the A2C/PPO side, independent action spaces are a common way to approach this.

Miffyli, Aug 02 '21 16:08
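A minimal sketch of what "independent action spaces" means for A2C/PPO, written in plain Python so it runs without PyTorch. It assumes a hypothetical Dict action with a discrete `"function"` key and a continuous `"target"` key; the `heads` layout and all function names are made up for illustration and are not an SB3 API. Under the independence assumption, the joint log-probability and entropy are simply sums over the per-key distributions, the same factorization SB3 already applies across the dimensions of a MultiDiscrete action space.

```python
import math


def _log_softmax(logits):
    # Numerically stable log-softmax via the log-sum-exp trick.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def categorical_log_prob(logits, action):
    return _log_softmax(logits)[action]


def categorical_entropy(logits):
    log_probs = _log_softmax(logits)
    return -sum(math.exp(lp) * lp for lp in log_probs)


def diag_gaussian_log_prob(mean, log_std, action):
    # Sum of independent per-dimension Normal log-densities.
    total = 0.0
    for mu, ls, a in zip(mean, log_std, action):
        z = (a - mu) / math.exp(ls)
        total += -0.5 * z * z - ls - 0.5 * math.log(2 * math.pi)
    return total


def diag_gaussian_entropy(log_std):
    # Entropy of N(mu, sigma^2) is log(sigma) + 0.5 * log(2 * pi * e) per dimension.
    return sum(ls + 0.5 * math.log(2 * math.pi * math.e) for ls in log_std)


def joint_log_prob(heads, action):
    # Independence assumption: log pi(a | s) = sum over keys k of log pi_k(a_k | s).
    total = 0.0
    for key, head in heads.items():
        if head["type"] == "categorical":
            total += categorical_log_prob(head["logits"], action[key])
        else:
            total += diag_gaussian_log_prob(head["mean"], head["log_std"], action[key])
    return total


def joint_entropy(heads):
    # Entropies of independent components add up as well.
    total = 0.0
    for head in heads.values():
        if head["type"] == "categorical":
            total += categorical_entropy(head["logits"])
        else:
            total += diag_gaussian_entropy(head["log_std"])
    return total
```

Because the log-probabilities add, PPO's probability ratio `exp(new_log_prob - old_log_prob)` becomes the product of the per-key ratios, so the rest of the on-policy loss can stay as it is.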

> I want to make a wrapper around PySC2 now that dictionary observations are supported; however, multi-output policy support is still required.

What type of multi-output policy is required (discrete, continuous, or other)?

> @araffin Comments? Should this be a contrib thing if a DQN/SAC/TD3 implementation is non-trivial or not doable? At least on the A2C/PPO side, independent action spaces are a common way to approach this.

I don't have much to add beyond my comment in https://github.com/DLR-RM/stable-baselines3/issues/349#issuecomment-800198204

> At least on the A2C/PPO side, independent action spaces are a common way to approach this.

Ah, do you have a reference for that?

araffin, Aug 07 '21 16:08

> Ah, do you have a reference for that?

Not a solid one right now, but at least this paper suggests starting with independent spaces before investigating whether adding dependencies helps. The latter would be very task-specific and hard to support in SB3, whereas independent spaces would be comparatively easy.

Miffyli, Aug 07 '21 23:08

> What type of multi-output policy is required (discrete, continuous, or other)?

According to the PySC2 docs, it's a Discrete space plus a Box (for the x, y of a move).

Now that I think about it, this could be done with a MultiDiscrete action space in PPO.

But this feature would be really awesome!

H-Park, Aug 08 '21 16:08
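A sketch of the MultiDiscrete workaround, assuming the continuous (x, y) target is discretized into a square grid. `NUM_FUNCTIONS`, `SCREEN_SIZE`, the `[fn, xi, yi]` layout and the dict keys are hypothetical placeholders, not PySC2's actual action format. In a real environment, `decode` would typically be called from the `action()` method of a `gymnasium.ActionWrapper` whose action space is `gymnasium.spaces.MultiDiscrete(NVEC)`.

```python
# Hypothetical sizes: the real values depend on the PySC2 configuration.
NUM_FUNCTIONS = 4
SCREEN_SIZE = 64

# One entry per sub-action, in the form a MultiDiscrete space expects.
NVEC = [NUM_FUNCTIONS, SCREEN_SIZE, SCREEN_SIZE]


def decode(flat_action, low=0.0, high=1.0):
    """Map a MultiDiscrete action [fn, xi, yi] to a discrete id plus continuous (x, y)."""
    for value, n in zip(flat_action, NVEC):
        if not 0 <= value < n:
            raise ValueError(f"sub-action {value} out of range [0, {n})")
    fn, xi, yi = flat_action
    bin_width = (high - low) / SCREEN_SIZE
    # Use the centre of each grid cell as the continuous coordinate.
    x = low + (xi + 0.5) * bin_width
    y = low + (yi + 0.5) * bin_width
    return {"function": fn, "target": (x, y)}


def encode(action, low=0.0, high=1.0):
    """Inverse of decode: snap continuous coordinates to a grid cell index."""
    bin_width = (high - low) / SCREEN_SIZE

    def to_index(v):
        i = int((v - low) / bin_width)
        # Clamp so that v == high (or a slightly out-of-range value) stays valid.
        return min(max(i, 0), SCREEN_SIZE - 1)

    x, y = action["target"]
    return [action["function"], to_index(x), to_index(y)]
```

Discretizing trades coordinate precision for compatibility: PPO in SB3 already handles MultiDiscrete action spaces, whereas a true Discrete + Box dict needs the multi-output support requested in this issue.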

It seems that @adysonmaia implemented PPO with dict action space support here: https://github.com/adysonmaia/sb3-plus/blob/main/sb3_plus/mimo_ppo/ppo.py#L24

araffin, May 22 '23 09:05

> It seems that @adysonmaia implemented PPO with dict action space support here: https://github.com/adysonmaia/sb3-plus/blob/main/sb3_plus/mimo_ppo/ppo.py#L24

Hi, I just started an implementation of PPO that supports dict action spaces with independent actions. At the moment, it has no documentation or validation tests. However, "official" support for this feature in either SB3 or SB3-Contrib would be really interesting.

adysonmaia, May 22 '23 12:05
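One way a single policy network could serve a dict action space with independent actions: its last layer emits one flat vector, which is sliced into per-key distribution parameters (logits for a Discrete key, means for a Box key). This is a hypothetical sketch and is not taken from sb3-plus; the `spec` format and function names are made up, and the Box standard deviations are assumed to live outside this vector (for example as a separate learned parameter). The deterministic action shown is the mode of each head: argmax for discrete keys, the mean for continuous ones.

```python
def output_size(spec):
    """Total network output width for a hypothetical spec {key: (kind, n)}."""
    return sum(n for _kind, n in spec.values())


def split_output(flat, spec):
    """Slice a flat output vector into per-key parameter lists, in spec order."""
    if len(flat) != output_size(spec):
        raise ValueError("output width does not match the action spec")
    params, start = {}, 0
    for key, (_kind, n) in spec.items():
        params[key] = list(flat[start:start + n])
        start += n
    return params


def deterministic_action(flat, spec):
    """Mode of each independent head: argmax of the logits, or the Gaussian mean."""
    action = {}
    for key, values in split_output(flat, spec).items():
        kind = spec[key][0]
        if kind == "discrete":
            action[key] = max(range(len(values)), key=values.__getitem__)
        elif kind == "box":
            action[key] = values
        else:
            raise ValueError(f"unsupported action kind: {kind}")
    return action
```

Keeping a fixed key order matters here: the slicing only works if the network and the action decoding agree on which output units belong to which key.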

@adysonmaia are you planning to add this feature to sb3-contrib or to publish sb3-plus so it can be installed with pip? I am very interested in this, so please let me know whether that could happen soon. Thanks in advance!

EloyAnguiano, Jul 04 '23 11:07

Hi @EloyAnguiano, I intend to publish the sb3-plus project as a pip package once its code is more stable and tested. For now, it can be installed with pip from the GitHub URL, for example: pip install git+https://github.com/adysonmaia/sb3-plus#egg=sb3-plus

adysonmaia, Aug 16 '23 18:08