
[Bug]: PPO doesn't work correctly with MultiDiscrete action spaces with "start" parameter

Open AlejandroCN7 opened this issue 2 years ago • 1 comments

🐛 Bug

Hello!

I am trying to run the PPO algorithm with one of the environments we have created in Sinergym.

The problem is that I have defined a MultiDiscrete action space (which, according to the documentation, is supported), but the actions the agent produces ignore the "start" parameter of the space definition.

As the log output shows, the last action component should be an integer between 25 and 35, but it takes values from 0 to 10.

I have not included test code, to avoid adding complexity, since the environment comes from Sinergym as mentioned above. The problem itself is simple and is visible in the log output.

Code example

No response

Relevant log output / Error message

[ENVIRONMENT] (WARNING) : Step: The action [1 0 0 1 1 1] is not correct for the Action Space MultiDiscrete([ 2  2  2  2  2 11], start=[ 0  0  0  0  0 25])
[ENVIRONMENT] (WARNING) : Step: The action [1 1 1 0 0 5] is not correct for the Action Space MultiDiscrete([ 2  2  2  2  2 11], start=[ 0  0  0  0  0 25])
[ENVIRONMENT] (WARNING) : Step: The action [0 0 1 1 0 9] is not correct for the Action Space MultiDiscrete([ 2  2  2  2  2 11], start=[ 0  0  0  0  0 25])
[ENVIRONMENT] (WARNING) : Step: The action [1 1 1 0 1 8] is not correct for the Action Space MultiDiscrete([ 2  2  2  2  2 11], start=[ 0  0  0  0  0 25])
[ENVIRONMENT] (WARNING) : Step: The action [0 1 0 1 1 8] is not correct for the Action Space MultiDiscrete([ 2  2  2  2  2 11], start=[ 0  0  0  0  0 25])
[ENVIRONMENT] (WARNING) : Step: The action [0 0 0 1 0 2] is not correct for the Action Space MultiDiscrete([ 2  2  2  2  2 11], start=[ 0  0  0  0  0 25])
[ENVIRONMENT] (WARNING) : Step: The action [0 0 1 1 0 3] is not correct for the Action Space MultiDiscrete([ 2  2  2  2  2 11], start=[ 0  0  0  0  0 25])
[ENVIRONMENT] (WARNING) : Step: The action [0 1 0 0 0 7] is not correct for the Action Space MultiDiscrete([ 2  2  2  2  2 11], start=[ 0  0  0  0  0 25])
[ENVIRONMENT] (WARNING) : Step: The action [0 1 0 0 1 3] is not correct for the Action Space MultiDiscrete([ 2  2  2  2  2 11], start=[ 0  0  0  0  0 25])
...
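
The warnings above are consistent with the policy emitting zero-based indices for each dimension (0 to n-1) while the space expects values offset by "start". A minimal sketch of that mismatch and of one possible workaround follows, in pure Python and using the nvec/start values from the log. The helper names `in_space` and `shift_action` are hypothetical and are not part of SB3 or Gymnasium; this illustrates the arithmetic only and is not a fix inside the library.

```python
# Values copied from the warning messages in the log.
NVEC = [2, 2, 2, 2, 2, 11]
START = [0, 0, 0, 0, 0, 25]


def in_space(action, nvec=NVEC, start=START):
    """Hypothetical check: is each component in [start, start + n)?"""
    return all(s <= a < s + n for a, n, s in zip(action, nvec, start))


def shift_action(action, start=START):
    """Hypothetical helper: add the per-dimension offset to a zero-based action."""
    return [a + s for a, s in zip(action, start)]


raw = [1, 1, 1, 0, 0, 5]      # one of the rejected actions from the log
shifted = shift_action(raw)   # last component becomes 5 + 25

print(in_space(raw), in_space(shifted), shifted)
```

In a real setup, the same offset could presumably be applied inside a `gymnasium.ActionWrapper` that exposes a zero-based MultiDiscrete space to the agent and shifts each action before passing it to the underlying environment.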

System Info

  • SB3 installed via pip (stable-baselines3==2.0.0).
  • Python 3.10.6
  • torch==2.0.1
  • gymnasium==0.29.1

Checklist

  • [X] I have checked that there is no similar issue in the repo
  • [X] I have read the documentation
  • [X] I have provided a minimal and working example to reproduce the bug
  • [X] I have checked my env using the env checker
  • [X] I've used the markdown code blocks for both code and stack traces.

AlejandroCN7 avatar Oct 18 '23 10:10 AlejandroCN7

Probably similar to https://github.com/DLR-RM/stable-baselines3/issues/1295, we need to update the env checker

edit: correct issue is https://github.com/DLR-RM/stable-baselines3/issues/913#issuecomment-1129537155

araffin avatar Oct 18 '23 10:10 araffin