[Bug]: PPO doesn't work correctly with MultiDiscrete action spaces that define the "start" parameter
🐛 Bug
Hello!
I am trying to run the PPO algorithm with one of the environments we have created in Sinergym.
I have defined a MultiDiscrete action space (which, according to the documentation, is supported), but the actions produced by the model do not take the "start" parameter of the space definition into account.
As the log output below shows, the last action component should be an integer between 25 and 35, but it takes values from 0 to 10 instead.
I am not including test code, to avoid adding complexity, since the environment comes from Sinergym as mentioned above. The problem itself is simple and can be seen in the log output.
Code example
No response
Relevant log output / Error message
[ENVIRONMENT] (WARNING) : Step: The action [1 0 0 1 1 1] is not correct for the Action Space MultiDiscrete([ 2 2 2 2 2 11], start=[ 0 0 0 0 0 25])
[ENVIRONMENT] (WARNING) : Step: The action [1 1 1 0 0 5] is not correct for the Action Space MultiDiscrete([ 2 2 2 2 2 11], start=[ 0 0 0 0 0 25])
[ENVIRONMENT] (WARNING) : Step: The action [0 0 1 1 0 9] is not correct for the Action Space MultiDiscrete([ 2 2 2 2 2 11], start=[ 0 0 0 0 0 25])
[ENVIRONMENT] (WARNING) : Step: The action [1 1 1 0 1 8] is not correct for the Action Space MultiDiscrete([ 2 2 2 2 2 11], start=[ 0 0 0 0 0 25])
[ENVIRONMENT] (WARNING) : Step: The action [0 1 0 1 1 8] is not correct for the Action Space MultiDiscrete([ 2 2 2 2 2 11], start=[ 0 0 0 0 0 25])
[ENVIRONMENT] (WARNING) : Step: The action [0 0 0 1 0 2] is not correct for the Action Space MultiDiscrete([ 2 2 2 2 2 11], start=[ 0 0 0 0 0 25])
[ENVIRONMENT] (WARNING) : Step: The action [0 0 1 1 0 3] is not correct for the Action Space MultiDiscrete([ 2 2 2 2 2 11], start=[ 0 0 0 0 0 25])
[ENVIRONMENT] (WARNING) : Step: The action [0 1 0 0 0 7] is not correct for the Action Space MultiDiscrete([ 2 2 2 2 2 11], start=[ 0 0 0 0 0 25])
[ENVIRONMENT] (WARNING) : Step: The action [0 1 0 0 1 3] is not correct for the Action Space MultiDiscrete([ 2 2 2 2 2 11], start=[ 0 0 0 0 0 25])
...
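For illustration, the mismatch in the warnings above can be checked with plain NumPy arithmetic. This is only a minimal sketch, not SB3 or Sinergym code: `is_valid` and `to_env_action` are hypothetical helper names, and it assumes the model emits zero-based indices in `[0, nvec)` for each component, which is what the logged actions suggest.

```python
import numpy as np

# Space parameters copied from the warning messages above.
NVEC = np.array([2, 2, 2, 2, 2, 11])
START = np.array([0, 0, 0, 0, 0, 25])


def is_valid(action, nvec=NVEC, start=START):
    """Return True if every component lies in [start, start + nvec)."""
    action = np.asarray(action)
    return bool(np.all((action >= start) & (action < start + nvec)))


def to_env_action(index_action, start=START):
    """Shift a zero-based index vector by the start offsets (hypothetical helper)."""
    return np.asarray(index_action) + start


raw = np.array([1, 1, 1, 0, 0, 5])  # second action from the log
shifted = to_env_action(raw)

print(is_valid(raw))      # False: the last component 5 is below start=25
print(shifted)            # last component becomes 5 + 25 = 30
print(is_valid(shifted))  # True
```

If this reading is right, a possible workaround would be to apply the same shift inside the `action` method of a `gymnasium.ActionWrapper` around the environment.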
System Info
- SB3 installed via pip (stable-baselines3==2.0.0).
- Python 3.10.6
- torch==2.0.1
- gymnasium==0.29.1
Checklist
- [X] I have checked that there is no similar issue in the repo
- [X] I have read the documentation
- [X] I have provided a minimal and working example to reproduce the bug
- [X] I have checked my env using the env checker
- [X] I've used the markdown code blocks for both code and stack traces.
Probably similar to https://github.com/DLR-RM/stable-baselines3/issues/1295; we need to update the env checker.
edit: correct issue is https://github.com/DLR-RM/stable-baselines3/issues/913#issuecomment-1129537155