Add Gym 0.25 support
See the comment at https://github.com/DLR-RM/stable-baselines3/pull/780#issuecomment-1164774401 for how to use this PR
Description
Gym 0.24 has been released, and with it come breaking changes. The objective of this PR is to fix all the failing tests.
Missing:
- [x] update changelog
- [x] update notebooks and check it works with gym 0.24 (https://github.com/Stable-Baselines-Team/rl-colab-notebooks/pull/9)
- [ ] update RL Zoo and check it works with gym 0.24 (https://github.com/DLR-RM/rl-baselines3-zoo/pull/256, not ready, gym needs to be fixed first)
- [x] update tutorials (https://github.com/araffin/rl-tutorial-jnrr19/pull/13)
- [x] prepare docker image
- [x] push docker images
- [x] update SB3 contrib (https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/71)
Motivation and Context
Types of changes
- [X] Bug fix (non-breaking change which fixes an issue)
closes #840, closes #871, closes #271
Checklist:
- [X] I've read the CONTRIBUTION guide (required)
- [ ] I have updated the changelog accordingly (required).
- [ ] My change requires a change to the documentation.
- [ ] I have updated the tests accordingly (required for a bug fix or a new feature).
- [ ] I have updated the documentation accordingly.
- [X] I have reformatted the code using `make format` (required)
- [X] I have checked the codestyle using `make check-codestyle` and `make lint` (required)
- [ ] I have ensured `make pytest` and `make type` both pass. (required)
- [ ] I have checked that the documentation builds using `make doc` (required)
Status
- There are still 28 failing tests when running locally, need to investigate further: it seems that the failures come from `pygame` not being installed somehow, probably an issue from gym...
> it seems that the failures come from `pygame` not being installed somehow, probably an issue from gym...

Only a subset of the failures; those are fixed now.
@RedTachyon I'm currently investigating a bug which may be related to https://github.com/openai/gym/pull/2422.
Here's a stack trace of the type of errors:
```
tests/test_save_load.py:216:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
stable_baselines3/dqn/dqn.py:258: in learn
    return super(DQN, self).learn(
stable_baselines3/common/off_policy_algorithm.py:354: in learn
    rollout = self.collect_rollouts(
stable_baselines3/common/off_policy_algorithm.py:584: in collect_rollouts
    actions, buffer_actions = self._sample_action(learning_starts, action_noise, env.num_envs)
stable_baselines3/common/off_policy_algorithm.py:415: in _sample_action
    unscaled_action, _ = self.predict(self._last_obs, deterministic=False)
stable_baselines3/dqn/dqn.py:238: in predict
    action = np.array([self.action_space.sample() for _ in range(n_batch)])
stable_baselines3/dqn/dqn.py:238: in <listcomp>
    action = np.array([self.action_space.sample() for _ in range(n_batch)])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = Discrete(10)

    def sample(self) -> int:
>       return self.start + self.np_random.randint(self.n)
E       AttributeError: 'numpy.random._generator.Generator' object has no attribute 'randint'
```
I've been digging into the code and the problem seems to be that this line

```python
def sample(self) -> int:
    return self.start + self.np_random.randint(self.n)
```

is being called with `self.np_random` of type `numpy.random._generator.Generator` instead of `gym.utils.seeding.RandomNumberGenerator`. The former doesn't have a `randint` method, so the code fails.
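For context, the mismatch can be seen with plain numpy, no gym involved: the new-style `Generator` only exposes `integers()`, not the legacy `randint()` that `Discrete.sample()` calls.

```python
import numpy as np

rng = np.random.default_rng(0)  # numpy.random._generator.Generator
print(rng.integers(10))         # works: Generator provides integers()
print(rng.randint(10))          # AttributeError: 'Generator' object has no attribute 'randint'
```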
If it's any help, this is only happening after loading a model, so the problem may be related to that (I haven't checked much on that front yet, wanted to rule out something obvious with the changes on seeding first). Here's an example test where it fails: https://github.com/DLR-RM/stable-baselines3/blob/58a98060f9ec618a740f121c5904ee845fd02f15/tests/test_save_load.py#L216
It seems `discrete.py` in Gym assumes `np_random` will always be of some type but doesn't enforce it explicitly? Or there's something funny happening with the loading...
I think I found the issue - the custom RNG class inherits from the numpy `Generator` for compatibility purposes, but when it gets pickled/unpickled, it defaults to the numpy behavior, which creates a new `Generator`.
I think I see two main ways to solve it - either in `__setstate__` of `Space` make sure that an RNG is set instead of a `Generator`; or in the RNG code do some custom code to pickle/unpickle the proper class.
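As a rough illustration of the second option (hypothetical stand-in code, not gym's actual implementation): a `Generator` subclass can keep its type across a pickle round-trip by defining its own `__reduce__`.

```python
import pickle
import numpy as np

# Hypothetical stand-in for gym's compatibility RNG: a Generator subclass that
# keeps the legacy randint() method around.
class CompatRNG(np.random.Generator):
    def randint(self, n):
        return int(self.integers(n))

    # Without this, pickling falls back to Generator's own reduction and the
    # unpickled object is a plain Generator, losing randint().
    def __reduce__(self):
        return (CompatRNG, (self.bit_generator,))

rng = CompatRNG(np.random.PCG64(0))
restored = pickle.loads(pickle.dumps(rng))
assert isinstance(restored, CompatRNG)  # the subclass survives the round-trip
print(restored.randint(10))
```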
> I think I found the issue - the custom RNG class inherits from the numpy Generator for compatibility purposes, but when it gets pickled/unpickled, it defaults to the numpy behavior which creates a new Generator.
>
> I think I see two main ways to solve it - either in `__setstate__` of `Space` make sure that an RNG is set instead of a Generator; or in the RNG code do some custom code to pickle/unpickle the proper class
I see, thanks for checking. I wonder what this means for SB3 adopting Gym 0.22, since the problem seems to be on Gym's side. I'm guessing the fix (whenever it comes) won't be available for SB3 until the next release by Gym?
> won't be available for SB3 until the next release by Gym?
https://github.com/openai/gym/issues/2640 must be fixed first too
We're going to do a 0.22.1 release in the near future that will fix all of these things; some of the relevant fixes here have already been merged.
Update on this:
- Confirmed that https://github.com/openai/gym/pull/2639 solves the bug I mentioned before, so in the next Gym release those tests will pass.
- There are three remaining failing tests, from two separate bugs:
- Mean reward is below desired threshold.
```
FAILED tests/test_identity.py::test_discrete[env2-A2C] - AssertionError: Mean reward below threshold: 88...
FAILED tests/test_identity.py::test_discrete[env2-PPO] - AssertionError: Mean reward below threshold: 86...
```
Not sure exactly how gym's updates are causing this lower mean reward, need to check this further
- Warning not raised in HER test
```
FAILED tests/test_her.py::test_save_load_replay_buffer[True-False] - assert 0 == 1
```
Caused because the trajectory truncation is not raising a warning in these lines:
https://github.com/DLR-RM/stable-baselines3/blob/cdaa9ab418aec18f41c7e8e12e0ad28f238553eb/tests/test_her.py#L258-L261
EDIT: CI also shows about 3 tests failing with EOFError, and I also observed those locally, but they also seem to be solved when using gym master (tested locally).
Regarding the "mean reward below threshold" problem, after some further investigations the root cause is the change in seeding behaviour in gym.
SB3 sets seeds here: https://github.com/DLR-RM/stable-baselines3/blob/e88eb1c9ca98650f802409e5845e952c39be9e76/stable_baselines3/common/base_class.py#L576
Before gym 0.22.0, the default `seed()` method was not actually setting any seed [source]. In gym 0.22 the default `seed()` does set the seed [source].
The consequence of this is, for instance, that the result of https://github.com/DLR-RM/stable-baselines3/blob/e88eb1c9ca98650f802409e5845e952c39be9e76/stable_baselines3/common/evaluation.py#L82 is not consistent between gym==0.21.0 and gym>=0.22.0. The solution that I think makes the most sense is to simply adjust the threshold value in the test.
> Before gym 0.22.0, the default `seed()` method was not actually setting any seed [source]. In gym 0.22 the default `seed()` does set the seed [source].

I see...

> is not consistent between gym==0.21.0 and gym>=0.22.0.

is there any way to make it consistent?

> The solution that I think makes the most sense is to simply adjust the threshold value in the test.
The whole point of those performance tests is to detect any change that may hurt performance or change results (they fail easily as soon as any minor change is made). So we should not change the threshold but rather fix the underlying change/issue; in the current case, I would prefer changing the seed if we cannot have behavior consistent with the previous version.
btw, are all those warnings (see below) due to gym only?
```
gym/utils/seeding.py:47: 1 warning
tests/test_callbacks.py: 4153 warnings
tests/test_cnn.py: 377 warnings
tests/test_custom_policy.py: 106 warnings
tests/test_deterministic.py: 238 warnings
tests/test_dict_env.py: 14116 warnings
tests/test_envs.py: 2 warnings
tests/test_her.py: 1566 warnings
tests/test_identity.py: 18153 warnings
tests/test_logger.py: 300 warnings
tests/test_monitor.py: 3000 warnings
tests/test_predict.py: 109 warnings
tests/test_run.py: 222 warnings
tests/test_save_load.py: 6597 warnings
tests/test_spaces.py: 1110 warnings
tests/test_train_eval_mode.py: 31 warnings
tests/test_utils.py: 12 warnings
tests/test_vec_envs.py: 216 warnings
tests/test_vec_monitor.py: 4000 warnings
tests/test_vec_normalize.py: 6547 warnings
  /opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/gym/utils/seeding.py:47: DeprecationWarning: WARN: Function `rng.randint(low, [high, size, dtype])` is marked as deprecated and will be removed in the future. Please use `rng.integers(low, [high, size, dtype])` instead.
```
same for these warnings?
```
tests/test_envs.py::test_non_default_action_spaces[new_action_space4]
  /home/runner/work/stable-baselines3/stable-baselines3/stable_baselines3/common/env_checker.py:272: UserWarning: We recommend you to use a symmetric and normalized Box action space (range=[-1, 1]) cf https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html
    warnings.warn(

tests/test_utils.py: 16 warnings
  /opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/gym/utils/seeding.py:138: DeprecationWarning: WARN: Function `hash_seed(seed, max_bytes)` is marked as deprecated and will be removed in the future.
    deprecation(

tests/test_utils.py: 16 warnings
  /opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/gym/utils/seeding.py:175: DeprecationWarning: WARN: Function `_bigint_from_bytes(bytes)` is marked as deprecated and will be removed in the future.
    deprecation(
```
Similarly, the HER test failure was also caused by the change in RNG. The failing test required that after training for 200 steps, the env had to be left in the middle of a trajectory. After the change in gym, the RNG gods decided that after 200 steps the env had just finished an episode, i.e., `self.current_idx == 0` in
https://github.com/DLR-RM/stable-baselines3/blob/e88eb1c9ca98650f802409e5845e952c39be9e76/stable_baselines3/her/her_replay_buffer.py#L523
==> warning is never raised ==> test assertion fails.
A simple change in the seed fixes the test.
@araffin I would have to check the warnings, but now a warning will be raised every time `env.seed` is called.
I reverted the threshold back to 90 and changing the seed did the trick.
> but now a warning will be raised every time `env.seed` is called.
Could we make the `VecEnv` `seed()` method backward compatible? In the sense that when `venv.seed(seed)` is called, it calls `reset(seed=seed)` if `seed` is accepted in `reset()`, and otherwise calls `seed(seed)`.
This would prevent most warnings normally.
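A rough sketch of that idea (illustrative only, not a final implementation; it assumes direct access to the wrapped envs as in `DummyVecEnv`):

```python
import inspect
from typing import List, Optional, Union

def seed(self, seed: Optional[int] = None) -> List[Union[None, int]]:
    if seed is None:
        return [None for _ in self.envs]
    seeds = []
    for idx, env in enumerate(self.envs):
        if "seed" in inspect.signature(env.unwrapped.reset).parameters:
            # gym >= 0.22 style: seeding goes through reset()
            seeds.append(env.reset(seed=seed + idx))
        else:
            # gym <= 0.21 style: dedicated seed() method
            seeds.append(env.seed(seed + idx))
    return seeds
```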
@carlosluis
Regarding the "mean reward below threshold" problem, after some further investigations the root cause is the change in seeding behaviour in gym.
Can you elaborate on how you got to this conclusion? Something seems very weird here, since this would imply that the test used to be stochastic (and happened to always pass) and now is deterministic, but the seed happens to fail. What's more, the change you mention affects `env.np_random`, which from what I can tell isn't used in the actual test.
Further investigations on these:
@RedTachyon is right, the determinism in the test (both using gym v0.21.0 and current master) actually comes from setting the seed on the action space rather than the environment, as done in this line: https://github.com/DLR-RM/stable-baselines3/blob/e88eb1c9ca98650f802409e5845e952c39be9e76/stable_baselines3/common/base_class.py#L574
So after further inspection I have reason to believe the difference in test results between using gym 0.21.0 and gym>=0.22.0 comes from a change in behaviour of the np_random method. Compare v0.21.0 and the current master.
The evidence I have is that immediately after setting the action space seed in the line above, we get a different output on the first call to `action_space.sample()` here:
https://github.com/DLR-RM/stable-baselines3/blob/e88eb1c9ca98650f802409e5845e952c39be9e76/stable_baselines3/common/envs/identity_env.py#L49
For example, with seed=2 I get:
- v0.21.0 results in `array([0, 0, 1, 0], dtype=int8)`
- master results in `array([1, 0, 0, 1], dtype=int8)`
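For reference, a minimal way to reproduce this comparison (assuming the action space here is `MultiBinary(4)`, as the `int8` samples suggest):

```python
import gym

space = gym.spaces.MultiBinary(4)
space.seed(2)
# The first sample after seeding differs between gym==0.21.0 and gym>=0.22.0
# because the underlying RNG mechanism changed between releases.
print(space.sample())
```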
The difference is probably that the underlying RNG mechanism changed somewhere between releases, so the effect of setting an old random seed won't be the same.
I'd be super uncomfy having a test that depends so heavily on the random seed, but if that's just to check for consistency, searching for a random seed that works again would probably be good enough.
@carlosluis i think there was a misunderstanding with my remark here: https://github.com/DLR-RM/stable-baselines3/pull/780#issuecomment-1062235112
please do not change the `VecEnv` signature but make the `VecEnv` `seed()` method forward compatible (so it supports all gym versions).
> @carlosluis i think there was a misunderstanding with my remark here: #780 (comment)
>
> please do not change the `VecEnv` signature but make the `VecEnv` `seed()` method forward compatible (so it supports all gym versions).
@araffin I see, yes I understood something different. AFAIK, to make `VecEnv` `seed()` forward compatible we should internally call `reset(seed=seed)` for every env, similar to how it's now done in `DummyVecEnv`:
```python
def seed(self, seed: Optional[int] = None) -> List[Union[None, int]]:
    seeds = list()
    for idx, env in enumerate(self.envs):
        seeds.append(env.reset(seed=seed + idx))
    return seeds
```
but I'm not sure how to do that from the base class `VecEnv`, it doesn't have access to the individual environments, right?
> AFAIK, to make `VecEnv` `seed()` forward compatible we should internally call `reset(seed=seed)` for every env, similar to how it's now done in `DummyVecEnv`.

yes

> but I'm not sure how to do that from the base class `VecEnv`

you don't do it in the base class, but rather in `DummyVecEnv` and in `SubprocVecEnv` (in the worker) and that should do the trick ;)
> > AFAIK, to make `VecEnv` `seed()` forward compatible we should internally call `reset(seed=seed)` for every env, similar to how it's now done in `DummyVecEnv`.
>
> yes
>
> > but I'm not sure how to do that from the base class `VecEnv`
>
> you don't do it in the base class, but rather in `DummyVecEnv` and in `SubprocVecEnv` (in the worker) and that should do the trick ;)
Got it! I already have that mostly; I will remove the signature change in the base class and also make the `SubprocVecEnv` worker change.
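For reference, a hypothetical sketch of what the worker-side change could look like (a simplified loop, not the actual SB3 `_worker` implementation):

```python
def _worker(remote, env_fn):
    env = env_fn()
    while True:
        cmd, data = remote.recv()
        if cmd == "seed":
            # gym >= 0.22: seed via reset(); older gym would instead call env.seed(data)
            remote.send(env.reset(seed=data))
        elif cmd == "step":
            remote.send(env.step(data))
        elif cmd == "close":
            remote.close()
            break
```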
@araffin As I was going through the commit diff to document the changes, I realized that maybe removing the GoalEnv class is incompatible with HER? If that's the case, then we would have to do as you said before and bring the class from https://github.com/Farama-Foundation/Gym-Robotics
In the docs for HER we have:
```rst
.. warning::

  HER requires the environment to inherit from `gym.GoalEnv <https://github.com/openai/gym/blob/3394e245727c1ae6851b504a50ba77c73cd4c65b/gym/core.py#L160>`_
```
I don't find where specifically such a warning is raised, but there's definitely a codepath that expects to find the `compute_reward()` method enforced by the `GoalEnv` interface.
https://github.com/DLR-RM/stable-baselines3/blob/30772aa9f53a4cf61571ee90046cdc454c1b11d7/stable_baselines3/her/her_replay_buffer.py#L346-L362
This also impacts the HER colab example in https://github.com/DLR-RM/stable-baselines3/blob/master/docs/guide/examples.rst
Lastly, the highway-env repo went through a similar process to start using `GoalEnv` from gym-robotics, but ultimately they dropped the dependency and copied the class instead, see https://github.com/eleurent/highway-env/commit/f8a4c633012ee004786346908e3f9b841edd64aa. Maybe we need to keep that in mind when discussing what to do about this issue.
> I don't find where specifically such a warning is raised, but there's definitely a codepath that expects to find the `compute_reward()` method enforced by the `GoalEnv` interface.

well, the two features that we use are:
- we expect a dict obs with three keys `observation`, `achieved_goal` and `desired_goal`
- we expect a vectorized implementation of `compute_reward()`
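To make those two requirements concrete, here is a minimal illustration (shapes and the threshold are made up; only the observation-space layout and the batched `compute_reward()` signature matter here):

```python
import numpy as np
import gym
from gym import spaces

class MinimalGoalEnv(gym.Env):
    """Sketches only the interface HER relies on; reset()/step() are omitted."""

    def __init__(self):
        goal_space = spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)
        self.observation_space = spaces.Dict(
            {
                "observation": spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32),
                "achieved_goal": goal_space,
                "desired_goal": goal_space,
            }
        )
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)

    def compute_reward(self, achieved_goal, desired_goal, info):
        # Vectorized: accepts (batch_size, goal_dim) arrays and returns (batch_size,) rewards.
        distances = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
        return -(distances > 0.05).astype(np.float32)
```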
> > I don't find where specifically such a warning is raised, but there's definitely a codepath that expects to find the `compute_reward()` method enforced by the `GoalEnv` interface.
>
> well, the two features that we use are:
>
> * we expect a dict obs with three keys `observation`, `achieved_goal` and `desired_goal`
> * we expect a vectorized implementation of `compute_reward()`
I wanted to check if there's something to be done in SB3 regarding the removal of GoalEnv from gym, in which case we should probably address it in this PR.
> I wanted to check if there's something to be done in SB3 regarding the removal of GoalEnv from gym, in which case we should probably address it in this PR.
Those checks are now missing: https://github.com/openai/gym/blob/v0.21.0/gym/core.py#L182 and need to be added to the env checker at least (maybe somewhere else if it's easy to do).
And the doc must be updated to explain what is expected for HER replay buffer (because we cannot refer to GoalEnv interface anymore).
EDIT: before I forget, I will probably add backward compat for render modes in the vec video recorder.
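For illustration, such a check in the env checker could look roughly like this (the function name and messages are made up):

```python
from gym import spaces

def _check_goal_env(env) -> None:
    # Stand-in for the checks GoalEnv.reset() used to perform in gym <= 0.21.
    assert isinstance(env.observation_space, spaces.Dict), (
        "Goal-conditioned envs (e.g. for HER) must use a Dict observation space"
    )
    for key in ("observation", "achieved_goal", "desired_goal"):
        assert key in env.observation_space.spaces, f"Missing key in observation space: {key}"
    assert hasattr(env, "compute_reward"), (
        "Goal-conditioned envs must implement a (vectorized) compute_reward() method"
    )
```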
@araffin the ALE-related warnings about `seeding.hash_seed` and `rng.randint` are now fixed in ale-py==0.7.5.
@araffin I would like to help getting this PR merged. Are you or anyone else working on the missing points? If not, could I work on PRs to the connected repos (zoo, notebooks etc.)?
> @araffin I would like to help getting this PR merged. Are you or anyone else working on the missing points? If not, could I work on PRs to the connected repos (zoo, notebooks etc.)?
thanks for the help =) currently no one is working on the missing points, please go ahead!
@araffin regarding the checklist,
- Tutorials, Notebooks - Created PRs.
- Zoo - Tests are working with sb3 at this fork, and sb3-contrib at current master (however, sb3-contrib tests fail due to using `gym.GoalEnv`, but that does not affect the zoo; maybe I can separately work on fixing that). Only need to change the gym version in requirements and the changelog, which can be done along with the next SB3 release.
I think that's all I can do outside this PR. Please let me know if this is okay.
> 2. Zoo - Tests are working with sb3 at this fork,
what test did you run for the zoo?
by running `make pytest` with this branch, I already get at least one test failing (same for `make check-trained-agents`).
EDIT: the issue seems to come from dependencies that need to be updated
> what test did you run for the zoo?
I had run `make pytest` and everything had passed for me. I pulled recent commits and tried again now; everything was still passing.
I had missed `make check-trained-agents`; I ran it now. It had errors due to PandaEnvs (because of the removal of `gym.GoalEnv`, and panda-gym is not updated to reflect this). I'll create a PR there.