stable-baselines3 Fix missing seed / option args in dummy and subproc vec

Description

Passed the appropriate elements from self._seeds and self._options to a "done" env that calls its reset function, in DummyVecEnv.step_wait
Added missing options argument in the reset functions of TimeDelayEnv and CustomSubClassedSpaceEnv
Modified SubprocVecEnv to send options and seeds arguments as in (1) in step_async.
Modified the _worker function's "step" case, to receive and pass these arguments into the env's reset, as in (1).

Motivation and Context

Closes #1790

[x] I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

[x] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
[x] Documentation (update in the documentation)

Checklist

[x] I've read the CONTRIBUTION guide (required)
[x] I have updated the changelog accordingly (required).
[x] My change requires a change to the documentation.
[x] I have updated the tests accordingly (required for a bug fix or a new feature).
[x] I have updated the documentation accordingly.
[ ] I have opened an associated PR on the SB3-Contrib repository (if necessary)
[ ] I have opened an associated PR on the RL-Zoo3 repository (if necessary)
[x] I have reformatted the code using make format (required)
[x] I have checked the codestyle using make check-codestyle and make lint (required)
[x] I have ensured make pytest and make type both pass. (required)
[x] I have checked that the documentation builds using make doc (required)

Note: You can run most of the checks using make commit-checks.

Note: we are using a maximum length of 127 characters per line

Jan 11 '24 17:01 npit

Hello @npit thanks for the PR and sorry for the delay.

By looking at it, I think it would be better to document the fact that we don't allow to pass options if reset is not called right after. I'm afraid that dealing with partially reset env is more tricky.

To explain what I mean: Imagine that you have two envs and you passed options=[{"a": 1}, {"a": 2}] (so you want to pass a=1 for 1st env and a=2 for second env.

Let's say the first env reset, then you can pass a=1, but what shall you discard it from the options list? Also, what should happen if afterward the user options=[{"a": 3}, {"a": 3}] before the second env is reset? In that case, the first env would have received the correct a=1 but the a=2 would have been silently discarded by the user passing the new options.

Enforcing this limitation would not limit the user though. The user can still access setter using the VevEnv api (see https://github.com/DLR-RM/stable-baselines3/pull/1789)

What do you think @qgallouedec ?

PS: options should not be required to work with SB3, so you should in theory make it opt-in (see what I did for the reset())

Jan 18 '24 15:01 araffin

By looking at it, I think it would be better to document the fact that we don't allow to pass options if reset is not called right after. I'm afraid that dealing with partially reset env is more tricky.

As I see it, providing functionality of parameterizing resets but having it apply on a subset of resets is quite confusing. What's more, the implicit resets are currently outside the user's control: the vec_env.set_options() utility is to be called before the user loses control by invoking vec_env.step, evaluate_policy or whatever: at that point, the user may as well modify the environment directly by a bunch of vec_env.set_attr, since they currently have access to the vec_env. On the other hand, the implicit resets on episode terminations are out of bounds to the user once vec_env starts stepping.

What is the alternative if you want to control implicit resets? Should we make custom functionality instead (e.g. constructor args and dedicated logic)? Doesn't that route kind of defeat the purpose of having arguments to reset and the set_options util?

Let's say the first env reset, then you can pass a=1, but what shall you discard it from the options list? Also, what should happen if afterward the user options=[{"a": 3}, {"a": 3}] before the second env is reset? In that case, the first env would have received the correct a=1 but the a=2 would have been silently discarded by the user passing the new options.

Currently, options persist until the user explicitly invokes vec_env.reset as per the docs. If #1790 is indeed an issue to be fixed (i.e. if we do want implicit resets to consider passed option/seed parameters), then it would be up to the user to build the reset logic and arguments to set_options to fit whatever they want to do, if their use case is complicated enough for the examples listed above to be undesirable.

E.g. if reset options are a function of what previous options were consumed in the previous step, users can perhaps track / count the number of episodes per env and adjust. If they want different options per implicit reset, maybe pass an options list and cycle elements from it in reset. Etc.

Does that make sense, or am I missing something?

Jan 18 '24 22:01 npit

the user may as well modify the environment directly by a bunch of vec_env.set_attr, since they currently have access to the vec_env.

yes, this is the recommended way (having setters, see the example I linked which is in the doc now: https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#modifying-vectorized-environments-attributes).

Doesn't that route kind of defeat the purpose of having arguments to reset and the set_options util?

this API (passing options at reset) was not a SB3 decision but a gymnasium :/ (I was actually against it) and it was designed with single env in mind. We only provided some helper for people that wanted to use it.

What I would do in any way is update the doc, add a warning and link to the current discussion.

I will wait for the opinion of other maintainers to decide if this is something to be fixed or if it is a behavior that is ok but must be documented.

Jan 20 '24 13:01 araffin

I have the impression that passing an option to reset is a very little-used feature. I think it's important to support it though.

However, supporting very precise usage such as a list of options seems to me very complicated, both in terms of implementation and maintenance. And it seems like a bottomless pit: What if the user wants to cycle through a common list of options for all envs? If we support both an option in the form of an object and an utterable of objects, what do we do if the option type expected by the environment is an iterable (do we need to fetch the signature, or do a try;expect?)? In short, too complicated for nothing.

I approve of this PR as it stands, and I think we should limit the support to this. For more advanced uses of option in reset, I think the user will have to customize their environment.

Apr 17 '24 09:04 qgallouedec

If the user absolutely need option to be a cycle, they can still do this:

import gymnasium as gym

class MyWrapper(gym.Wrapper):
    def __init__(self, env, option_cycle):
        super().__init__(env)
        self.option_cycle = option_cycle

    def reset(self, seed=None, options=None):
        return self.env.reset(seed=seed, options=next(self.option_cycle))

Apr 17 '24 09:04 qgallouedec

stable-baselines3
stable-baselines3 copied to clipboard

Fix missing seed / option args in dummy and subproc vec_env resets during stepping

Description

Motivation and Context

Types of changes

Checklist

stable-baselines3 stable-baselines3 copied to clipboard

Fix missing seed / option args in dummy and subproc vec_env resets during stepping

Description

Motivation and Context

Types of changes

Checklist

stable-baselines3
stable-baselines3 copied to clipboard