
SubprocVecEnv Sets Out-of-Range Seeds for My Environments (ScenarioNet Environment)

Open chrisgao99 opened this issue 1 year ago • 7 comments

🐛 Bug

When using SubprocVecEnv from stable-baselines3,

env = make_vec_env(lambda: env_creator3(env_config), n_envs=n_envs, vec_env_cls=SubprocVecEnv)

the seeds are set automatically and sequentially, starting from a base seed, e.g., 33247589, 33247590, etc. The relevant code is here:

https://github.com/DLR-RM/stable-baselines3/blob/285e01f64aa8ba4bd15aa339c45876d56ed0c3b4/stable_baselines3/common/vec_env/subproc_vec_env.py#L46

However, my environment requires the seed to be within the range 0-9 as I only have 10 scenarios saved in a directory, and each seed corresponds to a specific scenario file.

One workaround is to wrap the env and override the seed in its reset function:

import gymnasium as gym
import numpy as np

class MyWrapper(gym.Wrapper):
    def __init__(self, env):
        super().__init__(env)

    def reset(self, **kwargs):
        # Replace whatever seed was passed in with a random scenario seed in [0, 9]
        kwargs['seed'] = np.random.randint(0, 10)
        obs, info = self.env.reset(**kwargs)
        return obs, info

But this is not very convenient for testing. If I want to test scenarios from seed 0 to 9 one by one, I can't directly use env.reset(seed). Instead, I have to modify the wrapper again.

So is there a way to manually control the seed range for envs created with SubprocVecEnv?

Code example

No response

Relevant log output / Error message

No response

System Info

No response

Checklist

  • [X] I have checked that there is no similar issue in the repo
  • [X] I have read the documentation
  • [X] I have provided a minimal and working example to reproduce the bug
  • [X] I have checked my env using the env checker
  • [X] I've used the markdown code blocks for both code and stack traces.

chrisgao99 avatar May 07 '24 01:05 chrisgao99

Hello, I think you are conflating the seed, which is used for the pseudo-random generator, with scenarios.

araffin avatar May 07 '24 05:05 araffin

Hello, I think you are conflating the seed, which is used for the pseudo-random generator, with scenarios.

Thanks. Do you mean the seed data[0] in this file

https://github.com/DLR-RM/stable-baselines3/blob/285e01f64aa8ba4bd15aa339c45876d56ed0c3b4/stable_baselines3/common/vec_env/subproc_vec_env.py#L46

is the seed of a pseudo-random generator?

But if I don't override the seed in my wrapper with kwargs['seed'] = np.random.randint(0,10), data[0] is passed through as my scenario seed, which raises a seed-out-of-range error.

So can I control the data[0] seed?

chrisgao99 avatar May 09 '24 19:05 chrisgao99

why not sample the scenario idx?

scenario_idx = np.random.randint(0,10)

qgallouedec avatar May 09 '24 19:05 qgallouedec

why not sample the scenario idx?

scenario_idx = np.random.randint(0,10)

May I ask where to put this line of code?

chrisgao99 avatar May 09 '24 19:05 chrisgao99

In the reset method of your env.

qgallouedec avatar May 10 '24 06:05 qgallouedec

In the reset method of your env.

Yeah, this also works, but it's the same as changing my env wrapper. If I add a sampling scheme to the reset function, I will need to change that scheme every time I want to use other env seeds.

Maybe modifying the reset function is the only method right now. But it would be better if Stable-Baselines3 had a parameter that lets me specify how to sample env seeds when training with a VecEnv.

chrisgao99 avatar May 10 '24 13:05 chrisgao99

Maybe modifying the reset function is the only method right now. But it would be better if Stable-Baselines3 had a parameter that lets me specify how to sample env seeds when training with a VecEnv.

Again, I think you are confusing the seed of the pseudo-random generator with options. With a VecEnv, you can directly call a method that sets parameters in the env (see the doc and other issues), and you can also set the reset options.

import numpy as np

# This is a seed of the numpy default pseudo random generator
seed = 76732031632
np.random.seed(seed)
for _ in range(5):
    print(np.random.randint(0, 10))

# Seed again to obtain the same sequence
np.random.seed(seed)
for _ in range(5):
    print(np.random.randint(0, 10))
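
To make the seed/scenario distinction concrete, here is a minimal standard-library sketch. It is not SB3 or ScenarioNet code: `ToyScenarioEnv`, `sample_scenarios`, and the `scenario_idx` option key are hypothetical names, and the `reset(seed=..., options=...)` signature only imitates the Gymnasium convention. The seed may be any integer because it only initializes the generator; the scenario index is either sampled from that generator or passed explicitly through the reset options.

```python
import random


class ToyScenarioEnv:
    """Hypothetical stand-in for an env with a fixed number of scenarios."""

    def __init__(self, num_scenarios=10):
        self.num_scenarios = num_scenarios
        self.rng = random.Random()
        self.scenario_idx = None

    def reset(self, seed=None, options=None):
        # The seed only (re)initializes the pseudo-random generator,
        # so a large value such as 4119632371 is perfectly valid.
        if seed is not None:
            self.rng.seed(seed)

        options = options or {}
        if "scenario_idx" in options:
            # Explicit choice, e.g. for testing scenarios one by one.
            idx = options["scenario_idx"]
            if not 0 <= idx < self.num_scenarios:
                raise ValueError(f"scenario_idx {idx} is out of range")
            self.scenario_idx = idx
        else:
            # Otherwise sample a valid index from the seeded generator.
            self.scenario_idx = self.rng.randrange(self.num_scenarios)

        return self.scenario_idx, {"scenario_idx": self.scenario_idx}


def sample_scenarios(env, seed, n):
    """Seed once, then collect the scenario indices of n consecutive resets."""
    first, _ = env.reset(seed=seed)
    return [first] + [env.reset()[0] for _ in range(n - 1)]


env = ToyScenarioEnv()
run_a = sample_scenarios(env, seed=4119632371, n=5)
run_b = sample_scenarios(env, seed=4119632371, n=5)
print(run_a == run_b)  # same seed, same sequence of scenario indices

# Pick scenario 3 directly, without touching the seed.
obs, info = env.reset(options={"scenario_idx": 3})
print(info["scenario_idx"])
```

With this split, evaluating scenarios 0 to 9 one by one means passing the index as an option, while training relies on the seeded generator to sample indices reproducibly.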

araffin avatar May 10 '24 13:05 araffin

I see what you mean. Thank you.

The make_vec_env function has a "seed" parameter, which is the initial seed for the random number generator. If I don't assign a value to it, it defaults to a large random integer, which causes a seed-out-of-range error in my env. So I just need to pass a smaller initial seed for the random number generator.

Here is code to reproduce this, so that other people can follow along.

import gymnasium as gym
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

class SeedPrintWrapper(gym.Wrapper):
    def __init__(self, env):
        super().__init__(env)
    
    def reset(self, **kwargs):
        seed = kwargs.get('seed', None)
        print(f"Environment seed: {seed}")

        return self.env.reset(**kwargs)

def make_custom_env(env_id):
    def _init():
        env = gym.make(env_id)
        env = SeedPrintWrapper(env)
        return env
    return _init

if __name__ == '__main__':
 
    # Create vectorized environments
    vec_env = make_vec_env(make_custom_env("CartPole-v1"), n_envs=2, vec_env_cls=SubprocVecEnv)

    # Initialize the model with the vectorized environment
    model = A2C("MlpPolicy", vec_env, verbose=1)
    model.learn(total_timesteps=100)

outputs:

data[0] is:  4119632371
data[0] is:  4119632372
Environment seed: 4119632371
Environment seed: 4119632372
Environment seed: None
Environment seed: None
Environment seed: None
Environment seed: None
Environment seed: None
Environment seed: None

After I set the seed in make_vec_env,

vec_env = make_vec_env(make_custom_env("CartPole-v1"), n_envs=2, vec_env_cls=SubprocVecEnv,seed=0)

outputs:

data[0] is:  0
Environment seed: 0
data[0] is:  1
Environment seed: 1
Environment seed: None
Environment seed: None
Environment seed: None
Environment seed: None
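
The two outputs above are consistent with each sub-environment receiving the base seed plus its index. The sketch below reproduces that pattern as inferred from the printed values; it is not copied from the SB3 source, and `per_env_seeds` and `seeds_fit_scenarios` are hypothetical helper names. It also checks whether a given base seed keeps every per-env seed inside a 10-scenario range.

```python
def per_env_seeds(base_seed, n_envs):
    """Seeds matching the logs above: base, base + 1, ..., one per env."""
    return [base_seed + rank for rank in range(n_envs)]


def seeds_fit_scenarios(base_seed, n_envs, num_scenarios=10):
    """True if every per-env seed is a valid scenario index."""
    return all(0 <= s < num_scenarios for s in per_env_seeds(base_seed, n_envs))


print(per_env_seeds(0, 2))                # [0, 1]
print(seeds_fit_scenarios(0, 2))          # True
print(seeds_fit_scenarios(4119632371, 2)) # False
```

Under this assumption, an env that treats the seed as a scenario index needs base_seed + n_envs - 1 to stay below the number of scenarios.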

chrisgao99 avatar May 10 '24 15:05 chrisgao99