
[Question] Getting a single environment from a vectorized environment save file

Open roybogin opened this issue 3 years ago

Important Note: We do not do technical support, nor consulting and don't answer personal questions per email. Please post your question on the RL Discord, Reddit or Stack Overflow in that case.

Question

I have trained an agent on a custom environment wrapped in a SubprocVecEnv and saved it using model.save. Now I want to use that model to run one agent on a single copy of the original environment (without a vector), but loading it seems to generate a vectorized environment. Is there a way to load the save file into a single environment?

Checklist

  • [x] I have read the documentation (required)
  • [x] I have checked that there is no similar issue in the repo (required)

roybogin avatar Jul 31 '22 16:07 roybogin

I'm not sure I understand your question. Next time, please provide a piece of code to help us better understand it. From the doc, and trying to match your description:

import gym

from stable_baselines3 import DQN
from stable_baselines3.common.vec_env import SubprocVecEnv


if __name__ == "__main__":
    # Create environment
    env = SubprocVecEnv(env_fns=[lambda: gym.make('CartPole-v0') for _ in range(2)])
    # Instantiate the agent
    model = DQN('MlpPolicy', env, verbose=1)
    # Train the agent
    model.learn(total_timesteps=20_000)
    # Save the agent
    model.save("dqn_carpole")

To load and run:

import gym

from stable_baselines3 import DQN

# Load the trained agent
model = DQN.load("dqn_cartpole", gym.make("CartPole-v0"))
env = model.get_env()

obs = env.reset()
for i in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, rewards, dones, infos = env.step(action)

qgallouedec avatar Jul 31 '22 17:07 qgallouedec

I have the same question as @roybogin above, which I can explain in more detail. In @qgallouedec's sample code (thanks!), env will be a vectorized environment of type stable_baselines3.common.vec_env.dummy_vec_env.DummyVecEnv. For a DummyVecEnv, knowing that the vectorized environment actually consists of a single env, we can "un-vectorize" it this way:

original_env = env.envs[0] 

But this is specific to DummyVecEnv and it wouldn't work with e.g. a SubprocVecEnv (there's no SubprocVecEnv.envs).

Is there a way to un-vectorize a "singleton" vectorized environment, similarly to the DummyVecEnv above, but that would work with any VecEnv?
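
For the record, the closest generic thing I could write is a helper along these lines (a sketch; unwrap_single_env is my own hypothetical name, not SB3 API), and it still has to give up on SubprocVecEnv:

from stable_baselines3.common.vec_env import DummyVecEnv, VecEnv


def unwrap_single_env(vec_env: VecEnv):
    # Hypothetical helper, not part of SB3: return the underlying env
    # of a singleton VecEnv. Only DummyVecEnv keeps its envs in the
    # current process; SubprocVecEnv workers live in subprocesses, so
    # there is no env object to hand back on this side.
    assert vec_env.num_envs == 1, "expected a singleton VecEnv"
    if isinstance(vec_env, DummyVecEnv):
        return vec_env.envs[0]
    raise NotImplementedError(f"cannot un-vectorize {type(vec_env).__name__}")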

stephane-caron avatar Sep 01 '23 13:09 stephane-caron

Hello, why would you want to do that? And what is wrong with a VecEnv that has only one env?

araffin avatar Sep 01 '23 13:09 araffin

I'm not sure what @roybogin's original need was. In my case, the friction is that I end up maintaining code that uses, on the one hand, the Gymnasium API (like this one):

    while True:
        action = some_policy(observation)
        observation, _, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            observation, info = env.reset()
        some_function(observation, action)

And on the other hand the SB3 API (like that one):

    while True:
        actions, _states = sb3_policy.predict(observations)
        observations, rewards, dones, infos = sb3_policy.env.step(actions)
        if dones[0]:
            observations = sb3_policy.env.reset()
        some_function(observations[0], actions[0])

It makes my maintain-O-meter beep :wink: Also the [0]'s in the SB3 snippet are redundant in the context of a single environment. (Apologies in advance if there is an obvious simplification in the SB3 API that I missed.)

stephane-caron avatar Sep 01 '23 13:09 stephane-caron

I see. Did you know that SB3's predict works with gym envs too?

edit: btw, VecEnv resets automatically

araffin avatar Sep 01 '23 13:09 araffin

I see. Did you know that SB3's predict works with gym envs too?

What I mean is that predict autodetects the shape of the input and returns an output with the matching shape: https://github.com/DLR-RM/stable-baselines3/blob/84163b468c99538f2c98a3ebcc6124974ec631fd/stable_baselines3/common/policies.py#L362-L364
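
For example, a minimal sketch reusing the save file from above (plain gym env, old-style step API as in the earlier snippets):

import gym

from stable_baselines3 import DQN

# Load the trained agent; no env is needed for prediction only
model = DQN.load("dqn_cartpole")
# A plain, non-vectorized gym env
env = gym.make("CartPole-v0")

obs = env.reset()
for _ in range(1000):
    # obs is a single (unbatched) observation, so predict
    # detects this and returns a single action
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()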

araffin avatar Sep 01 '23 14:09 araffin

Thank you @araffin :smiley: This solves the friction point entirely.

edit: btw, VecEnv resets automatically

Whoops, all the more reason to use a gym env and do the resets explicitly :sweat_smile: (In this use case resetting can do real-world things when the code runs on the robot.)
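
For the record, the unified loop then looks something like this (a sketch; sb3_policy, env and some_function are the same placeholders as in my snippets above, with env now a plain Gymnasium env):

    while True:
        action, _states = sb3_policy.predict(observation, deterministic=True)
        observation, _, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            # Explicit reset: on the robot, resetting does real-world
            # things, so it must stay out of auto-resetting VecEnv code
            observation, info = env.reset()
        some_function(observation, action)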

stephane-caron avatar Sep 01 '23 14:09 stephane-caron