[Question] Getting a single environment from a vectorized environment save file
Important Note: We do not do technical support, nor consulting and don't answer personal questions per email. Please post your question on the RL Discord, Reddit or Stack Overflow in that case.
Question
I have trained an agent on a custom environment wrapped in a SubprocVecEnv and saved its data using model.save. Now I want to use that model to run one agent on a single copy of the original environment (without a vector), but loading it seems to generate a vectorized environment. Is there a way of loading the save file into a single environment?
Checklist
- [x] I have read the documentation (required)
- [x] I have checked that there is no similar issue in the repo (required)
I'm not sure I understand your question. Next time, please provide a piece of code to help us understand it better. From the docs, and trying to match your description:
import gym

from stable_baselines3 import DQN
from stable_baselines3.common.vec_env import SubprocVecEnv

if __name__ == "__main__":
    # Create environment
    env = SubprocVecEnv(env_fns=[lambda: gym.make('CartPole-v0') for _ in range(2)])
    # Instantiate the agent
    model = DQN('MlpPolicy', env, verbose=1)
    # Train the agent
    model.learn(total_timesteps=20_000)
    # Save the agent
    model.save("dqn_cartpole")
To load and run:
import gym

from stable_baselines3 import DQN

# Load the trained agent
model = DQN.load("dqn_cartpole", gym.make("CartPole-v0"))
env = model.get_env()

obs = env.reset()
for i in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, rewards, dones, info = env.step(action)
I have the same question as @roybogin above, and I can explain it in more detail. In @qgallouedec's sample code (thanks!), env will be a vectorized environment of type stable_baselines3.common.vec_env.dummy_vec_env.DummyVecEnv. For a DummyVecEnv, knowing that the vectorized environment actually consists of a single env, we can "un-vectorize" it this way:
original_env = env.envs[0]
But this is specific to DummyVecEnv and it wouldn't work with e.g. a SubprocVecEnv (there's no SubprocVecEnv.envs).
Is there a way to un-vectorize a "singleton" vectorized environment, similarly to the DummyVecEnv above, but that would work with any VecEnv?
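For illustration, here is the kind of adapter I have in mind: an untested sketch of a wrapper that exposes a singleton VecEnv through the plain gym.Env API (the class name and details are hypothetical, not part of SB3):

import gym

class UnvecWrapper(gym.Env):
    """Hypothetical sketch: present a single-env VecEnv as a plain gym.Env."""

    def __init__(self, venv):
        assert venv.num_envs == 1, "only meaningful for a singleton VecEnv"
        self.venv = venv
        self.observation_space = venv.observation_space
        self.action_space = venv.action_space

    def reset(self):
        # VecEnv.reset() returns a batch of observations; unwrap the first
        return self.venv.reset()[0]

    def step(self, action):
        # VecEnv.step() expects and returns batched values
        obs, rewards, dones, infos = self.venv.step([action])
        # Caveat: a VecEnv auto-resets on done, so when dones[0] is True,
        # obs[0] is already the first observation of the next episode
        return obs[0], rewards[0], dones[0], infos[0]

    def close(self):
        self.venv.close()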
Hello, why would you want to do that? And what is wrong with a VecEnv that has only one env?
I'm not sure what @roybogin's original need was. In my case, the friction is that I end up maintaining code written against, on the one hand, the Gymnasium API (like this one):
while True:
    action = some_policy(observation)
    observation, _, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()
    some_function(observation, action)
And on the other hand the SB3 API (like that one):
while True:
    actions, _states = sb3_policy.predict(observations)
    observations, rewards, dones, infos = sb3_policy.env.step(actions)
    if dones[0]:
        observations = sb3_policy.env.reset()
    some_function(observations[0], actions[0])
It makes my maintain-O-meter beep :wink: Also the [0]'s in the SB3 snippet are redundant in the context of a single environment. (Apologies in advance if there is an obvious simplification in the SB3 API that I missed.)
I see. Did you know that SB3's predict works with gym envs too?
edit: btw, vecenv resets automatically
> I see. Did you know that SB3's predict works with gym envs too?
What I mean is that we autodetect the shape of the input and output the correct shape: https://github.com/DLR-RM/stable-baselines3/blob/84163b468c99538f2c98a3ebcc6124974ec631fd/stable_baselines3/common/policies.py#L362-L364
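For example, a minimal sketch (reusing the dqn_cartpole file saved above) of running the loaded model directly on a plain gym env, with no VecEnv involved:

import gym

from stable_baselines3 import DQN

# Plain gym env, no VecEnv wrapping
env = gym.make("CartPole-v0")
model = DQN.load("dqn_cartpole")

obs = env.reset()
for _ in range(1000):
    # predict() detects the unbatched observation and returns an unbatched action
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    if done:
        # plain gym envs do not auto-reset, so reset explicitly
        obs = env.reset()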
Thank you @araffin :smiley: This solves the friction point entirely.
> edit: btw, vecenv resets automatically
Whoops, all the more reason to use a gym env and do the resets explicitly :sweat_smile: (In this use case resetting can do real-world things when the code runs on the robot.)
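For reference, a minimal sketch of what the auto-reset looks like in practice (a DummyVecEnv with random actions; the "terminal_observation" info key is part of the SB3 VecEnv API):

import gym

from stable_baselines3.common.vec_env import DummyVecEnv

venv = DummyVecEnv([lambda: gym.make("CartPole-v0")])
obs = venv.reset()
for _ in range(1000):
    obs, rewards, dones, infos = venv.step([venv.action_space.sample()])
    if dones[0]:
        # The sub-env was already reset: obs[0] is the first observation
        # of the new episode, and the last observation of the finished
        # episode is stored in infos[0]["terminal_observation"]
        terminal_obs = infos[0]["terminal_observation"]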