
[Question] How to save a state and restore the env

Open yiwc opened this issue 2 years ago • 3 comments

Question

Is it possible to save the full state of an environment at a specific step and later restore the environment to that step?

yiwc avatar Oct 29 '22 03:10 yiwc

Gymnasium environments do not expose a single state variable (some do, but not all). Therefore, the easiest approach is to pickle the environment at each step you may want to return to. For the built-in Gymnasium environments, we know that all of them can be pickled, but this might not be true for third-party environments.
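A minimal sketch of the snapshot idea, assuming a toy stand-in class (`ToyEnv` is hypothetical, not a Gymnasium environment). It uses `copy.deepcopy` for an in-memory snapshot; `pickle.dumps` / `pickle.loads` would play the same role if the snapshot has to be written to disk, provided the environment is picklable.

```python
import copy
import random


class ToyEnv:
    """Toy stand-in with a Gymnasium-like reset/step interface (hypothetical)."""

    def __init__(self):
        self.rng = random.Random()
        self.position = 0
        self.steps = 0

    def reset(self, seed=None):
        self.rng = random.Random(seed)
        self.position = 0
        self.steps = 0
        return self.position, {}

    def step(self, action):
        # The action moves the agent; the noise comes from the env's own RNG,
        # so the RNG state is part of what has to be saved.
        self.position += action + self.rng.choice([-1, 0, 1])
        self.steps += 1
        terminated = abs(self.position) >= 10
        return self.position, float(-abs(self.position)), terminated, False, {}


def rollout(env, actions):
    """Apply the actions in order and return the observed positions."""
    return [env.step(a)[0] for a in actions]


env = ToyEnv()
env.reset(seed=0)
rollout(env, [1, 1, -1])              # advance to the step we want to save

snapshot = copy.deepcopy(env)         # saves position, step count and RNG state
first = rollout(env, [1, -1, 1, 1])   # continue from the saved step

env = copy.deepcopy(snapshot)         # restore (the snapshot stays reusable)
second = rollout(env, [1, -1, 1, 1])  # replay the same actions

assert first == second                # both continuations are identical
```

Because the copy includes the random number generator, the replay reproduces the same noise, which a copy of `position` alone would not.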

You might be interested in the new functional API: https://github.com/Farama-Foundation/Gymnasium/pull/25. For a fully implemented version, see the Gymnax project.

pseudo-rnd-thoughts avatar Oct 29 '22 12:10 pseudo-rnd-thoughts

For MuJoCo environments there is a function called set_state, which should, in principle, set the state of the simulator.

https://github.com/Farama-Foundation/Gymnasium/blob/7af0936f0831c559e069f7231a45a5440199fb10/gymnasium/envs/mujoco/mujoco_env.py#L219

And in particular, I think if you can keep state.time, qpos, qvel, state.act, and state.udd_state, then you should be able to set the simulator to a desired state. https://github.com/Farama-Foundation/Gymnasium/blob/7af0936f0831c559e069f7231a45a5440199fb10/gymnasium/envs/mujoco/mujoco_env.py#L222

And if you are using any wrappers, such as TimeLimit, then you will also need to keep track of the state of the wrappers.
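To illustrate the point about wrappers, here is a rough sketch with toy classes (`CounterEnv` and `ToyTimeLimit` are hypothetical stand-ins, not Gymnasium's own classes). Restoring only the inner environment leaves the wrapper's step counter stale, while a snapshot of the whole wrapped stack keeps it consistent.

```python
import copy


class CounterEnv:
    """Hypothetical minimal env whose whole state is one integer."""

    def __init__(self):
        self.value = 0

    def reset(self):
        self.value = 0
        return self.value, {}

    def step(self, action):
        self.value += action
        return self.value, 0.0, False, False, {}


class ToyTimeLimit:
    """Toy TimeLimit-style wrapper: truncates after max_steps steps."""

    def __init__(self, env, max_steps):
        self.env = env
        self.max_steps = max_steps
        self.elapsed = 0

    def reset(self):
        self.elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.elapsed += 1
        truncated = truncated or self.elapsed >= self.max_steps
        return obs, reward, terminated, truncated, info


env = ToyTimeLimit(CounterEnv(), max_steps=5)
env.reset()
for _ in range(3):
    env.step(1)

inner_only = copy.deepcopy(env.env)  # inner state only, no wrapper counter
full_stack = copy.deepcopy(env)      # inner state plus the wrapper's elapsed count

env.step(1)
env.step(1)                          # the time limit is reached here

# Restoring only the inner env: the wrapper still believes time has run out.
env.env = inner_only
stale_truncated = env.step(1)[3]

# Restoring the full stack: the counter goes back to the saved step as well.
restored = copy.deepcopy(full_stack)
truncated_after_restore = restored.step(1)[3]
```

Here `stale_truncated` comes out true even though the inner environment was rewound, whereas the fully restored stack continues without truncation.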

Altriaex avatar Nov 06 '22 09:11 Altriaex

It would be nice to support a consistent convention for this. Not all environments will be resettable, but for those that are, something like get_state and set_state, where get_state returns a single value and set_state accepts a single argument, would be super nice. There's a whole class of RL algorithms that require state-resettable environments (think MCTS or really anything that looks like tree search), and it would be nice to support them consistently where possible.
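A sketch of what such a convention could look like, assuming a toy environment (`ResettableWalk`, `get_state`, `set_state` and `best_action` are all hypothetical names, not an existing Gymnasium API). The one-step lookahead stands in for the inner loop of a tree search: try an action, score it, rewind.

```python
import random


class ResettableWalk:
    """Hypothetical env exposing the proposed get_state/set_state pair."""

    def __init__(self, goal=3, seed=0):
        self.goal = goal
        self.rng = random.Random(seed)
        self.position = 0

    def step(self, action):
        # With probability 0.2 the move "slips" and the position is unchanged.
        if self.rng.random() >= 0.2:
            self.position += action
        reward = float(-abs(self.goal - self.position))
        return self.position, reward, self.position == self.goal, False, {}

    def get_state(self):
        # A single value that captures everything step() depends on.
        return (self.position, self.rng.getstate())

    def set_state(self, state):
        self.position, rng_state = state
        self.rng.setstate(rng_state)


def best_action(env, actions=(-1, 0, 1)):
    """One-step lookahead: try each action from the same state, then rewind."""
    root = env.get_state()
    scores = {}
    for action in actions:
        scores[action] = env.step(action)[1]
        env.set_state(root)  # rewind so every action starts from the same state
    return max(scores, key=scores.get)


env = ResettableWalk()
root = env.get_state()
action = best_action(env)
assert env.get_state() == root  # the search left the environment untouched
```

Returning the RNG state as part of the single state value is a design choice in this sketch: it makes the lookahead reproduce exactly what the real step will do, which may or may not be what a given search algorithm wants.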

aaronwalsman avatar Dec 15 '22 22:12 aaronwalsman