
Updates environments to use `gymnasium.vector.VectorEnv`

pseudo-rnd-thoughts opened this pull request 4 months ago • 17 comments

Description

Currently, the environments use gymnasium.Env even though the implementations are all vectorised. This PR updates the implementations to use gymnasium.vector.VectorEnv.

Fixes https://github.com/isaac-sim/IsaacLab/issues/3095
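
For context, this is roughly the interface the environments move to. A minimal sketch of a natively-vectorised VectorEnv subclass under Gymnasium 1.x (all names and shapes here are illustrative, not IsaacLab code):

```python
import gymnasium as gym
import numpy as np


class ExampleVectorEnv(gym.vector.VectorEnv):
    """Toy natively-vectorised env; illustrative only, not IsaacLab code."""

    def __init__(self, num_envs: int):
        self.num_envs = num_envs
        # Per-sub-env spaces plus their batched counterparts.
        self.single_observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.single_action_space = gym.spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
        self.observation_space = gym.vector.utils.batch_space(self.single_observation_space, num_envs)
        self.action_space = gym.vector.utils.batch_space(self.single_action_space, num_envs)

    def reset(self, *, seed=None, options=None):
        # Batched observations with shape (num_envs, obs_dim).
        return np.zeros((self.num_envs, 4), dtype=np.float32), {}

    def step(self, actions):
        obs = np.zeros((self.num_envs, 4), dtype=np.float32)
        rewards = np.zeros(self.num_envs, dtype=np.float32)
        terminations = np.zeros(self.num_envs, dtype=bool)
        truncations = np.zeros(self.num_envs, dtype=bool)
        return obs, rewards, terminations, truncations, {}
```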

Type of change

  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist

  • [ ] I have run the pre-commit checks with ./isaaclab.sh --format
  • [ ] I have made corresponding changes to the documentation
  • [ ] My changes generate no new warnings
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • [ ] I have added my name to the CONTRIBUTORS.md or my name already exists there

pseudo-rnd-thoughts avatar Aug 09 '25 14:08 pseudo-rnd-thoughts

Thanks for the PR. Will we be required to switch to gym.make_vec instead of gym.make with this change?

kellyguo11 avatar Aug 09 '25 15:08 kellyguo11

@kellyguo11 Yes, and the gym.register call should use vector_entry_point so that make_vec can find the environment.
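
For reference, a sketch of what that registration could look like (the task id and module path are made up):

```python
import gymnasium as gym

# Hypothetical task id and entry point, for illustration only.
gym.register(
    id="Isaac-Example-v0",
    vector_entry_point="my_project.envs:ExampleVectorEnv",
)

# make_vec resolves vector_entry_point and returns the VectorEnv directly,
# so no per-sub-env wrapping happens.
envs = gym.make_vec("Isaac-Example-v0", num_envs=4096)
```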

Do you know what autoreset mode IsaacLab uses? https://farama.org/Vector-Autoreset-Mode

pseudo-rnd-thoughts avatar Aug 10 '25 07:08 pseudo-rnd-thoughts

@pseudo-rnd-thoughts Thanks for submitting this PR, I think IsaacLab uses same-step reset mode!

ooctipus avatar Aug 13 '25 19:08 ooctipus

@pseudo-rnd-thoughts Let me know if there is something I can help with; it would be nice to update to the current gymnasium standard :))

ooctipus avatar Aug 14 '25 22:08 ooctipus


We don't provide terminal_obs in the info dictionary. So it is "Disabled" mode.
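
For reference, newer Gymnasium releases (1.1+, per the article linked above) let a vector env advertise its autoreset convention through metadata. A sketch of what declaring "Disabled" could look like (class name is illustrative):

```python
import gymnasium as gym
from gymnasium.vector import AutoresetMode


class ExampleVectorEnv(gym.vector.VectorEnv):
    # The "autoreset_mode" metadata key tells wrappers and training code which
    # autoreset convention (NEXT_STEP, SAME_STEP, or DISABLED) this env follows.
    metadata = {"autoreset_mode": AutoresetMode.DISABLED}
```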

Mayankm96 avatar Aug 16 '25 04:08 Mayankm96

ohhh that's right, sorry for the miss. @Mayankm96 Do you think it makes sense to let users choose which mode to enable on the IsaacLab side? We could default to Disabled mode for RL applications. If users could switch the mode from Disabled to Same-Step or Next-Step, then we could rewrite the Recorder Manager so it doesn't live inside ManagerBasedRLEnv; one reason it had to be there was that Disabled mode doesn't provide terminal_obs, so the only way to record the pre-reset observation was to put it in the ManagerBasedRLEnv reset logic.

ooctipus avatar Aug 16 '25 20:08 ooctipus

We don't provide terminal_obs in the info dictionary. So it is "Disabled" mode.

@Mayankm96 Are you sure? The manager_based_rl_env has the following note

Note:
   For vectorized environments, it is recommended to **only** call the :meth:`reset`
   method once before the first call to :meth:`step`, i.e. after the environment is created.
   After that, the :meth:`step` function handles the reset of terminated sub-environments.
   This is because the simulator does not support resetting individual sub-environments
   in a vectorized environment. 

For the disabled mode, users must call reset to reset each sub-environment. Would you happen to know where the auto-reset code is implemented?

@ooctipus Could you approve the GitHub Actions runs? I'm unfamiliar with the codebase, so I suspect that I've missed a necessary code change, and I haven't been able to get the tests to run locally.

pseudo-rnd-thoughts avatar Aug 17 '25 13:08 pseudo-rnd-thoughts

@pseudo-rnd-thoughts: I missed that reset(masks) is a user-initiated call. In that case, the closest match to what is described is the Same-Step reset, although we don't provide the final_obs, as mentioned above.

Mayankm96 avatar Aug 17 '25 16:08 Mayankm96

I missed that reset(masks) is a user-initiated call. In that case, the closest match to what is described is the Same-Step reset, although we don't provide the final_obs, as mentioned above.

Yes, checking the code in DirectRLEnv.step, it looks like you've implemented the same-step autoreset mode as I've described it in that article; however, you don't return the final_obs, as you mentioned.

Is there a reason the implementation doesn't include the episode's final observation, as currently you get the next episode's first observation? To my understanding, it's necessary to correctly implement most RL algorithms: in the case of truncation (i.e., time-outs), you should bootstrap from the episode's final observation, which you otherwise don't have. https://farama.org/Gymnasium-Terminated-Truncated-Step-API

pseudo-rnd-thoughts avatar Aug 17 '25 20:08 pseudo-rnd-thoughts

| Test file | Result | Time (s) | Passed |
| --- | --- | --- | --- |
| /workspace/isaaclab/source/isaaclab_rl/test/test_rl_games_wrapper.py | FAILED | 0.08 | 0/1 |
| /workspace/isaaclab/source/isaaclab_rl/test/test_skrl_wrapper.py | FAILED | 0.09 | 0/1 |
| /workspace/isaaclab/source/isaaclab_rl/test/test_sb3_wrapper.py | FAILED | 0.08 | 0/1 |
| /workspace/isaaclab/source/isaaclab_rl/test/test_rsl_rl_wrapper.py | FAILED | 0.16 | 0/2 |

Seems like a few tests are failing. I am letting it run again; if it fails again, I'll take a look at it.

ooctipus avatar Aug 19 '25 05:08 ooctipus

Is there a reason the implementation doesn't include the episode's final observation, as currently you get the next episode's first observation?

I checked the code for RSL-RL (RL-Games and skrl also do something similar): It seems they use the state before the termination ($s_{T-1}$ instead of $s_T$) for computing the bootstrapped values at truncations.

  • SKRL: https://github.com/Toni-SM/skrl/blob/main/skrl/agents/torch/ppo/ppo.py#L297-L299
  • RSL-RL: https://github.com/leggedrobotics/rsl_rl/blob/main/rsl_rl/algorithms/ppo.py#L161-L164

While I agree that returning final_obs is technically the right thing to do, empirically it does not make a difference. Reusing the values for states (computed in the previous algorithm step) saves an additional forward call of the critic network and provides some performance gain when handling a large batch of environments.
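
To make the difference concrete, a hedged sketch of the two bootstrap variants being discussed (all names are illustrative; this is not code from RSL-RL, skrl, or IsaacLab):

```python
import torch

# "Exact" variant: at truncation, bootstrap with V(s_T), the true final obs.
def bootstrap_with_final_obs(rewards, truncated, final_obs, critic, gamma=0.99):
    with torch.no_grad():
        values = critic(final_obs).squeeze(-1)  # extra critic forward pass
    return rewards + gamma * truncated.float() * values


# Approximation: reuse V(s_{T-1}), already computed when the action was taken,
# saving the extra forward pass over a large batch of environments.
def bootstrap_with_prev_value(rewards, truncated, prev_values, gamma=0.99):
    return rewards + gamma * truncated.float() * prev_values
```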

Tagging @Toni-SM since he has a better idea for algorithms other than PPO.

Mayankm96 avatar Aug 22 '25 07:08 Mayankm96

Do we need to switch from gym.make to gym.make_vec calls in the scripts? @ooctipus

Mayankm96 avatar Aug 22 '25 07:08 Mayankm96

Do we need to switch from gym.make to gym.make_vec calls in the scripts? @ooctipus

I think this could become a breaking change for user environments, not sure if we get much benefit in return?

kellyguo11 avatar Aug 22 '25 08:08 kellyguo11

The largest benefit will probably be for implementing new rl-lib wrappers; the process could be easier if ManagerBasedEnv or DirectEnv can be viewed as a VectorEnv.

Updating to VectorEnv should make it clearer to users that the environments are vectorised and remove the need for libraries to implement their own converter functions.
Further, users will be able to use vector wrappers with the environments, which isn't possible with gymnasium.Env implementations.
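
For example, something like this becomes possible once the envs are VectorEnv instances (the task id is illustrative, and this assumes the vector_entry_point registration sketched earlier):

```python
import gymnasium as gym

envs = gym.make_vec("Isaac-Example-v0", num_envs=8)

# Wrappers from gymnasium.wrappers.vector compose directly with a VectorEnv,
# with no per-library converter in between.
envs = gym.wrappers.vector.RecordEpisodeStatistics(envs)
envs = gym.wrappers.vector.NormalizeObservation(envs)
```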

Right now we get around it by wrapping IsaacLab envs (not gym envs) with an rl-lib wrapper, where the rl-lib can directly assume it is interacting with a ManagerBasedEnv or DirectEnv. We basically wrote our own converter functions; but even if we switch to VectorEnv, customized converter functions may still be needed, because each rl-lib also has some special requirements that a generic VectorEnv is likely not enough to satisfy.

I guess maybe the largest breaking change comes from entry_point -> vector_entry_point? If we apply this change, a user-implemented entry_point for their environment will potentially not get registered properly until they also change it to vector_entry_point. I am also weighing whether the change from entry_point to vector_entry_point is needed at all, since in the current IsaacLab implementation, even if num_envs == 1, everything is still vectorised as (n, observation_shape), (n, action_shape) with n = 1.
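
One way to soften that break, sketched here under the assumption that the same class can satisfy both interfaces (ids and module paths are made up), would be to register both entry points during a transition period:

```python
import gymnasium as gym

# Registering both entry points keeps legacy gym.make() calls working while
# gym.make_vec() resolves the vectorised implementation.
gym.register(
    id="Isaac-Example-v0",
    entry_point="my_project.envs:ExampleVectorEnv",         # legacy gym.make path
    vector_entry_point="my_project.envs:ExampleVectorEnv",  # gym.make_vec path
)
```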

My general feeling is that if we introduce only minimal breaking changes, there is still value in adapting to gym's best practices for the long term.

ooctipus avatar Aug 22 '25 20:08 ooctipus

Do I understand correctly that currently there is no way to access the true final obs of an episode? Why not just add it to the info dict, as you compute it anyway in step?

noahfarr avatar Aug 25 '25 13:08 noahfarr

@noahfarr We don't compute it in the step. It's only computed after the reset is performed for all the environments together; we don't support computing the observations for only a subset of environment indices in the vectorized environment.

From what batched RL libraries do: the agent can use the last observation of the previous step as the final obs instead of having the environment return it twice. At least for continuous control problems, I think the two obs are close enough to use as an approximation, and I would rather avoid calling compute-obs twice for performance reasons.

We have two ways of designing environments. Users who design their own environments can compute final_obs in the step function if they need it. But the examples we provide don't do this right now, and very likely won't, for the reasons above.
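
A toy sketch of what that user-side option could look like, i.e. capturing final_obs inside step before resetting finished sub-envs (entirely illustrative, not IsaacLab code):

```python
import torch


class TinyVectorisedEnv:
    """Toy vectorised env showing same-step autoreset with final_obs in info."""

    def __init__(self, num_envs: int, horizon: int = 100):
        self.num_envs = num_envs
        self.horizon = horizon
        self.t = torch.zeros(num_envs, dtype=torch.long)
        self.obs = torch.zeros(num_envs, 4)

    def step(self, actions: torch.Tensor):
        self.t += 1
        self.obs = self.obs + 0.01 * actions  # stand-in for the physics step
        rewards = -self.obs.square().sum(dim=-1)
        terminations = self.obs.abs().amax(dim=-1) > 1.0
        truncations = self.t >= self.horizon
        done = terminations | truncations
        info = {}
        if done.any():
            # Record the true final observation of finished sub-envs before
            # overwriting it with their reset state (same-step convention).
            info["final_obs"] = self.obs.clone()
            self.obs[done] = 0.0
            self.t[done] = 0
        return self.obs, rewards, terminations, truncations, info
```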

Mayankm96 avatar Sep 06 '25 08:09 Mayankm96

A big +1 for moving to gymnasium's vector env and providing a clear AutoresetMode. Some frameworks work around this issue (RSL-RL for sure, skrl at least for PPO), but others (e.g. torchRL) do not. It would be great if IsaacLab had fewer footguns when switching learning frameworks.

mschuck-rai avatar Nov 07 '25 10:11 mschuck-rai