
Clips actions to large limits before applying them to the environment

Open renezurbruegg opened this issue 1 year ago • 8 comments

Description

Currently, the actions from the policy are directly applied to the environment and also often fed back to the policy using the last action as observation.

Doing this can lead to instability during training, since applying a large action can set off a destabilizing feedback loop: a very large action produces a large last_action observation, which often results in a large error in the critic, which in turn can lead to even larger actions being sampled in the future.

This PR aims to fix this by clipping the actions to (large) hard limits before applying them to the environment. This prevents the actions from growing without bound and, in my case, greatly improves training stability.
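As a minimal sketch of the idea (the limit value and the way it is configured in this PR may differ), the clipping amounts to:

```python
import torch

# Hypothetical hard limit; the actual value/config attribute used in this PR may differ.
ACTION_LIMIT = 100.0


def clip_actions(actions: torch.Tensor, limit: float = ACTION_LIMIT) -> torch.Tensor:
    """Clip raw policy actions to symmetric hard limits before they reach the simulation."""
    return torch.clamp(actions, -limit, limit)


# A single exploding action is bounded before it is applied to the environment
# and before it is fed back to the policy as the last_action observation.
raw_actions = torch.tensor([[0.3, -250.0, 1.2]])
print(clip_actions(raw_actions))  # tensor([[   0.3000, -100.0000,    1.2000]])
```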

Type of change

  • New feature (non-breaking change which adds functionality)

TODO

  • [ ] Support multi-dimensional action bounds.
  • [ ] Add tests

Checklist

  • [x] I have run the pre-commit checks with ./isaaclab.sh --format
  • [ ] I have made corresponding changes to the documentation
  • [ ] My changes generate no new warnings
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • [x] I have added my name to the CONTRIBUTORS.md or my name already exists there

renezurbruegg · Sep 13 '24 12:09

Slightly related issue: #673

renezurbruegg · Sep 13 '24 12:09

In my opinion, the RL libraries should take care of this, not Isaac Lab. Also, adding this config prevents defining other spaces, since it is specifically tied to Box. Instead, I propose the following: https://github.com/isaac-sim/IsaacLab/issues/864#issuecomment-2351819930, with the understanding that the RL libraries are in charge of generating a valid action for the task.

Toni-SM · Sep 15 '24 22:09

@Toni-SM I agree. We should move this to the environment wrappers (similar to what we do for RL-Games):

https://github.com/isaac-sim/IsaacLab/blob/main/source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/utils/wrappers/rl_games.py#L82-L94
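For illustration, a wrapper-level clip in that spirit could look like the following (a rough sketch with an assumed clip_actions bound, not the actual RL-Games wrapper code):

```python
import gymnasium as gym
import torch


class ClipActionWrapper(gym.Wrapper):
    """Hypothetical wrapper that clips policy actions before the wrapped environment sees them."""

    def __init__(self, env: gym.Env, clip_actions: float = 100.0):
        super().__init__(env)
        self._clip_actions = clip_actions

    def step(self, actions):
        # Bound the policy output so the simulation never receives arbitrarily large commands.
        if isinstance(actions, torch.Tensor):
            actions = torch.clamp(actions, -self._clip_actions, self._clip_actions)
        return self.env.step(actions)
```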

Regarding the action/obs space design for the environments, I think it is better to handle that as its own separate change. The current fix in this MR is critical at least for the continuous learning tasks, as users otherwise get "NaNs" from the simulation due to the policy feedback loop (large actions enter the observations, which then lead to even larger action predictions, eventually causing the sim to go unstable). So I'd prefer if we don't block this fix itself.

Mayankm96 · Sep 16 '24 04:09

@renezurbruegg Would you be able to help move the changes to the wrappers?

Mayankm96 · Sep 25 '24 10:09

This will introduce "arbitrary" bounds of (-100, 100) for any user once this PR is merged, which could lead to unexpected behaviour.

How should this be addressed?

In my opinion there are three options:

  1. Don't do anything, since the bounds of 100 are very large and the policies should not produce such actions anyway.
  2. Change the default to (-inf, inf), essentially keeping the current behaviour, and add a FAQ entry to the documentation referring to this issue.
  3. Add a check for whether the limits were hit in any environment and print a warning to the terminal (see the sketch after this list).

I personally prefer option (3).
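As a rough sketch of option (3), assuming a symmetric limit and a simple runtime warning (the helper name and signature are hypothetical):

```python
import warnings

import torch


def clip_with_warning(actions: torch.Tensor, limit: float = 100.0) -> torch.Tensor:
    """Clip actions to symmetric hard limits and warn if any value was actually clipped."""
    clipped = torch.clamp(actions, -limit, limit)
    if not torch.equal(clipped, actions):
        warnings.warn(
            f"Actions exceeded the hard limit of +/-{limit} in at least one environment and were "
            "clipped; consider bounding the policy output instead.",
            stacklevel=2,
        )
    return clipped
```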

renezurbruegg · Oct 03 '24 07:10

Hi @renezurbruegg

Please note that the current implementation conflicts with https://github.com/isaac-sim/IsaacLab/pull/1117 for the direct workflow.

Toni-SM · Oct 03 '24 13:10

Can these changes then be integrated directly into #1117?

renezurbruegg · Oct 03 '24 13:10

@renezurbruegg, as I commented previously, in my opinion the RL libraries should take care of this, not Isaac Lab.

For example, using skrl you can set the model parameter clip_actions: True, or define the model output as output: 100 * tanh(ACTIONS), in the agent config file skrl_ppo_cfg.yaml.
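To illustrate what such a squashed output achieves, here is a plain PyTorch sketch (not skrl's actual model code; the scale of 100 mirrors the config example above):

```python
import torch


def squash_actions(raw_output: torch.Tensor, scale: float = 100.0) -> torch.Tensor:
    """Bound the network output smoothly to (-scale, scale), as scale * tanh(ACTIONS) would."""
    return scale * torch.tanh(raw_output)


# Even an exploding pre-activation stays within the bound after squashing.
print(squash_actions(torch.tensor([0.0, 5.0, 1e6])))  # tensor([  0.0000,  99.9909, 100.0000])
```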

However, if the target library is not able to take care of that, then option 3 that you mentioned (which will not prevent training from eventually throwing an exception), or clipping the actions directly in the task implementation for critical cases, could be a solution.

Toni-SM · Oct 03 '24 13:10

This MR is important. However, I agree with Toni's remark that the RL library should handle it. I have made a new MR for the RL wrapper environment: #2019

Closing this MR as it has become stale.

Mayankm96 · Mar 05 '25 08:03