Clips actions to large limits before applying them to the environment
Description
Currently, the actions from the policy are applied directly to the environment and are also often fed back to the policy as a last_action observation.
This can lead to instability during training, since applying a large action can trigger a destabilizing feedback loop: a very large action produces a large last_action observation, which often results in a large critic error, which in turn can lead to even larger actions being sampled in the future.
This PR aims to fix this by clipping the actions to (large) hard limits before applying them to the environment. This prevents the actions from growing without bound and - in my case - greatly improves training stability.
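To illustrate the idea, here is a minimal sketch of clipping before the environment step, assuming torch tensors of shape (num_envs, action_dim); the names `apply_actions` and `ACTION_CLIP` are illustrative only, not the actual Isaac Lab API:

```python
import torch

# Illustrative hard limit; in the PR this would come from the environment config.
ACTION_CLIP = 100.0

def apply_actions(env, actions: torch.Tensor):
    """Clamp raw policy actions to large hard limits before stepping the sim.

    This breaks the feedback loop where a huge action shows up in the
    last_action observation and pushes the policy towards even larger actions.
    """
    clipped = torch.clamp(actions, -ACTION_CLIP, ACTION_CLIP)
    return env.step(clipped)
```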
Type of change
- New feature (non-breaking change which adds functionality)
TODO
- [ ] Support multi-dimensional action bounds.
- [ ] Add tests
Checklist
- [x] I have run the `pre-commit` checks with `./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to `CONTRIBUTORS.md` or my name already exists there
Slightly related issue: #673
In my opinion, the RL libraries should take care of this, not Isaac Lab.
Also, adding this config prevents defining other spaces, since it is specifically tied to Box.
Instead, I propose the following: https://github.com/isaac-sim/IsaacLab/issues/864#issuecomment-2351819930, with the understanding that the RL libraries are in charge of generating a valid action for the task.
@Toni-SM I agree. We should move this to the environment wrappers (similar to what we do for RL-Games):
https://github.com/isaac-sim/IsaacLab/blob/main/source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/utils/wrappers/rl_games.py#L82-L94
Regarding the action/obs space design for the environments, I think it is better to handle that as its own separate effort. The current fix in this MR is at least critical for the continuous learning tasks, as users otherwise get "NaNs" from the simulation due to the policy feedback loop (large actions enter the observations, which then lead to even larger action predictions, eventually causing the sim to go unstable). So I'd prefer if we don't block this fix itself.
@renezurbruegg Would you be able to help move the changes to the wrappers?
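For illustration, a wrapper-level clip could look roughly like the sketch below, assuming a torch-based environment and gymnasium's `Wrapper` interface; the class name and `clip_actions` argument are hypothetical, not the RL-Games wrapper linked above:

```python
import gymnasium as gym
import torch


class ClipActionWrapper(gym.Wrapper):
    """Illustrative wrapper that clips policy actions before passing them to the env."""

    def __init__(self, env, clip_actions: float = 100.0):
        super().__init__(env)
        self._clip_actions = clip_actions

    def step(self, actions: torch.Tensor):
        # clamp the raw policy output to the configured hard limits
        actions = torch.clamp(actions, -self._clip_actions, self._clip_actions)
        return self.env.step(actions)
```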
This will introduce "arbitrary" bounds of (-100, 100) for any new user who merges this PR, which could lead to unexpected behaviour.
How should this be addressed?
In my opinion there are three options:
- Don't do anything, since bounds of ±100 are very large and the policies should not produce such actions anyway.
- Change the default to (-inf, inf), essentially keeping the behaviour the same as now, but add a FAQ entry to the documentation referring to this issue.
- Add a check for whether the limits have been hit in any environment and print a warning to the terminal.
I personally prefer option (3); a rough sketch is shown below.
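A minimal sketch of option (3), assuming actions shaped (num_envs, action_dim); the function name and warning text are only illustrative:

```python
import torch

def clip_with_warning(actions: torch.Tensor, limit: float = 100.0) -> torch.Tensor:
    """Clip actions and warn if the limits were hit in any environment."""
    exceeded = actions.abs() > limit
    if torch.any(exceeded):
        num_envs = int(exceeded.any(dim=-1).sum())
        print(f"[WARN] Action clipping active in {num_envs} environment(s); "
              f"values outside [-{limit}, {limit}] were clamped.")
    return torch.clamp(actions, -limit, limit)
```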
Hi @renezurbruegg
Please note that the current implementation is in conflict with https://github.com/isaac-sim/IsaacLab/pull/1117 for the direct workflow.
Can these changes be integrated directly into #1117 then?
@renezurbruegg, as I commented previously, in my opinion the RL libraries should take care of this, not Isaac Lab.
For example, using skrl you can set the model parameter `clip_actions: True` or define the model output as `output: 100 * tanh(ACTIONS)` in the agent config file `skrl_ppo_cfg.yaml`.
However, if the target library is not able to take care of that, then option 3 that you mentioned (which will not prevent the training from eventually throwing an exception), or clipping the action directly in the task implementation for critical cases, could be a solution.
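For reference, the equivalent of the skrl suggestion above, expressed as a Python model definition, might look roughly like this; the network architecture is arbitrary, and `clip_actions=True` asks skrl to clamp sampled actions to the action-space bounds:

```python
import torch
import torch.nn as nn

from skrl.models.torch import GaussianMixin, Model


class Policy(GaussianMixin, Model):
    """Gaussian policy whose sampled actions are clipped to the action space by skrl."""

    def __init__(self, observation_space, action_space, device):
        Model.__init__(self, observation_space, action_space, device)
        GaussianMixin.__init__(self, clip_actions=True)  # let the library clip the actions

        self.net = nn.Sequential(
            nn.Linear(self.num_observations, 64),
            nn.ELU(),
            nn.Linear(64, self.num_actions),
        )
        self.log_std_parameter = nn.Parameter(torch.zeros(self.num_actions))

    def compute(self, inputs, role):
        return self.net(inputs["states"]), self.log_std_parameter, {}
```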
This MR is important. However, I agree with Toni's remark that the RL library should handle it. I have made a new MR for the RL wrapper environment: #2019
Closing this MR as it has become stale.