IsaacLab
Fixes device decoupling of RL Games training vs. sim device
Description
Allows RL Games to decouple the simulation device from the training device. This should allow running simulation on the CPU and training on the GPU.
Type of change
- Bug fix (non-breaking change which fixes an issue)
Checklist
- [x] I have read and understood the contribution guidelines
- [x] I have run the `pre-commit` checks with `./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [ ] I have added my name to the `CONTRIBUTORS.md` or my name already exists there
Greptile Overview
Greptile Summary
This PR enables device decoupling for RL Games, allowing simulation to run on one device (e.g., CPU) while training runs on another (e.g., GPU). The key changes are:
Core Implementation:
- Added device transfer logic in `RlGamesVecEnvWrapper._process_obs()` to move observations from `sim_device` to `rl_device`
- Existing `step()` method already handled action transfers (rl_device → sim_device) and reward/done transfers (sim_device → rl_device)
- Removed the forced device coupling in `train.py` and `play.py` that previously overrode agent config to match sim device
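The observation transfer described above can be sketched roughly as follows. This is a minimal illustration, not the code in `rl_games.py`: the helper name `move_obs_to_device` and the observation keys are hypothetical, and the RL device falls back to the CPU when no GPU is present so the snippet runs anywhere.

```python
import torch


def move_obs_to_device(obs, device):
    """Move a tensor, or a (possibly nested) dict of tensors, to `device`.

    Hypothetical helper for illustration; the real wrapper does this
    inside `_process_obs()`.
    """
    if isinstance(obs, torch.Tensor):
        return obs.to(device=device)
    if isinstance(obs, dict):
        return {key: move_obs_to_device(value, device) for key, value in obs.items()}
    # Non-tensor entries are passed through unchanged.
    return obs


sim_device = torch.device("cpu")
# Assumption: train on the GPU when one is available, otherwise stay on the CPU.
rl_device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Observations as the simulation might produce them (shapes are made up).
obs_dict = {
    "policy": torch.zeros(4, 3, device=sim_device),
    "critic": torch.zeros(4, 5, device=sim_device),
}
moved = move_obs_to_device(obs_dict, rl_device)
```

When both devices are the same, `Tensor.to` returns the input tensor unchanged, so the helper adds no copy in the coupled case.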
How It Works:
- The wrapper now reads `rl_device` from the agent config's `params.config.device` field
- Actions generated on `rl_device` are transferred to `sim_device` before `env.step()`
- Observations, rewards, dones, and extras are transferred from `sim_device` to `rl_device` after each step
- The implementation correctly handles the case where devices are the same (no-op transfers)
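As a rough sketch of the first point, the training device can be looked up from the agent config independently of the simulation device. The function name `resolve_rl_device` is hypothetical, and the fallback to the simulation device when the field is missing is an assumption made for this example, not something the PR states.

```python
def resolve_rl_device(agent_cfg: dict, sim_device: str) -> str:
    """Return the training device stored under params.config.device.

    Assumption for this sketch: if the field is absent, fall back to the
    simulation device so both run on the same hardware.
    """
    return agent_cfg.get("params", {}).get("config", {}).get("device", sim_device)


# Agent config as a plain dict, mirroring the params.config.device layout.
agent_cfg = {"params": {"config": {"device": "cuda:0"}}}
sim_device = "cpu"

rl_device = resolve_rl_device(agent_cfg, sim_device)
decoupled = rl_device != sim_device
```

Here the simulation runs on `cpu` while the policy is trained on `cuda:0`; nothing in the lookup forces the two values to match.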
Testing:
- Comprehensive test suite added covering RL Games, RSL-RL, SB3, and skrl
- Tests verify GPU→GPU, GPU→CPU, and CPU→GPU device combinations
- All device transfers are validated to ensure data arrives on the correct device
The implementation is clean, well-documented, and follows the existing architecture. Device transfers use .to(device=...) with proper cloning where needed to avoid in-place modification issues.
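The cloning remark matters because `Tensor.to` returns the original tensor when the dtype and device already match, so the result aliases the source. The snippet below is a small standalone demonstration of that PyTorch behavior, not code from the PR.

```python
import torch

x = torch.ones(3)

# Same device and dtype: .to() hands back the original tensor, not a copy.
same = x.to(device=x.device)
shares_memory = same.data_ptr() == x.data_ptr()

# Cloning first gives an independent buffer that is safe to modify in place.
safe = x.to(device=x.device).clone()
safe += 1.0
```

After the in-place addition, `x` still holds ones while `safe` holds twos; without the `clone()`, the addition would have modified `x` as well.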
Confidence Score: 5/5
- This PR is safe to merge with minimal risk - implementation is straightforward and well-tested
- The changes are minimal, focused, and well-architected. The core logic simply adds observation device transfers to match existing action/reward transfer patterns. The removed code that forced device coupling was the actual bug being fixed. Comprehensive tests validate all device combinations across multiple RL libraries. No breaking changes to APIs or existing functionality.
- No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| source/isaaclab_rl/isaaclab_rl/rl_games/rl_games.py | 5/5 | Added device transfer logic in _process_obs to move observations from sim device to RL device when they differ |
| scripts/reinforcement_learning/rl_games/train.py | 5/5 | Removed manual device override logic that forced agent device to match sim device, enabling proper device decoupling |
| scripts/reinforcement_learning/rl_games/play.py | 5/5 | Removed manual device override logic that forced agent device to match sim device during inference |
Sequence Diagram
sequenceDiagram
participant Policy as RL Policy<br/>(rl_device)
participant Wrapper as RlGamesVecEnvWrapper
participant Env as Isaac Environment<br/>(sim_device)
Note over Policy,Env: Training Step Flow
Policy->>Wrapper: step(actions)<br/>[on rl_device]
Wrapper->>Wrapper: actions.to(sim_device)<br/>Transfer to simulation
Wrapper->>Env: step(actions)<br/>[on sim_device]
Env-->>Wrapper: obs, rew, term, trunc<br/>[on sim_device]
Wrapper->>Wrapper: _process_obs(obs_dict)<br/>Transfer obs to rl_device
Wrapper->>Wrapper: rew.to(rl_device)<br/>dones.to(rl_device)<br/>extras.to(rl_device)
Wrapper-->>Policy: obs, rew, dones, extras<br/>[on rl_device]
Note over Policy,Env: Reset Flow
Policy->>Wrapper: reset()
Wrapper->>Env: reset()
Env-->>Wrapper: obs_dict<br/>[on sim_device]
Wrapper->>Wrapper: _process_obs(obs_dict)<br/>Transfer to rl_device
Wrapper-->>Policy: obs<br/>[on rl_device]
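The step flow in the diagram can be approximated with a toy example. Both `ToyEnv` and `DeviceBridge` are hypothetical stand-ins written for this sketch; they are not Isaac Lab or RL Games classes, and the tensor shapes are arbitrary.

```python
import torch


class ToyEnv:
    """Hypothetical stand-in for the simulation; all outputs live on its device."""

    def __init__(self, num_envs: int, device: str):
        self.num_envs = num_envs
        self.device = torch.device(device)

    def step(self, actions: torch.Tensor):
        # The simulation expects actions on its own device.
        assert actions.device.type == self.device.type
        obs = {"policy": torch.zeros(self.num_envs, 3, device=self.device)}
        rew = torch.ones(self.num_envs, device=self.device)
        terminated = torch.zeros(self.num_envs, dtype=torch.bool, device=self.device)
        truncated = torch.zeros(self.num_envs, dtype=torch.bool, device=self.device)
        return obs, rew, terminated, truncated, {}


class DeviceBridge:
    """Hypothetical wrapper that moves data between rl_device and sim_device."""

    def __init__(self, env: ToyEnv, rl_device: str):
        self.env = env
        self.sim_device = env.device
        self.rl_device = torch.device(rl_device)

    def step(self, actions: torch.Tensor):
        # rl_device -> sim_device; clone so the policy's tensor is never aliased.
        actions = actions.detach().clone().to(device=self.sim_device)
        obs_dict, rew, terminated, truncated, extras = self.env.step(actions)
        # sim_device -> rl_device for everything handed back to the policy.
        obs = obs_dict["policy"].to(device=self.rl_device)
        rew = rew.to(device=self.rl_device)
        dones = (terminated | truncated).to(device=self.rl_device)
        extras = {
            key: value.to(device=self.rl_device) if isinstance(value, torch.Tensor) else value
            for key, value in extras.items()
        }
        return obs, rew, dones, extras


# Assumption: use the GPU for training when available, otherwise the CPU.
rl_device = "cuda:0" if torch.cuda.is_available() else "cpu"
bridge = DeviceBridge(ToyEnv(num_envs=4, device="cpu"), rl_device=rl_device)
actions = torch.zeros(4, 2, device=rl_device)
obs, rew, dones, extras = bridge.step(actions)
```

On a machine without a GPU both devices resolve to the CPU and every transfer becomes a no-op, which corresponds to the same-device case mentioned in the summary.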