Fixes device decoupling of RL Games training vs. sim device

Open kellyguo11 opened this issue 2 months ago • 1 comments

Description

Allows rlgames to decouple devices for simulation and training. This should allow running simulation on CPU and training on GPU

Type of change

Bug fix (non-breaking change which fixes an issue)

Checklist

[x] I have read and understood the contribution guidelines
[x] I have run the pre-commit checks with ./isaaclab.sh --format
[ ] I have made corresponding changes to the documentation
[x] My changes generate no new warnings
[ ] I have added tests that prove my fix is effective or that my feature works
[ ] I have updated the changelog and the corresponding version in the extension's config/extension.toml file
[ ] I have added my name to the CONTRIBUTORS.md or my name already exists there

Nov 11 '25 01:11 kellyguo11

Greptile Overview

Greptile Summary

This PR enables device decoupling for RL Games, allowing simulation to run on one device (e.g., CPU) while training runs on another (e.g., GPU). The key changes are:

Core Implementation:

Added device transfer logic in RlGamesVecEnvWrapper._process_obs() to move observations from sim_device to rl_device
Existing step() method already handled action transfers (rl_device → sim_device) and reward/done transfers (sim_device → rl_device)
Removed the forced device coupling in train.py and play.py that previously overrode agent config to match sim device

How It Works:

The wrapper now reads rl_device from the agent config's params.config.device field
Actions generated on rl_device are transferred to sim_device before env.step()
Observations, rewards, dones, and extras are transferred from sim_device to rl_device after each step
The implementation correctly handles the case where devices are the same (no-op transfers)

Testing:

Comprehensive test suite added covering RL Games, RSL-RL, SB3, and skrl
Tests verify GPU→GPU, GPU→CPU, and CPU→GPU device combinations
All device transfers are validated to ensure data arrives on the correct device

The implementation is clean, well-documented, and follows the existing architecture. Device transfers use .to(device=...) with proper cloning where needed to avoid in-place modification issues.

Confidence Score: 5/5

This PR is safe to merge with minimal risk - implementation is straightforward and well-tested
The changes are minimal, focused, and well-architected. The core logic simply adds observation device transfers to match existing action/reward transfer patterns. The removed code that forced device coupling was the actual bug being fixed. Comprehensive tests validate all device combinations across multiple RL libraries. No breaking changes to APIs or existing functionality.
No files require special attention

Important Files Changed

File Analysis

Filename	Score	Overview
source/isaaclab_rl/isaaclab_rl/rl_games/rl_games.py	5/5	Added device transfer logic in `_process_obs` to move observations from sim device to RL device when they differ
scripts/reinforcement_learning/rl_games/train.py	5/5	Removed manual device override logic that forced agent device to match sim device, enabling proper device decoupling
scripts/reinforcement_learning/rl_games/play.py	5/5	Removed manual device override logic that forced agent device to match sim device during inference

Sequence Diagram

sequenceDiagram
    participant Policy as RL Policy<br/>(rl_device)
    participant Wrapper as RlGamesVecEnvWrapper
    participant Env as Isaac Environment<br/>(sim_device)
    
    Note over Policy,Env: Training Step Flow
    
    Policy->>Wrapper: step(actions)<br/>[on rl_device]
    Wrapper->>Wrapper: actions.to(sim_device)<br/>Transfer to simulation
    Wrapper->>Env: step(actions)<br/>[on sim_device]
    Env-->>Wrapper: obs, rew, term, trunc<br/>[on sim_device]
    
    Wrapper->>Wrapper: _process_obs(obs_dict)<br/>Transfer obs to rl_device
    Wrapper->>Wrapper: rew.to(rl_device)<br/>dones.to(rl_device)<br/>extras.to(rl_device)
    
    Wrapper-->>Policy: obs, rew, dones, extras<br/>[on rl_device]
    
    Note over Policy,Env: Reset Flow
    
    Policy->>Wrapper: reset()
    Wrapper->>Env: reset()
    Env-->>Wrapper: obs_dict<br/>[on sim_device]
    Wrapper->>Wrapper: _process_obs(obs_dict)<br/>Transfer to rl_device
    Wrapper-->>Policy: obs<br/>[on rl_device]

Nov 11 '25 01:11 greptile-apps[bot]