[Bug Report] Non-reproducible training results in vision-based tasks with identical seeds
Describe the bug
When training RL agents in IsaacLab, vision-based environments result in non-deterministic outcomes across multiple runs, even when using a fixed random seed. In contrast, state-based environments exhibit perfect reproducibility under the same conditions.
This issue was confirmed by running five separate tests with identical settings on each of the following three official IsaacLab environments:
- Isaac-Cartpole-v0 (state-based): Reproducible
- Isaac-Cartpole-RGB-v0 (vision-based): Not reproducible
- Isaac-Cartpole-RGB-ResNet18-v0 (vision-based): Not reproducible
The non-determinism appears to be introduced by the vision processing pipeline, as it is the key difference between the reproducible and non-reproducible environments. However, as I have not investigated this in depth, further analysis is needed to identify the root cause.
The provided WandB logs show the reward curves from several training runs. As illustrated, the training curves for the vision-based environments diverge significantly. This non-reproducibility occurs even though all experimental settings, including the random seed, were kept identical for each run.
(state-{i}: Isaac-Cartpole-v0, rgb-{i}: Isaac-Cartpole-RGB-v0, resnet-{i}: Isaac-Cartpole-RGB-ResNet18-v0)
Steps to reproduce
- Run the state-based environment five times with a fixed seed:
  python scripts/reinforcement_learning/rl_games/train.py --task Isaac-Cartpole-v0 --headless --seed 42 --max_iteration 100
- Run the vision-based environment five times with the same seed:
  python scripts/reinforcement_learning/rl_games/train.py --task Isaac-Cartpole-RGB-v0 --enable_cameras --headless --seed 42 --max_iteration 100
- Run the ResNet18 feature-based vision environment five times with the same seed:
  python scripts/reinforcement_learning/rl_games/train.py --task Isaac-Cartpole-RGB-ResNet18-v0 --enable_cameras --headless --seed 42 --max_iteration 100
All hyperparameters and environment settings not specified in the CLI arguments default to the values defined in the code.
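A small helper sketch (not part of the repository; the script path and flags are taken from the commands above) to launch the five repeat runs for each task:

import subprocess

SEED = "42"
TASKS = {
    "Isaac-Cartpole-v0": [],
    "Isaac-Cartpole-RGB-v0": ["--enable_cameras"],
    "Isaac-Cartpole-RGB-ResNet18-v0": ["--enable_cameras"],
}

for task, extra_args in TASKS.items():
    for run in range(5):
        # same fixed seed and iteration budget for every repeat
        cmd = [
            "python", "scripts/reinforcement_learning/rl_games/train.py",
            "--task", task, "--headless", "--seed", SEED, "--max_iteration", "100",
            *extra_args,
        ]
        subprocess.run(cmd, check=True)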
System Info
- Commit: f20d74c59d3e20fc822c4e4c5bf8535a48c5aa0b
- Isaac Sim Version: 4.5
- OS: Ubuntu 22.04
- GPU: RTX A6000
- CUDA: 12.9
- GPU Driver: 575.64.03
Additional context
A note on the logs: For some runs, WandB logging halted before the experiment's completion, despite all runs being executed for an identical number of steps. This does not impact the overall analysis. For the reproducible environment (Isaac-Cartpole-v0), training curves were perfectly identical until the earliest halt. For the non-reproducible environments, the curves had already diverged long before any logging stopped.
Checklist
- [x] I have checked that there is no similar issue in the repo (required)
- [x] I have checked that the issue is not in running Isaac Sim itself and is related to the repo
Acceptance Criteria
- [ ] Verify whether the vision feature pipeline introduces non-determinism.
- [ ] Identify fixes or configurations to achieve reproducibility across both state-based and vision-based environments.
Thank you for posting this and adding the plots. It would be great if you could move to Isaac Sim 5.0, then we could discuss how to repro this. Thanks.
Thank you for your response. Following your suggestion, I have updated to the latest versions.
- IsaacSim: 5.0.0
- IsaacLab: 90dda53f7c08354b8c9ab3e509eae5b864a4040d
I trained the same tasks with the same hyperparameters and seeds mentioned above. Unfortunately, the issue where the training curves of vision-based tasks are not reproducible still persists. I’ve attached the corresponding WandB logs from this experiment for your reference.
For the ResNet example, the ResNet inference itself may be non-deterministic, which would lead to this issue:
https://discuss.pytorch.org/t/is-trained-resnet-50-a-deterministic-model/99737
https://docs.pytorch.org/docs/stable/notes/randomness.html
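For a quick local check, here is a minimal sketch (not from the linked threads) that runs the same input through ResNet18 twice with PyTorch's deterministic settings enabled and compares the outputs bitwise; it assumes torchvision and a CUDA device are available:

import os
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")  # required by deterministic cuBLAS

import torch
import torchvision.models as models

torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
torch.use_deterministic_algorithms(True)  # raise if a non-deterministic kernel would be used

model = models.resnet18(weights=None).eval().cuda()
x = torch.rand(8, 3, 224, 224, device="cuda")

with torch.no_grad():
    out1 = model(x)
    out2 = model(x)

print("bitwise identical:", torch.equal(out1, out2))
print("max abs diff:", (out1 - out2).abs().max().item())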
Hi @garylvov,
I added the following function to scripts/reinforcement_learning/rl_games/train.py and call it to strictly enforce determinism, including for operations that use cuDNN.
import torch
import random
import numpy as np

def _set_seed(seed: int):
    # error out instead of silently running ops that have no deterministic implementation
    torch.use_deterministic_algorithms(True)
    # seed all relevant RNGs
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    random.seed(seed)
    np.random.seed(seed)
    # force deterministic cuDNN kernel selection
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    # fill uninitialized memory deterministically and disable TF32 kernels
    torch.utils.deterministic.fill_uninitialized_memory = True
    torch.backends.cuda.matmul.allow_tf32 = False
    torch.backends.cudnn.allow_tf32 = False
# set the environment seed (after multi-gpu config for updated rank from agent seed)
# note: certain randomizations occur in the environment initialization so we set the seed here
env_cfg.seed = agent_cfg["params"]["seed"]
_set_seed(env_cfg.seed)
I also set the following environment variables before running the code:
export PYTHONHASHSEED=42
export CUBLAS_WORKSPACE_CONFIG=:16:8
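As an aside on these variables: PYTHONHASHSEED only takes effect at interpreter startup, so the export form is the reliable way to set it, while CUBLAS_WORKSPACE_CONFIG just needs to be set before the first cuBLAS call. A sketch of an alternative, assuming it is placed at the very top of train.py:

import os
# must run before torch is imported / any CUDA kernel is launched
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":16:8")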
However, the results are still not reproducible. It also seems strange that even simple RGB tasks are not reproducible, while state-based setups remain reproducible under the same conditions as the vision-based tasks.
I think whatever issue affects RGB must also affect the feature learning pipeline.
I suspect this may be due to non-deterministic rendering effects.
When using the tiled camera, there is a ghosting effect due to DLSS: https://github.com/isaac-sim/IsaacLab/issues/1031#issuecomment-2397379876
I suspect that this ghosting effect, or even the rendering itself, is non-deterministic and thus leads to this behavior, since the observations would then be slightly different on each run.
I would try disabling the denoiser as outlined in the issue linked above, and note from that thread: "If you turn antialiasing off or use FXAA (that does not use previous frames), the ghosting is gone."
If it's non-deterministic even with denoising off, then I think this bug is due to Isaac Sim or whatever is used for rendering.
Ok, on my end, even with
import omni.replicator.core as rep
# disable antialiasing entirely (no frame history, so no ghosting)
rep.settings.set_render_rtx_realtime(antialiasing="OFF")  # FXAA doesn't work either
in train.py, I also get non-deterministic results with:
python scripts/reinforcement_learning/rl_games/train.py --task Isaac-Cartpole-RGB-v0 --enable_cameras --headless --seed 42 --max_iteration 100
I think this is due to Isaac Sim or the rendering process.
Thank you for your kind response. I also get non-deterministic results even when using FXAA in my environment.
I will leave a comment here if I find a solution. It would be great if you could share your findings too. Thanks!
This issue still exists in Isaac Sim 5.1. I calculated the difference of the camera observations after resetting the environment, as in https://github.com/isaac-sim/IsaacLab/issues/3962
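For reference, a minimal sketch of that comparison, assuming the first post-reset camera observation from two identically seeded runs has been saved to obs_run1.pt and obs_run2.pt (hypothetical file names):

import torch

# first post-reset camera observations dumped from two identically seeded runs
a = torch.load("obs_run1.pt")
b = torch.load("obs_run2.pt")

diff = (a.float() - b.float()).abs()
print(f"max abs diff:  {diff.max().item():.6f}")
print(f"mean abs diff: {diff.mean().item():.6f}")
print("bitwise identical:", torch.equal(a, b))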