[Bug Report] Non-reproducible training results in vision-based tasks with identical seeds
Describe the bug
When training RL agents in IsaacLab, vision-based environments result in non-deterministic outcomes across multiple runs, even when using a fixed random seed. In contrast, state-based environments exhibit perfect reproducibility under the same conditions.
This issue was confirmed by running five separate tests with identical settings on each of the following three official IsaacLab environments:
- Isaac-Cartpole-v0 (state-based): Reproducible
- Isaac-Cartpole-RGB-v0 (vision-based): Not reproducible
- Isaac-Cartpole-RGB-ResNet18-v0 (vision-based): Not reproducible
The non-determinism appears to be introduced by the vision processing pipeline, as it is the key difference between the reproducible and non-reproducible environments. However, as I have not investigated this in depth, further analysis is needed to identify the root cause.
The provided WandB logs show the reward curves from several training runs. As illustrated, the training curves for the vision-based environments diverge significantly. This non-reproducibility occurs even though all experimental settings, including the random seed, were kept identical for each run.
(state-{i}: Isaac-Cartpole-v0, rgb-{i}: Isaac-Cartpole-RGB-v0, resnet-{i}: Isaac-Cartpole-RGB-ResNet18-v0)
Steps to reproduce
- Run the state-based environment five times with a fixed seed:
  python scripts/reinforcement_learning/rl_games/train.py --task Isaac-Cartpole-v0 --headless --seed 42 --max_iteration 100
- Run the vision-based environment five times with the same seed:
  python scripts/reinforcement_learning/rl_games/train.py --task Isaac-Cartpole-RGB-v0 --enable_cameras --headless --seed 42 --max_iteration 100
- Run the ResNet18 feature-based vision environment five times with the same seed:
  python scripts/reinforcement_learning/rl_games/train.py --task Isaac-Cartpole-RGB-ResNet18-v0 --enable_cameras --headless --seed 42 --max_iteration 100
All hyperparameters and environment settings not specified in the CLI arguments default to the values defined in the code.
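A small helper sketch (not part of the repository; the script path and flags are taken from the commands above) to launch the five repeat runs for each task:

import subprocess

SEED = "42"
TASKS = {
    "Isaac-Cartpole-v0": [],
    "Isaac-Cartpole-RGB-v0": ["--enable_cameras"],
    "Isaac-Cartpole-RGB-ResNet18-v0": ["--enable_cameras"],
}

for task, extra_args in TASKS.items():
    for run in range(5):
        # same fixed seed and iteration budget for every repeat
        cmd = [
            "python", "scripts/reinforcement_learning/rl_games/train.py",
            "--task", task, "--headless", "--seed", SEED, "--max_iteration", "100",
            *extra_args,
        ]
        subprocess.run(cmd, check=True)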
System Info
- Commit: f20d74c59d3e20fc822c4e4c5bf8535a48c5aa0b
- Isaac Sim Version: 4.5
- OS: Ubuntu 22.04
- GPU: RTX A6000
- CUDA: 12.9
- GPU Driver: 575.64.03
Additional context
A note on the logs: For some runs, WandB logging halted before the experiment's completion, despite all runs being executed for an identical number of steps. This does not impact the overall analysis. For the reproducible environment (Isaac-Cartpole-v0), training curves were perfectly identical until the earliest halt. For the non-reproducible environments, the curves had already diverged long before any logging stopped.
Checklist
- [x] I have checked that there is no similar issue in the repo (required)
- [x] I have checked that the issue is not in running Isaac Sim itself and is related to the repo
Acceptance Criteria
- [ ] Verify whether the vision feature pipeline introduces non-determinism.
- [ ] Identify fixes or configurations to achieve reproducibility across both state-based and vision-based environments.
Thank you for posting this and adding the plots. It would be great if you could move to Isaac Sim 5.0, then we could discuss how to repro this. Thanks.
Thank you for your response. Following your suggestion, I have updated to the latest versions.
- IsaacSim: 5.0.0
- IsaacLab: 90dda53f7c08354b8c9ab3e509eae5b864a4040d
I trained the same tasks with the same hyperparameters and seeds mentioned above. Unfortunately, the issue where the training curves of vision-based tasks are not reproducible still persists. I’ve attached the corresponding WandB logs from this experiment for your reference.
For the ResNet example, the ResNet inference itself may be non-deterministic, which would lead to this issue:
https://discuss.pytorch.org/t/is-trained-resnet-50-a-deterministic-model/99737
https://docs.pytorch.org/docs/stable/notes/randomness.html
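For a quick local check, here is a minimal sketch (not from the linked threads) that runs the same input through ResNet18 twice with PyTorch's deterministic settings enabled and compares the outputs bitwise; it assumes torchvision and a CUDA device are available:

import os
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")  # required by deterministic cuBLAS

import torch
import torchvision.models as models

torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
torch.use_deterministic_algorithms(True)  # raise if a non-deterministic kernel would be used

model = models.resnet18(weights=None).eval().cuda()
x = torch.rand(8, 3, 224, 224, device="cuda")

with torch.no_grad():
    out1 = model(x)
    out2 = model(x)

print("bitwise identical:", torch.equal(out1, out2))
print("max abs diff:", (out1 - out2).abs().max().item())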
Hi @garylvov,
I added the following function to scripts/reinforcement_learning/rl_games/train.py and call it to strictly enforce determinism, including for operations that use cuDNN.
import torch
import random
import numpy as np

def _set_seed(seed: int):
    # error out instead of silently running ops that have no deterministic implementation
    torch.use_deterministic_algorithms(True)
    # seed all relevant RNGs
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    random.seed(seed)
    np.random.seed(seed)
    # force deterministic cuDNN kernel selection
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    # fill uninitialized memory deterministically and disable TF32 kernels
    torch.utils.deterministic.fill_uninitialized_memory = True
    torch.backends.cuda.matmul.allow_tf32 = False
    torch.backends.cudnn.allow_tf32 = False
# set the environment seed (after multi-gpu config for updated rank from agent seed)
# note: certain randomizations occur in the environment initialization so we set the seed here
env_cfg.seed = agent_cfg["params"]["seed"]
_set_seed(env_cfg.seed)
I also set the following environment variables before running the code:
export PYTHONHASHSEED=42
export CUBLAS_WORKSPACE_CONFIG=:16:8
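As an aside on these variables: PYTHONHASHSEED only takes effect at interpreter startup, so the export form is the reliable way to set it, while CUBLAS_WORKSPACE_CONFIG just needs to be set before the first cuBLAS call. A sketch of an alternative, assuming it is placed at the very top of train.py:

import os
# must run before torch is imported / any CUDA kernel is launched
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":16:8")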
However, the results are still not reproducible. It also seems strange that even simple RGB tasks are not reproducible, while state-based setups remain reproducible under the same conditions as the vision-based tasks.
I think whatever issue affects RGB must also affect the feature learning pipeline.
I suspect this may be due to non-deterministic rendering effects.
When using the tiled camera, there is a ghosting effect due to DLSS: https://github.com/isaac-sim/IsaacLab/issues/1031#issuecomment-2397379876
I suspect that this ghosting effect, or even the rendering itself, is non-deterministic and thus leads to this behavior, since the observations would then be slightly different on each run.
I would try disabling the denoiser as outlined in the issue linked above, and note from that thread: "If you turn antialiasing off or use FXAA (that does not use previous frames), the ghosting is gone."
If it's non-deterministic even with denoising off, then I think this bug is due to Isaac Sim or whatever is used for rendering.
Ok, on my end, even with
import omni.replicator.core as rep
# disable antialiasing entirely (no frame history, so no ghosting)
rep.settings.set_render_rtx_realtime(antialiasing="OFF")  # FXAA doesn't work either
in train.py, I also get non-deterministic results with:
python scripts/reinforcement_learning/rl_games/train.py --task Isaac-Cartpole-RGB-v0 --enable_cameras --headless --seed 42 --max_iteration 100
I think this is due to Isaac Sim or the rendering process.
Thank you for your kind response. I also get non-deterministic results even when using FXAA in my environment.
I will leave a comment here if I find a solution. It would be great if you could share your findings too. Thanks!
This issue still exists in Isaac Sim 5.1. I calculated the difference of the camera observations after resetting the environment, as in https://github.com/isaac-sim/IsaacLab/issues/3962
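For reference, a minimal sketch of that comparison, assuming the first post-reset camera observation from two identically seeded runs has been saved to obs_run1.pt and obs_run2.pt (hypothetical file names):

import torch

# first post-reset camera observations dumped from two identically seeded runs
a = torch.load("obs_run1.pt")
b = torch.load("obs_run2.pt")

diff = (a.float() - b.float()).abs()
print(f"max abs diff:  {diff.max().item():.6f}")
print(f"mean abs diff: {diff.mean().item():.6f}")
print("bitwise identical:", torch.equal(a, b))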