[Bug Report] Training shuts down for no apparent reason with a 50-series card
Describe the bug
The training shuts down for unknown reasons.
[INFO]: Simulation is stopped. Shutting down the app.
Steps to reproduce
python scripts/reinforcement_learning/rsl_rl/train.py --task=Isaac-Velocity-Rough-Anymal-B-v0 --headless
and
python scripts/reinforcement_learning/rsl_rl/train.py --task=Isaac-Velocity-Rough-Anymal-C-v0 --headless
System Info
- Isaac Sim Version: 4.5.0
- OS: Ubuntu 20.04
- GPU: RTX 5080
- CUDA: 12.8
- GPU Driver: 570.144
- torch: 2.7.0
Sometimes training simply hangs and makes no further progress.
Thank you for posting this, the team will review it.
Can you use the 50-series graphics cards for training? But when you open Isaac Sim 4.5 with a 50-series card, doesn't it blur objects? I think these two aren't compatible. I assumed that 50-series cards could only be used with Isaac Sim 5.0 paired with Isaac Lab 2.2. I’m really sorry for bringing up a topic unrelated to the issue, but I’m eager to know the answer.
We have not been able to reproduce the initial issue. Could you give the Isaac Sim 5.0 branch a try and see if you still observe it there?
Isaac Sim 4.5 will be able to run with 50-series cards, but there is a rendering issue that disables the denoiser, so rendered results will look very noisy. This has been fixed in Isaac Sim 5.0.
@kellyguo11 Yes, I am now using Isaac Sim 5.1.0, Isaac Lab 2.3.0, and the 570 NVIDIA driver. However, the interruption still happens without any error message or warning.
I have finally figured it out! The interruption occurs because my CPU (Intel Core Ultra 7 265K) uses the intel_pstate driver. This is very recent hardware and it has compatibility issues with Ubuntu.
The fix is as follows:
- sudo gedit /etc/default/grub
- Change the line GRUB_CMDLINE_LINUX_DEFAULT="quiet splash" to GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_pstate=disable" and save.
- sudo update-grub
- sudo reboot
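For anyone who prefers to script the GRUB edit rather than make it by hand, here is a minimal sketch of the text change only. The helper name `add_kernel_param` is hypothetical, and it assumes the file holds a single double-quoted `GRUB_CMDLINE_LINUX_DEFAULT="..."` line. It does not write to /etc/default/grub and does not run update-grub; those steps still need root.

```python
import re

# Hypothetical helper (not part of Isaac Lab or GRUB): appends one kernel
# parameter to the GRUB_CMDLINE_LINUX_DEFAULT line of a grub defaults file.
# Assumes the value is wrapped in double quotes, as in the stock Ubuntu file.
_CMDLINE = re.compile(
    r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")[ \t]*$', re.MULTILINE
)


def add_kernel_param(grub_text: str, param: str = "intel_pstate=disable") -> str:
    """Return grub_text with `param` added once to the default kernel command line."""

    def _patch(match: re.Match) -> str:
        params = match.group(2).split()
        if param not in params:  # keep the edit idempotent
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    return _CMDLINE.sub(_patch, grub_text)
```

The function only transforms a string, so you can read /etc/default/grub, inspect the result, and write it back with sudo yourself before running sudo update-grub.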
After rebooting, cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver shows that the CPU frequency driver has changed from intel_pstate to acpi-cpufreq.
The interruption does not occur any more.
P.S. Do not forget to set the CPU scaling_governor to schedutil, which works better.
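To confirm both settings after the reboot, something like the following sketch could be used. It only reads the standard cpufreq sysfs files `scaling_driver` and `scaling_governor`. The function names `read_cpufreq` and `summarize` are made up for illustration, and the `root` argument exists only so the code can be pointed at a different directory.

```python
from pathlib import Path

SYSFS_CPU = "/sys/devices/system/cpu"


def read_cpufreq(attribute: str, root: str = SYSFS_CPU) -> dict:
    """Map each CPU name (e.g. 'cpu0') to the content of cpufreq/<attribute>."""
    values = {}
    for path in sorted(Path(root).glob(f"cpu[0-9]*/cpufreq/{attribute}")):
        # path is <root>/cpuN/cpufreq/<attribute>, so the CPU name is two levels up
        values[path.parent.parent.name] = path.read_text().strip()
    return values


def summarize(root: str = SYSFS_CPU) -> tuple:
    """Return the sets of distinct scaling drivers and governors in use."""
    drivers = set(read_cpufreq("scaling_driver", root).values())
    governors = set(read_cpufreq("scaling_governor", root).values())
    return drivers, governors


if __name__ == "__main__":
    drivers, governors = summarize()
    # An empty result means the kernel exposes no cpufreq interface here
    print("scaling_driver:", sorted(drivers) or "not exposed")
    print("scaling_governor:", sorted(governors) or "not exposed")
```

With the workaround applied, the expectation from this thread is a single driver, acpi-cpufreq, and a single governor, schedutil. If intel_pstate still shows up, the kernel parameter was probably not picked up.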