Genesis
[Bug]: CUDA_ERROR_ILLEGAL_ADDRESS when using RTX 5090 GPU
Bug Description
When I run Genesis/examples/locomotion/go2_train.py, I get the error shown in the log output. The script runs without problems on an RTX 4060 Ti, but fails on an RTX 5090. Has anyone seen a similar issue, and is there a known fix? No combination of versions I have tried works.
Additional findings
- The error occurs when the ground is added with "scene.add_entity(gs.morphs.URDF(file="urdf/plane/plane.urdf", fixed=True))". When that call is changed to "self.scene.add_entity(gs.morphs.Plane())", the error does not occur.
- With 1,024 environments, training progresses normally. With 2,048 environments, objects fall through the plane, or the run terminates with memory errors.
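The two findings above suggest a way to narrow the failure down. Below is a minimal sketch, not the project's own tooling: `add_ground` switches between the two ground variants quoted above, and `largest_stable_env_count` bisects the environment count between a known-good and a known-bad value. Both helper names are hypothetical, and so is `run_trial`, a callable you would supply that launches one training run and returns whether it stayed stable. The sketch assumes stability is monotonic in the environment count, which may not hold. Each trial is probably best run in a fresh process, since a CUDA context is generally unusable after an illegal-address error.

```python
from typing import Any, Callable


def add_ground(scene: Any, gs: Any, use_urdf: bool = False) -> Any:
    """Add the ground entity using one of the two variants from the report.

    `scene` is a Genesis scene and `gs` is the imported `genesis` module.
    Both are passed in so this helper has no import-time dependency.
    """
    if use_urdf:
        # Variant reported to trigger the error on the RTX 5090.
        morph = gs.morphs.URDF(file="urdf/plane/plane.urdf", fixed=True)
    else:
        # Variant reported to run without the error.
        morph = gs.morphs.Plane()
    return scene.add_entity(morph)


def largest_stable_env_count(
    run_trial: Callable[[int], bool], low: int, high: int
) -> int:
    """Bisect for the largest environment count that still runs stably.

    Assumes run_trial(low) is True, run_trial(high) is False, and that
    stability is monotonic in between (an assumption, not a given).
    """
    while high - low > 1:
        mid = (low + high) // 2
        if run_trial(mid):
            low = mid   # mid is stable: search the upper half
        else:
            high = mid  # mid fails: search the lower half
    return low
```

With the values from this report, the search would start from `largest_stable_env_count(run_trial, 1024, 2048)`.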
Steps to Reproduce
Run the locomotion training example:
python Genesis/examples/locomotion/go2_train.py
Expected Behavior
The script (Genesis/examples/locomotion/go2_train.py) should run on the RTX 5090 without any CUDA_ERROR_ILLEGAL_ADDRESS errors, just as it does on the RTX 4060 Ti, and training should proceed normally.
Screenshots/Videos
No response
Relevant log output
[E 05/19/25 10:42:14.988 189835] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered while calling stream_synchronize (cuStreamSynchronize)
Traceback (most recent call last):
  File "/home/-/workspace/Genesis/examples/locomotion/go2_train.py", line 180, in <module>
    main()
  File "/home/-/workspace/Genesis/examples/locomotion/go2_train.py", line 176, in main
    runner.learn(num_learning_iterations=args.max_iterations, init_at_random_ep_len=True)
  File "/home/-/.local/lib/python3.10/site-packages/rsl_rl/runners/on_policy_runner.py", line 151, in learn
    obs, rewards, dones, infos = self.env.step(actions.to(self.env.device))
  File "/home/-/workspace/Genesis/examples/locomotion/go2_env.py", line 129, in step
    self.base_pos[:] = self.robot.get_pos()
  File "/home/-/workspace/Genesis/genesis/utils/misc.py", line 72, in wrapper
    return method(self, *args, **kwargs)
  File "/home/-/workspace/Genesis/genesis/engine/entities/rigid_entity/rigid_entity.py", line 1672, in get_pos
    return self._solver.get_links_pos(self._base_links_idx, envs_idx, unsafe=unsafe).squeeze(-2)
  File "/home/-/workspace/Genesis/genesis/engine/solvers/rigid/rigid_solver_decomp.py", line 4483, in get_links_pos
    tensor = ti_field_to_torch(self.links_state.pos, envs_idx, links_idx, transpose=True, unsafe=unsafe)
  File "/home/-/workspace/Genesis/genesis/utils/misc.py", line 450, in ti_field_to_torch
    ti.sync()
  File "/home/-/.local/lib/python3.10/site-packages/taichi/lang/runtime_ops.py", line 8, in sync
    impl.get_runtime().sync()
  File "/home/-/.local/lib/python3.10/site-packages/taichi/lang/impl.py", line 499, in sync
    self.prog.synchronize()
RuntimeError: [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered while calling stream_synchronize (cuStreamSynchronize)
[Genesis] [10:42:14] [ERROR] RuntimeError: [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered while calling stream_synchronize (cuStreamSynchronize)
[E 05/19/25 10:42:15.164 189835] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered while calling stream_synchronize (cuStreamSynchronize)
Exception ignored in atexit callback: <function destroy at 0x724b395d4550>
Traceback (most recent call last):
  File "/home/-/workspace/Genesis/genesis/__init__.py", line 271, in destroy
    ti.reset()
  File "/home/-/.local/lib/python3.10/site-packages/taichi/lang/misc.py", line 220, in reset
    impl.reset()
  File "/home/-/.local/lib/python3.10/site-packages/taichi/lang/impl.py", line 512, in reset
    pytaichi.clear()
  File "/home/-/.local/lib/python3.10/site-packages/taichi/lang/impl.py", line 492, in clear
    self.prog.finalize()
RuntimeError: [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered while calling stream_synchronize (cuStreamSynchronize)
[E 05/19/25 10:42:15.569 189835] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered while calling stream_synchronize (cuStreamSynchronize)
[E 05/19/25 10:42:15.569 189835] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered while calling mem_free (cuMemFree_v2)
terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
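The log repeats the same CUDA error many times, so a short summary can make it easier to compare runs. The following is a minimal sketch that tallies which driver calls reported which error. The function name `summarize_cuda_errors` and the regular expression are assumptions, written against the line format shown in this log rather than any documented Taichi log format.

```python
import re
from collections import Counter

# Matches lines such as:
#   "... CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was
#    encountered while calling stream_synchronize (cuStreamSynchronize)"
_CUDA_ERROR = re.compile(
    r"CUDA Error (?P<name>\w+): (?P<message>.+?) "
    r"while calling (?P<op>\w+) \((?P<driver_call>\w+)\)"
)


def summarize_cuda_errors(log_text: str) -> Counter:
    """Count (error name, driver call) pairs found in a log."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = _CUDA_ERROR.search(line)
        if match:
            counts[(match["name"], match["driver_call"])] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "[E 05/19/25 10:42:14.988 189835] [cuda_driver.h:operator()@92] "
        "CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was "
        "encountered while calling stream_synchronize (cuStreamSynchronize)\n"
        "[E 05/19/25 10:42:15.569 189835] [cuda_driver.h:operator()@92] "
        "CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was "
        "encountered while calling mem_free (cuMemFree_v2)\n"
    )
    for (name, call), count in summarize_cuda_errors(sample).items():
        print(f"{name} via {call}: {count}")
```

Because CUDA reports kernel faults asynchronously, the call named in each line (here `cuStreamSynchronize`) is likely only where the error surfaced, not the kernel that performed the illegal access.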
Environment
- OS: Ubuntu 24.04, 22.04
- GPU/CPU: RTX 5090 / Intel(R) Core(TM) Ultra 7 265K
- GPU-driver version: 570.144
- CUDA / CUDA-toolkit version: 12.8
- torch ver.: 2.7.0+cu128
Release version or Commit ID
v0.2.1-312-g37c1ce6
Additional Context
No response