
[Bug]: CUDA_ERROR_ILLEGAL_ADDRESS when using RTX 5080 GPU

Open moribots opened this issue 7 months ago • 2 comments

When running Genesis on a 5080 GPU, I run into the following:

Bug Description

Running Genesis/examples/locomotion/go2_train.py fails with the error below. The same script runs without problems on a 4060 Ti, but fails on the 5080 GPU. Has anyone seen a similar issue or found a fix? No combination of versions I have tried works.

Script

python Genesis/examples/locomotion/go2_train.py

Expected Behavior

The script (Genesis/examples/locomotion/go2_train.py) should run on the RTX 5080 GPU just as it does on the RTX 4060 Ti, with training executing normally and no CUDA_ERROR_ILLEGAL_ADDRESS errors.

Relevant log output

RuntimeError: [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered while calling stream_synchronize (cuStreamSynchronize)

Traceback (most recent call last):
  File "/home/-/workspace/Genesis/examples/locomotion/go2_train.py", line 180, in <module>
    main()
  File "/home/-/workspace/Genesis/examples/locomotion/go2_train.py", line 176, in main
    runner.learn(num_learning_iterations=args.max_iterations, init_at_random_ep_len=True)
  File "/home/-/.local/lib/python3.10/site-packages/rsl_rl/runners/on_policy_runner.py", line 151, in learn
    obs, rewards, dones, infos = self.env.step(actions.to(self.env.device))
  File "/home/-/workspace/Genesis/examples/locomotion/go2_env.py", line 129, in step
    self.base_pos[:] = self.robot.get_pos()
  File "/home/-/workspace/Genesis/genesis/utils/misc.py", line 72, in wrapper
    return method(self, *args, **kwargs)
  File "/home/-/workspace/Genesis/genesis/engine/entities/rigid_entity/rigid_entity.py", line 1672, in get_pos
    return self._solver.get_links_pos(self._base_links_idx, envs_idx, unsafe=unsafe).squeeze(-2)
  File "/home/-/workspace/Genesis/genesis/engine/solvers/rigid/rigid_solver_decomp.py", line 4483, in get_links_pos
    tensor = ti_field_to_torch(self.links_state.pos, envs_idx, links_idx, transpose=True, unsafe=unsafe)
  File "/home/-/workspace/Genesis/genesis/utils/misc.py", line 450, in ti_field_to_torch
    ti.sync()
  File "/home/-/.local/lib/python3.10/site-packages/taichi/lang/runtime_ops.py", line 8, in sync
    impl.get_runtime().sync()
  File "/home/-/.local/lib/python3.10/site-packages/taichi/lang/impl.py", line 499, in sync
    self.prog.synchronize()
RuntimeError: [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered while calling stream_synchronize (cuStreamSynchronize)

[Genesis] [10:42:14] [ERROR] RuntimeError: [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered while calling stream_synchronize (cuStreamSynchronize)
[E 05/19/25 10:42:15.164 189835] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered while calling stream_synchronize (cuStreamSynchronize)


Exception ignored in atexit callback: <function destroy at 0x724b395d4550>
Traceback (most recent call last):
  File "/home/-/workspace/Genesis/genesis/__init__.py", line 271, in destroy
    ti.reset()
  File "/home/-/.local/lib/python3.10/site-packages/taichi/lang/misc.py", line 220, in reset
    impl.reset()
  File "/home/-/.local/lib/python3.10/site-packages/taichi/lang/impl.py", line 512, in reset
    pytaichi.clear()
  File "/home/-/.local/lib/python3.10/site-packages/taichi/lang/impl.py", line 492, in clear
    self.prog.finalize()
RuntimeError: [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered while calling stream_synchronize (cuStreamSynchronize)
[E 05/19/25 10:42:15.569 189835] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered while calling stream_synchronize (cuStreamSynchronize)


[E 05/19/25 10:42:15.569 189835] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered while calling mem_free (cuMemFree_v2)

terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'

Environment

OS: Ubuntu 22.04
GPU/CPU: 5080 / Intel(R) Core(TM) Ultra 7 265K
GPU-driver version: 570.144
CUDA / CUDA-toolkit version: 12.8
torch ver.: 2.7.0+cu128
Release version or Commit ID: v0.2.1-312-g37c1ce6
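
A note for anyone triaging a log like the one above: CUDA reports kernel faults asynchronously, so the `cuStreamSynchronize` frame in the traceback is where the error surfaced, not necessarily where the illegal access happened. The sketch below is only an illustration, not part of Genesis or Taichi. It counts which driver calls reported which error code in a pasted log. The function name `summarize_cuda_errors` and the regular expression are my own assumptions, based purely on the message format shown in this issue.

```python
import re
from collections import Counter

# Assumed message format, taken from the log lines in this issue:
# "CUDA Error <CODE>: <message> while calling <wrapper> (<driver call>)"
CUDA_ERROR_RE = re.compile(
    r"CUDA Error (?P<code>CUDA_ERROR_[A-Z_]+): (?P<message>.*?) "
    r"while calling (?P<wrapper>\w+) \((?P<driver_call>\w+)\)"
)


def summarize_cuda_errors(log_text):
    """Count (error code, driver call) pairs found in a log (hypothetical helper)."""
    counts = Counter()
    for match in CUDA_ERROR_RE.finditer(log_text):
        counts[(match["code"], match["driver_call"])] += 1
    return counts


# Three lines copied from the log output in this issue.
SAMPLE_LOG = """\
RuntimeError: [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered while calling stream_synchronize (cuStreamSynchronize)
[E 05/19/25 10:42:15.569 189835] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered while calling stream_synchronize (cuStreamSynchronize)
[E 05/19/25 10:42:15.569 189835] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered while calling mem_free (cuMemFree_v2)
"""

if __name__ == "__main__":
    for (code, call), n in summarize_cuda_errors(SAMPLE_LOG).items():
        print(f"{code} reported by {call}: {n} time(s)")
```

If every entry carries the same error code, as here, the later `cuMemFree_v2` failure is most likely a follow-on of the first fault rather than a separate bug, since an illegal address error leaves the CUDA context unusable.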

from: https://github.com/Genesis-Embodied-AI/Genesis/issues/1157

moribots avatar May 20 '25 05:05 moribots

Same issue here with:

OS: Ubuntu 24.04
GPU/CPU: 5090 / Intel® Core™ i9-14900KF × 32
GPU-driver version: 575.51.03
CUDA / CUDA-toolkit version: 12.9

Kashu7100 avatar May 21 '25 00:05 Kashu7100

solved: https://github.com/taichi-dev/taichi/pull/8735

johnnynunez avatar Jul 16 '25 10:07 johnnynunez