
WSL2 Ubuntu22.04 python3.9 CUDA11.8: no CUDA-capable device is detected while calling init

Open LongHZ140516 opened this issue 11 months ago • 1 comments

I am currently running Genesis experiments in a WSL2 environment (Ubuntu 22.04, Python 3.9, CUDA 11.8). When I run gs.init(backend=gs.cpu), everything works fine, but the FPS is very low.

Therefore, I would like to use CUDA to accelerate the computation. When I set gs.init(backend=gs.cuda), I encountered the following error.

[Genesis] [15:16:25] [INFO] ╭─────────────────────────────────────────────────────────────────────────────────────╮
[Genesis] [15:16:25] [INFO] │┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉ Genesis ┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉│
[Genesis] [15:16:25] [INFO] ╰─────────────────────────────────────────────────────────────────────────────────────╯
[Genesis] [15:16:26] [INFO] Running on [NVIDIA GeForce RTX 4080 SUPER] with backend gs.cuda. Device memory: 15.99 GB.
[E 12/21/24 15:16:26.157 71086] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected while calling init (cuInit)


Traceback (most recent call last):
  File "/home/serein/code/Genesis/hello_genesis.py", line 21, in <module>
    gs.init(backend=gs.cuda)
  File "/home/serein/code/Genesis/genesis/__init__.py", line 97, in init
    ti.init(arch=TI_ARCH[platform][backend], debug=debug, force_scalarize_matrix=True)
  File "/home/serein/anaconda3/envs/genesis/lib/python3.9/site-packages/taichi/lang/misc.py", line 458, in init
    impl.get_runtime().create_program()
  File "/home/serein/anaconda3/envs/genesis/lib/python3.9/site-packages/taichi/lang/impl.py", line 388, in create_program
    self.prog = _ti_core.Program()
RuntimeError: [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected while calling init (cuInit)
[Genesis] [15:16:26] [INFO] 💤 Exiting Genesis and caching compiled kernels...

Here is my nvidia-smi:

Sat Dec 21 15:05:51 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06              Driver Version: 560.94         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4080 ...    On  |   00000000:01:00.0  On |                  N/A |
|  0%   40C    P5             18W /  340W |    3144MiB /  16376MiB |     18%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |

and nvcc --version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

Additionally, I checked CUDA availability with PyTorch, and the result was True. However, I still cannot run successfully with gs.cuda.
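For anyone trying to narrow down a mismatch like this (PyTorch reports CUDA as available while cuInit fails elsewhere), here is a minimal sketch that calls cuInit directly through ctypes, outside of both PyTorch and Taichi. It assumes the driver library is resolved under the name libcuda.so.1; the helper names probe_cuinit and describe are hypothetical and not part of Genesis or Taichi. The result shows what the dynamic loader finds for the current process, which may differ from what another library loads.

```python
import ctypes

CUDA_SUCCESS = 0
CUDA_ERROR_NO_DEVICE = 100  # value from the CUDA driver API's CUresult enum


def probe_cuinit(lib_name="libcuda.so.1"):
    """Load the CUDA driver library and call cuInit(0).

    Returns the integer CUresult, or None if the library cannot be loaded.
    lib_name is an assumption; adjust it if your system uses another name.
    """
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        return None
    lib.cuInit.argtypes = [ctypes.c_uint]
    lib.cuInit.restype = ctypes.c_int
    return lib.cuInit(0)


def describe(result):
    """Turn a probe_cuinit() result into a short human-readable message."""
    if result is None:
        return "driver library could not be loaded"
    if result == CUDA_SUCCESS:
        return "cuInit succeeded"
    if result == CUDA_ERROR_NO_DEVICE:
        return "cuInit failed: CUDA_ERROR_NO_DEVICE"
    return f"cuInit failed with CUresult {result}"


if __name__ == "__main__":
    print(describe(probe_cuinit()))
```

If this reports CUDA_ERROR_NO_DEVICE while nvidia-smi still lists the GPU, the process is probably picking up a different libcuda than the one WSL provides, so the library search path is worth checking.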

I also tried changing my CUDA version to 12.1, but the result was exactly the same. The output of python -m taichi diagnose is as follows:

[Taichi] version 1.7.2, llvm 15.0.4, commit 0131dce9, linux, python 3.9.21

*******************************************
**      Taichi Programming Language      **
*******************************************

Docs:   https://docs.taichi-lang.org/
GitHub: https://github.com/taichi-dev/taichi/
Forum:  https://forum.taichi.graphics/

Taichi system diagnose:

python: 3.9.21 (main, Dec 11 2024, 16:24:11) 
[GCC 11.2.0]
system: linux
executable: /home/serein/anaconda3/envs/genesis/bin/python
platform: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
architecture: 64bit ELF
uname: uname_result(system='Linux', node='Serein', release='5.15.146.1-microsoft-standard-WSL2', version='#1 SMP Thu Jan 11 04:09:03 UTC 2024', machine='x86_64')
locale: en_US.UTF-8
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.4 LTS
Release:        22.04
Codename:       jammy



import: <module 'taichi' from '/home/serein/anaconda3/envs/genesis/lib/python3.9/site-packages/taichi/__init__.py'>

cpu: True
metal: False
RHI Error: GLFW Error 65543: GLX: Failed to create context: GLXBadFBConfig
opengl: ERROR Command '['/home/serein/anaconda3/envs/genesis/bin/python', '-c', 'import taichi as ti; print("===="); print(ti.lang.misc.is_arch_supported(ti.opengl), end="")']' died with <Signals.SIGSEGV: 11>.
cuda: True
vulkan: False

`glewinfo` not available: [Errno 2] No such file or directory: 'glewinfo'

Sat Dec 21 14:12:15 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06              Driver Version: 560.94         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4080 ...    On  |   00000000:01:00.0  On |                  N/A |
|  0%   41C    P0             37W /  340W |    3479MiB /  16376MiB |      5%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

[Taichi] version 1.7.2, llvm 15.0.4, commit 0131dce9, linux, python 3.9.21

[Taichi] version 1.7.2, llvm 15.0.4, commit 0131dce9, linux, python 3.9.21
[Taichi] Starting on arch=x64

RHI Error: GLFW Error 65543: GLX: Failed to create context: GLXBadFBConfig
Taichi OpenGL test failed: Command '['/home/serein/anaconda3/envs/genesis/bin/python', '-c', 'import taichi as ti; ti.init(arch=ti.opengl)']' died with <Signals.SIGSEGV: 11>.
[E 12/21/24 14:12:16.540 35656] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected while calling init (cuInit)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/serein/anaconda3/envs/genesis/lib/python3.9/site-packages/taichi/lang/misc.py", line 458, in init
    impl.get_runtime().create_program()
  File "/home/serein/anaconda3/envs/genesis/lib/python3.9/site-packages/taichi/lang/impl.py", line 388, in create_program
    self.prog = _ti_core.Program()
RuntimeError: [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected while calling init (cuInit)
Taichi CUDA test failed: Command '['/home/serein/anaconda3/envs/genesis/bin/python', '-c', 'import taichi as ti; ti.init(arch=ti.cuda)']' returned non-zero exit status 1.

LongHZ140516 avatar Dec 21 '24 07:12 LongHZ140516

Maybe you can try updating OpenGL or reinstalling CUDA at version 12.6.

CosmosMount avatar Dec 21 '24 13:12 CosmosMount

I encountered the same issue where the error message "no CUDA-capable device is detected" was displayed. In my case, adding the following line to the ~/.bashrc file resolved the problem:

export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
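To confirm whether the export actually took effect in the shell you launch Python from, a small check like the one below may help. This is only a sketch: the directory is the one from the export line above, and the helper names wsl_lib_on_path and has_libcuda are hypothetical. Note that the dynamic loader reads LD_LIBRARY_PATH when the process starts, so the variable needs to be set before Python is launched (for example by opening a new shell after editing ~/.bashrc).

```python
import os

WSL_LIB_DIR = "/usr/lib/wsl/lib"  # directory taken from the export line above


def wsl_lib_on_path(ld_library_path, wsl_dir=WSL_LIB_DIR):
    """Return True if wsl_dir is one of the colon-separated path entries."""
    entries = [e.rstrip("/") for e in ld_library_path.split(":") if e]
    return wsl_dir.rstrip("/") in entries


def has_libcuda(wsl_dir=WSL_LIB_DIR):
    """Return True if a libcuda shared library file exists in wsl_dir."""
    try:
        return any(name.startswith("libcuda.so") for name in os.listdir(wsl_dir))
    except OSError:
        # Directory missing or unreadable, e.g. when not running under WSL.
        return False


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    print("WSL lib dir on LD_LIBRARY_PATH:", wsl_lib_on_path(current))
    print("libcuda found in WSL lib dir:", has_libcuda())
```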

yasuhiroinoue avatar Dec 21 '24 14:12 yasuhiroinoue

I encountered the same issue where the error message "no CUDA-capable device is detected" was displayed. In my case, adding the following line to the ~/.bashrc file resolved the problem:

export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH

Thank you very much for your help, this worked for me. However, I now have a new problem: OpenGL.error.Error: Attempt to retrieve context when no valid context. I will try to solve it.

LongHZ140516 avatar Dec 22 '24 06:12 LongHZ140516

Maybe you can try to update your OpenGL or reinstall CUDA in 12.6 version.

Thanks for your answer, but the problem was still not solved after I updated CUDA to 12.6. I solved it by following yasuhiroinoue's answer instead. Thank you very much for your reply.

LongHZ140516 avatar Dec 22 '24 06:12 LongHZ140516

I solved the OpenGL.error.Error: Attempt to retrieve context when no valid context problem by following vhartman's answer in issue #37. Now everything works fine. Thanks a lot, I think my problem is solved.

LongHZ140516 avatar Dec 22 '24 07:12 LongHZ140516

export LD_LIBRARY_PATH=/usr/lib/wsl/lib is a workaround. The root issue is that an incorrect driver got installed. I posted the details at https://stackoverflow.com/a/79357542/2000548

hongbo-miao avatar Jan 17 '25 06:01 hongbo-miao

export LD_LIBRARY_PATH=/usr/lib/wsl/lib is a workaround. The root issue is that an incorrect driver got installed. I posted the details at https://stackoverflow.com/a/79357542/2000548

Thank you very much for your reply, I will go back and try the solution you gave. Thank you for your help.

LongHZ140516 avatar Jan 17 '25 07:01 LongHZ140516