CUDA won't detect GPU in WSL

Open peroroch opened this issue 3 years ago • 12 comments

I tried setting up PyTorch with CUDA in WSL, but it doesn't pick up my GPU: torch.cuda.is_available() returns False.

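#include <cuda_runtime.h>  // declares cudaDriverGetVersion and cudaRuntimeGetVersion
#include <stdio.h>

int main(int argc, char** argv) {
      // Both values stay 0 if the calls below fail; their return codes are not checked.
      int driver_version = 0, runtime_version = 0;

      cudaDriverGetVersion(&driver_version);
      cudaRuntimeGetVersion(&runtime_version);

      printf("Driver Version: %d\n"
             "Runtime Version: %d\n",
             driver_version, runtime_version);

      return 0;
}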

This code, compiled with nvcc, just prints

Driver Version: 0
Runtime Version: 0
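
For context on the zeros above: the CUDA runtime reports versions as one integer encoded as 1000 * major + 10 * minor, and cudaDriverGetVersion writes 0 when no driver is installed. A minimal Python sketch of that decoding follows; the helper name decode_cuda_version is made up for illustration and is not part of any CUDA or PyTorch API.

```python
def decode_cuda_version(value):
    """Split a CUDA version integer (1000 * major + 10 * minor) into (major, minor)."""
    if value <= 0:
        # 0 is what cudaDriverGetVersion reports when no driver is found.
        return None
    major = value // 1000
    minor = (value % 1000) // 10
    return (major, minor)


if __name__ == "__main__":
    # 0 is the value from the output above; the other two are illustrative inputs.
    for raw in (0, 11010, 11080):
        print(raw, "->", decode_cuda_version(raw))
```

With this encoding, a working CUDA 11.1 driver would be expected to report 11010 rather than 0.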

I'm on Microsoft Windows [Version 10.0.19044.2251]

And this is what torch.utils.collect_env outputs:

PyTorch version: 1.10.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.9.15 (main, Nov 24 2022, 14:31:59)  [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
Is CUDA available: False
CUDA runtime version: 11.1.105
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] torch==1.10.0+cu111
[pip3] torchaudio==0.10.0+rocm4.1
[pip3] torchvision==0.11.0+cu111
[conda] numpy                     1.23.5                   pypi_0    pypi
[conda] torch                     1.10.0+cu111             pypi_0    pypi
[conda] torchaudio                0.10.0+rocm4.1           pypi_0    pypi
[conda] torchvision               0.11.0+cu111             pypi_0    pypi
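
One detail in the listing above: torch and torchvision carry a +cu111 build tag while torchaudio carries +rocm4.1. This may well be unrelated to the GPU not being detected, but mixed build tags are easy to spot programmatically. A small sketch, with hypothetical helper names, that compares the tag after the "+" against the one on torch:

```python
def local_build_tags(packages):
    """Map each package to the local tag after '+', e.g. 'cu111' or 'rocm4.1'."""
    tags = {}
    for name, version in packages.items():
        _, sep, tag = version.partition("+")
        tags[name] = tag if sep else None
    return tags


def mismatched_backends(packages):
    """Return tagged packages whose build tag differs from the tag on 'torch'."""
    tags = local_build_tags(packages)
    reference = tags.get("torch")
    return {
        name: tag
        for name, tag in tags.items()
        if tag is not None and tag != reference
    }


if __name__ == "__main__":
    # Versions copied from the pip3 lines above.
    installed = {
        "numpy": "1.23.5",
        "torch": "1.10.0+cu111",
        "torchaudio": "0.10.0+rocm4.1",
        "torchvision": "0.11.0+cu111",
    }
    print(mismatched_backends(installed))  # {'torchaudio': 'rocm4.1'}
```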

This is nvidia-smi run on the Windows host:

Sat Dec 03 00:22:29 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 457.51       Driver Version: 457.51       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1060   WDDM  | 00000000:01:00.0  On |                  N/A |
| N/A   65C    P5     8W /  N/A |    750MiB /  6144MiB |     28%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

I've tried what this person suggested, which is to install everything through conda in a new environment in a new WSL2 instance, but it didn't work.

According to this thread, there should be an nvidia-smi binary in /usr/lib/wsl/lib/, but on every single one of my WSL2 instances there's only . .. libd3d12.so libd3d12core.so libdxcore.so there.

Does anyone have any ideas how I might get this working? Thanks

peroroch avatar Dec 02 '22 15:12 peroroch

@peroroch you'll need a Windows NVIDIA GPU driver with WSL2 support. On Windows, check under C:\Windows\System32\lxss\lib to see whether the CUDA libs are missing:

PS C:\Windows\System32\lxss\lib> ls

    Directory: C:\Windows\System32\lxss\lib

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---          11/23/2022  1:11 AM         149912 libcuda.so
-a---          11/23/2022  1:11 AM         149912 libcuda.so.1
-a---          11/23/2022  1:11 AM         149912 libcuda.so.1.1
-a---           9/15/2021  7:33 AM         828840 libd3d12.so
-a---           9/15/2021  7:33 AM        4834848 libd3d12core.so
-a---           9/15/2021  7:33 AM         878768 libdxcore.so
-a---          11/23/2022  1:11 AM        8989896 libnvcuvid.so
-a---          11/23/2022  1:11 AM        8989896 libnvcuvid.so.1
-a---          11/23/2022  1:11 AM       14551728 libnvdxdlkernels.so
-a---          11/23/2022  1:11 AM         514664 libnvidia-encode.so
-a---          11/23/2022  1:11 AM         514664 libnvidia-encode.so.1
-a---          11/23/2022  1:11 AM         222944 libnvidia-ml.so.1
-a---          11/23/2022  1:11 AM         358864 libnvidia-opticalflow.so
-a---          11/23/2022  1:11 AM         358864 libnvidia-opticalflow.so.1
-a---          11/23/2022  1:11 AM          68560 libnvoptix.so.1
-a---          11/23/2022  1:11 AM       60186056 libnvwgf2umx.so
-a---          11/23/2022  1:11 AM         630224 nvidia-smi
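
The same check can be done from inside the distribution. The sketch below assumes that /usr/lib/wsl/lib mirrors this Windows folder (as the earlier posts suggest) and that the four names in EXPECTED, picked from the listing above, are a reasonable minimum; it is not an official list, and the function name is hypothetical.

```python
import os

# Names taken from the directory listing above; an assumption, not an official list.
EXPECTED = ("libcuda.so", "libcuda.so.1", "libnvidia-ml.so.1", "nvidia-smi")


def missing_wsl_gpu_files(directory="/usr/lib/wsl/lib", expected=EXPECTED):
    """Return the expected file names that are absent from `directory`."""
    try:
        present = set(os.listdir(directory))
    except OSError:
        # Directory missing or unreadable: treat every file as missing.
        present = set()
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    missing = missing_wsl_gpu_files()
    if missing:
        print("Missing from /usr/lib/wsl/lib:", ", ".join(missing))
    else:
        print("All expected driver files are present.")
```

If everything in EXPECTED is reported missing, the Windows driver most likely did not install its WSL components at all.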

elsaco avatar Dec 02 '22 17:12 elsaco

@elsaco Right, my driver version is 457.51. CUDA 11.1 supported WSL2, so I just assumed that this driver also supported it. Does it not?

And all the libs are missing, except for libd3d12.so libd3d12core.so libdxcore.so.

    Directory: C:\Windows\System32\lxss\lib


Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----        05/10/2022     16:20         828840 libd3d12.so
-a----        05/10/2022     16:20        4834848 libd3d12core.so
-a----        05/10/2022     16:20         878768 libdxcore.so

peroroch avatar Dec 03 '22 01:12 peroroch

I, on the other hand, can never get nvidia-smi to work under WSL. It seems to have worked only once, briefly, on Windows 10.

This is on the host machine:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 526.98       Driver Version: 526.98       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE... TCC   | 00000000:04:00.0 Off |                    0 |
| N/A   35C    P0    25W / 250W |      8MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ... WDDM  | 00000000:81:00.0  On |                  N/A |
| 33%   40C    P8     9W / 250W |    971MiB / 11264MiB |      7%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

This is nvcc -V on WSL Ubuntu 22.04:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

but nvidia-smi returns: Failed to initialize NVML: Unknown Error

and I have everything under /usr/lib/wsl/lib

total 96644
drwxr-xr-x 1 root root       60 Dec  5 09:37 ./
drwxr-xr-x 4 root root     4096 Dec  1 19:23 ../
-r-xr-xr-x 1 root root   149912 Nov 13 20:23 libcuda.so*
-r-xr-xr-x 1 root root   149912 Nov 13 20:23 libcuda.so.1*
-r-xr-xr-x 1 root root   149912 Nov 13 20:23 libcuda.so.1.1*
-r-xr-xr-x 1 root root   800568 Dec  1 11:11 libd3d12.so*
-r-xr-xr-x 1 root root  6224608 Dec  1 11:11 libd3d12core.so*
-r-xr-xr-x 1 root root   829248 Dec  1 11:11 libdxcore.so*
-r-xr-xr-x 1 root root  8989896 Nov 13 20:23 libnvcuvid.so*
-r-xr-xr-x 1 root root  8989896 Nov 13 20:23 libnvcuvid.so.1*
-r-xr-xr-x 1 root root 14551728 Nov 13 20:23 libnvdxdlkernels.so*
-r-xr-xr-x 1 root root   514664 Nov 13 20:23 libnvidia-encode.so*
-r-xr-xr-x 1 root root   514664 Nov 13 20:23 libnvidia-encode.so.1*
-r-xr-xr-x 1 root root   222944 Nov 13 20:23 libnvidia-ml.so.1*
-r-xr-xr-x 1 root root   358864 Nov 13 20:23 libnvidia-opticalflow.so*
-r-xr-xr-x 1 root root   358864 Nov 13 20:23 libnvidia-opticalflow.so.1*
-r-xr-xr-x 1 root root    68560 Nov 13 20:23 libnvoptix.so.1*
lrwxrwxrwx 1 root root       15 Dec  5 09:37 libnvoptix_loader.so.1 -> libnvoptix.so.1*
-r-xr-xr-x 1 root root 60186056 Nov 13 20:23 libnvwgf2umx.so*
-r-xr-xr-x 1 root root   630224 Nov 13 20:23 nvidia-smi*

fzhan avatar Dec 05 '22 09:12 fzhan

torch.cuda.is_available() returns True for me, but NVML can't be initialized:

Python 3.10.6 (main, Nov  2 2022, 18:53:38) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import numpy
>>> a = torch.rand((1,1)).cuda()
/home/xxx/.local/lib/python3.10/site-packages/torch/cuda/__init__.py:497: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
>>>
>>>
>>> torch.cuda.is_available()
True

fzhan avatar Dec 05 '22 11:12 fzhan

@peroroch You're on Win10 with build number 19044. To get CUDA support for WSL2, your build number needs to be at least 20145.
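
A quick way to apply that rule to a version string such as the one posted earlier. This is only a sketch: the 20145 threshold is taken from this comment at face value and is not verified here, and both function names are hypothetical.

```python
import re


def windows_build(version_text):
    """Extract the build number from text such as 'Version 10.0.19044.2251'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError("no version number found in: %r" % version_text)
    return int(match.group(3))


def meets_build_threshold(version_text, minimum_build=20145):
    """True if the build number is at least `minimum_build` (assumed threshold)."""
    return windows_build(version_text) >= minimum_build


if __name__ == "__main__":
    reported = "Microsoft Windows [Version 10.0.19044.2251]"
    print(windows_build(reported), meets_build_threshold(reported))
```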

You may check my notes for reference. I've successfully installed CUDA on WSL2.

Ayke avatar Jan 20 '23 07:01 Ayke

@fzhan Most likely you've accidentally installed the CUDA toolkit for Ubuntu. When you call nvcc, you're not calling the WSL-Ubuntu version of nvcc (which eventually uses your Windows CUDA under /usr/lib/wsl/lib). Instead, it's trying to find a local CUDA module in Ubuntu.

Ayke avatar Jan 20 '23 07:01 Ayke

> @fzhan Most likely you've accidentally installed the CUDA toolkit for Ubuntu. When you call nvcc, you're not calling the WSL-Ubuntu version of nvcc (which eventually uses your Windows CUDA under /usr/lib/wsl/lib). Instead, it's trying to find a local CUDA module in Ubuntu.

Unlikely. I've followed the instructions from top to bottom several times with fresh installations of both WSL and Ubuntu. Is there any way to "debug" this?
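
One possible way to narrow this down, sketched with Python's ctypes: load the driver library and NVML directly and look at the raw return codes of cuInit (CUDA driver API) and nvmlInit_v2 (NVML), both of which return 0 on success. The probe helper and its result dictionary are made up for illustration, and the library names assume the files listed under /usr/lib/wsl/lib are on the loader path.

```python
import ctypes


def probe(library, init_symbol, args=()):
    """Load `library`, call `init_symbol` with `args`, and report what happened."""
    try:
        lib = ctypes.CDLL(library)
    except OSError as exc:
        # The dynamic loader could not find or open the library.
        return {"library": library, "loaded": False, "detail": str(exc)}
    try:
        func = getattr(lib, init_symbol)
    except AttributeError:
        return {"library": library, "loaded": True, "detail": "symbol not found"}
    func.restype = ctypes.c_int
    return {"library": library, "loaded": True, "code": func(*args)}


if __name__ == "__main__":
    # A non-zero code means the library loaded but initialization failed.
    print(probe("libcuda.so.1", "cuInit", (ctypes.c_uint(0),)))
    print(probe("libnvidia-ml.so.1", "nvmlInit_v2"))
```

A load failure points at the library path, while a non-zero code with the library loaded points at the driver side.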

fzhan avatar Feb 21 '23 08:02 fzhan

I ran into the same thing. Has anyone found a solution?

jluyt123 avatar Dec 16 '23 19:12 jluyt123

@Ayke Do you know how to fix this, or how to get a fresh install going?

sunwooz avatar Feb 24 '24 17:02 sunwooz

I solved this by installing the Windows CUDA toolkit. I thought it was installed by default with the NVIDIA driver, but it actually isn't. I realised I didn't have libnvcuvid.so in my Windows lib folder. https://developer.nvidia.com/cuda-11.0-download-archive?target_os=Windows&target_arch=x86_64

steven-shi avatar Apr 01 '24 21:04 steven-shi

> @fzhan Most likely you've accidentally installed the CUDA toolkit for Ubuntu. When you call nvcc, you're not calling the WSL-Ubuntu version of nvcc (which eventually uses your Windows CUDA under /usr/lib/wsl/lib). Instead, it's trying to find a local CUDA module in Ubuntu.

@Ayke I may have done this by accident. Is there a way to confirm whether I have the WSL version or the wrong one, and to delete it?
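
One thing that could be checked, as a sketch only: whether the dynamic loader knows about a libcuda outside /usr/lib/wsl/lib, which would suggest that a Linux driver package was installed inside the distribution. The parser below expects the text printed by ldconfig -p; the sample string is hypothetical, and the function names are made up for illustration.

```python
WSL_LIB_DIR = "/usr/lib/wsl/lib/"


def libcuda_locations(ldconfig_output):
    """Return (name, path) pairs for libcuda entries in `ldconfig -p` output."""
    found = []
    for line in ldconfig_output.splitlines():
        left, sep, path = line.partition("=>")
        fields = left.split()
        if not sep or not fields:
            continue
        if fields[0].startswith("libcuda.so"):
            found.append((fields[0], path.strip()))
    return found


def outside_wsl_dir(entries):
    """Keep only the entries that do not live under /usr/lib/wsl/lib."""
    return [(name, path) for name, path in entries if not path.startswith(WSL_LIB_DIR)]


if __name__ == "__main__":
    # Hypothetical sample; replace with the real output of `ldconfig -p`.
    sample = (
        "\tlibcuda.so.1 (libc6,x86-64) => /usr/lib/wsl/lib/libcuda.so.1\n"
        "\tlibcuda.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcuda.so.1\n"
    )
    print(outside_wsl_dir(libcuda_locations(sample)))
```

An empty result would mean that only the WSL-provided libcuda is registered with the loader.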

cyberneel avatar Mar 29 '25 03:03 cyberneel

> I solved this by installing the Windows CUDA toolkit. I thought it was installed by default with the NVIDIA driver, but it actually isn't. I realised I didn't have libnvcuvid.so in my Windows lib folder. https://developer.nvidia.com/cuda-11.0-download-archive?target_os=Windows&target_arch=x86_64

Nope, tried it! Not working, unless the 12.9 toolkit is too recent for torch.

python3 -c "import torch; print(torch.version.cuda); print(torch.cuda.is_available())"
12.1
False
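
Worth noting here: PyTorch's pip wheels bundle their own CUDA runtime, so torch.version.cuda reports what the wheel was built against rather than the locally installed toolkit. The more relevant comparison is probably against the CUDA version shown in the nvidia-smi header. A rough sketch with hypothetical helper names; it compares major versions only, and the real compatibility rules are more nuanced.

```python
import re

_HEADER = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*(\d+)\.(\d+)")


def parse_smi_header(text):
    """Pull the driver version and CUDA version out of nvidia-smi's header line."""
    match = _HEADER.search(text)
    if match is None:
        return None
    return {
        "driver": match.group(1),
        "cuda": (int(match.group(2)), int(match.group(3))),
    }


def same_or_newer_major(driver_cuda, torch_cuda):
    """Rough check: driver CUDA major >= major of a torch.version.cuda string."""
    return driver_cuda[0] >= int(torch_cuda.split(".")[0])


if __name__ == "__main__":
    # Header line copied from an nvidia-smi output earlier in this thread (a sample only).
    header = "| NVIDIA-SMI 526.98       Driver Version: 526.98       CUDA Version: 12.0     |"
    info = parse_smi_header(header)
    print(info, same_or_newer_major(info["cuda"], "12.1"))
```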

brokedba avatar May 29 '25 21:05 brokedba