CUDA won't detect GPU in WSL
I tried setting up PyTorch with CUDA in WSL, but it doesn't pick up my GPU: torch.cuda.is_available() returns False.
#include <cuda_runtime.h>  // declares cudaDriverGetVersion / cudaRuntimeGetVersion
#include <stdio.h>

int main(int argc, char** argv) {
    int driver_version = 0, runtime_version = 0;

    // Return codes are not checked here, so any error is silent.
    cudaDriverGetVersion(&driver_version);
    cudaRuntimeGetVersion(&runtime_version);

    printf("Driver Version: %d\n"
           "Runtime Version: %d\n",
           driver_version, runtime_version);
    return 0;
}
This code, compiled with nvcc, just prints:
Driver Version: 0
Runtime Version: 0
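Since the program above ignores return codes, a zero can hide the actual failure. As a rough cross-check, the driver library can be queried from Python with ctypes. This is only a sketch: it assumes the driver library is loadable as libcuda.so.1, and the helper names query_driver_version and decode_cuda_version are made up for illustration.

```python
import ctypes


def decode_cuda_version(v):
    """Turn CUDA's integer encoding (1000*major + 10*minor) into 'major.minor'."""
    return f"{v // 1000}.{(v % 1000) // 10}"


def query_driver_version(lib_name="libcuda.so.1"):
    """Return (status, version) from the CUDA driver API.

    Returns None if the library cannot be loaded at all.
    lib_name is an assumption; adjust it if your library lives elsewhere.
    """
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        return None  # the driver library is not visible to the loader

    status = lib.cuInit(0)  # 0 means CUDA_SUCCESS
    if status != 0:
        return (status, None)

    version = ctypes.c_int(0)
    status = lib.cuDriverGetVersion(ctypes.byref(version))
    return (status, version.value if status == 0 else None)


if __name__ == "__main__":
    result = query_driver_version()
    if result is None:
        print("libcuda.so.1 could not be loaded")
    else:
        status, version = result
        if version is None:
            print(f"CUDA driver API call failed with status {status}")
        else:
            print(f"Driver supports CUDA {decode_cuda_version(version)}")
```

If the library cannot be loaded, the problem is on the driver side rather than in PyTorch or the toolkit.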
I'm on Microsoft Windows [Version 10.0.19044.2251]
And this is what torch.utils.collect_env outputs:
PyTorch version: 1.10.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31
Python version: 3.9.15 (main, Nov 24 2022, 14:31:59) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
Is CUDA available: False
CUDA runtime version: 11.1.105
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] torch==1.10.0+cu111
[pip3] torchaudio==0.10.0+rocm4.1
[pip3] torchvision==0.11.0+cu111
[conda] numpy 1.23.5 pypi_0 pypi
[conda] torch 1.10.0+cu111 pypi_0 pypi
[conda] torchaudio 0.10.0+rocm4.1 pypi_0 pypi
[conda] torchvision 0.11.0+cu111 pypi_0 pypi
This is nvidia-smi run on the Windows host:
Sat Dec 03 00:22:29 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 457.51       Driver Version: 457.51       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1060   WDDM  | 00000000:01:00.0  On |                  N/A |
| N/A   65C    P5     8W /  N/A |    750MiB /  6144MiB |     28%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
I've tried what this person suggested, which is to install everything through conda in a new environment in a new WSL2 instance, but it didn't work.
From this thread, there should supposedly be an nvidia-smi binary in /usr/lib/wsl/lib/, but on every one of my WSL2 instances the directory contains only . .. libd3d12.so libd3d12core.so libdxcore.so.
Does anyone have any ideas how I might get this working? Thanks
@peroroch you'll need a Windows NVIDIA GPU driver with WSL2 support. On Windows, check whether the CUDA libs are missing under C:\Windows\System32\lxss\lib:
PS C:\Windows\System32\lxss\lib> ls
Directory: C:\Windows\System32\lxss\lib
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 11/23/2022 1:11 AM 149912 libcuda.so
-a--- 11/23/2022 1:11 AM 149912 libcuda.so.1
-a--- 11/23/2022 1:11 AM 149912 libcuda.so.1.1
-a--- 9/15/2021 7:33 AM 828840 libd3d12.so
-a--- 9/15/2021 7:33 AM 4834848 libd3d12core.so
-a--- 9/15/2021 7:33 AM 878768 libdxcore.so
-a--- 11/23/2022 1:11 AM 8989896 libnvcuvid.so
-a--- 11/23/2022 1:11 AM 8989896 libnvcuvid.so.1
-a--- 11/23/2022 1:11 AM 14551728 libnvdxdlkernels.so
-a--- 11/23/2022 1:11 AM 514664 libnvidia-encode.so
-a--- 11/23/2022 1:11 AM 514664 libnvidia-encode.so.1
-a--- 11/23/2022 1:11 AM 222944 libnvidia-ml.so.1
-a--- 11/23/2022 1:11 AM 358864 libnvidia-opticalflow.so
-a--- 11/23/2022 1:11 AM 358864 libnvidia-opticalflow.so.1
-a--- 11/23/2022 1:11 AM 68560 libnvoptix.so.1
-a--- 11/23/2022 1:11 AM 60186056 libnvwgf2umx.so
-a--- 11/23/2022 1:11 AM 630224 nvidia-smi
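The same check can be run from inside WSL with a short script that reports which driver files are absent. This is a sketch only: the file names are copied from the listing above rather than from any official manifest, /usr/lib/wsl/lib is the WSL-side directory mentioned earlier in this thread, and missing_driver_files is a hypothetical helper name.

```python
from pathlib import Path

# Names taken from the directory listing above; not an official manifest.
EXPECTED = ["libcuda.so", "libcuda.so.1", "libnvidia-ml.so.1", "nvidia-smi"]


def missing_driver_files(lib_dir="/usr/lib/wsl/lib", expected=EXPECTED):
    """Return the expected file names that are absent from lib_dir."""
    root = Path(lib_dir)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_driver_files()
    if missing:
        print("Missing from /usr/lib/wsl/lib:", ", ".join(missing))
    else:
        print("All expected driver files are present.")
```

If files are reported missing, that points at the Windows driver rather than at anything installed inside the distribution.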
@elsaco Right, my driver version is 457.51. CUDA 11.1 supported WSL2, so I assumed this driver also supported it. Does it not?
All the libs are missing except for libd3d12.so, libd3d12core.so and libdxcore.so:
Directory: C:\Windows\System32\lxss\lib
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 05/10/2022 16:20 828840 libd3d12.so
-a---- 05/10/2022 16:20 4834848 libd3d12core.so
-a---- 05/10/2022 16:20 878768 libdxcore.so
I, on the other hand, can never get nvidia-smi to work under WSL. It seems to have worked only once, briefly, on Windows 10.
This is on the host machine:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 526.98       Driver Version: 526.98       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  TCC  | 00000000:04:00.0 Off |                    0 |
| N/A   35C    P0    25W / 250W |      8MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ... WDDM  | 00000000:81:00.0  On |                  N/A |
| 33%   40C    P8     9W / 250W |    971MiB / 11264MiB |      7%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
This is nvcc -V on WSL Ubuntu 22.04:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
but nvidia-smi returns:
Failed to initialize NVML: Unknown Error
and I have everything under /usr/lib/wsl/lib
total 96644
drwxr-xr-x 1 root root 60 Dec 5 09:37 ./
drwxr-xr-x 4 root root 4096 Dec 1 19:23 ../
-r-xr-xr-x 1 root root 149912 Nov 13 20:23 libcuda.so*
-r-xr-xr-x 1 root root 149912 Nov 13 20:23 libcuda.so.1*
-r-xr-xr-x 1 root root 149912 Nov 13 20:23 libcuda.so.1.1*
-r-xr-xr-x 1 root root 800568 Dec 1 11:11 libd3d12.so*
-r-xr-xr-x 1 root root 6224608 Dec 1 11:11 libd3d12core.so*
-r-xr-xr-x 1 root root 829248 Dec 1 11:11 libdxcore.so*
-r-xr-xr-x 1 root root 8989896 Nov 13 20:23 libnvcuvid.so*
-r-xr-xr-x 1 root root 8989896 Nov 13 20:23 libnvcuvid.so.1*
-r-xr-xr-x 1 root root 14551728 Nov 13 20:23 libnvdxdlkernels.so*
-r-xr-xr-x 1 root root 514664 Nov 13 20:23 libnvidia-encode.so*
-r-xr-xr-x 1 root root 514664 Nov 13 20:23 libnvidia-encode.so.1*
-r-xr-xr-x 1 root root 222944 Nov 13 20:23 libnvidia-ml.so.1*
-r-xr-xr-x 1 root root 358864 Nov 13 20:23 libnvidia-opticalflow.so*
-r-xr-xr-x 1 root root 358864 Nov 13 20:23 libnvidia-opticalflow.so.1*
-r-xr-xr-x 1 root root 68560 Nov 13 20:23 libnvoptix.so.1*
lrwxrwxrwx 1 root root 15 Dec 5 09:37 libnvoptix_loader.so.1 -> libnvoptix.so.1*
-r-xr-xr-x 1 root root 60186056 Nov 13 20:23 libnvwgf2umx.so*
-r-xr-xr-x 1 root root 630224 Nov 13 20:23 nvidia-smi*
torch.cuda.is_available() returns True for me, but NVML can't be initialized:
Python 3.10.6 (main, Nov 2 2022, 18:53:38) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import numpy
>>> a = torch.rand((1,1)).cuda()
/home/xxx/.local/lib/python3.10/site-packages/torch/cuda/__init__.py:497: UserWarning: Can't initialize NVML
warnings.warn("Can't initialize NVML")
>>>
>>>
>>> torch.cuda.is_available()
True
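To see the underlying NVML status code rather than the generic warning, NVML can be called directly through ctypes. This sketch assumes the library is loadable as libnvidia-ml.so.1 (the name in the listing above); nvml_init_status is a hypothetical helper, and the message text depends on the installed driver.

```python
import ctypes


def nvml_init_status(lib_name="libnvidia-ml.so.1"):
    """Return (code, message) from nvmlInit_v2.

    Returns None if the library cannot be loaded at all.
    lib_name is an assumption based on the directory listing in this thread.
    """
    try:
        nvml = ctypes.CDLL(lib_name)
    except OSError:
        return None

    # nvmlErrorString returns a C string describing the status code.
    nvml.nvmlErrorString.restype = ctypes.c_char_p

    code = nvml.nvmlInit_v2()  # 0 means NVML_SUCCESS
    message = nvml.nvmlErrorString(code).decode()
    if code == 0:
        nvml.nvmlShutdown()  # release NVML again after a successful init
    return (code, message)


if __name__ == "__main__":
    print(nvml_init_status())
```

A non-zero code here, alongside a working torch.cuda.is_available(), would suggest the problem is limited to NVML rather than the CUDA driver itself.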
@peroroch You're on Win10 with build number 19044. To get CUDA support for WSL2, your build number needs to be at least 20145.
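That comparison can be written out as a small check. This is a sketch: the 20145 threshold is simply the figure quoted in this reply and is not verified here, and parse_build and meets_minimum_build are hypothetical helper names.

```python
MIN_BUILD = 20145  # figure quoted in the reply above; treat it as an assumption


def parse_build(version):
    """Extract the build number from a string like '10.0.19044.2251'."""
    parts = version.strip().split(".")
    return int(parts[2])


def meets_minimum_build(version, minimum=MIN_BUILD):
    """True if the Windows build number is at least `minimum`."""
    return parse_build(version) >= minimum


if __name__ == "__main__":
    # Version string reported by the original poster.
    print(meets_minimum_build("10.0.19044.2251"))  # prints False
```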
You may check my notes for reference. I've successfully installed CUDA on WSL2.
@fzhan Most likely you've accidentally installed the CUDA toolkit for Ubuntu. When you call nvcc, you're not calling the WSL-Ubuntu version of nvcc (which eventually uses your Windows CUDA under /usr/lib/wsl/lib). Instead, it's trying to find a local CUDA module in Ubuntu.
Unlikely. I've followed the instructions from top to bottom several times with fresh installations of both WSL and Ubuntu. Is there any way to "debug" this?
I ran into the same thing. Has anyone found a solution?
@Ayke do you know how to fix this, or how to get a fresh install going?
I solved this by installing the Windows CUDA toolkit. I thought it was installed by default with the NVIDIA driver, but it isn't. I realised I didn't have libnvcuvid.so in my Windows lib folder: https://developer.nvidia.com/cuda-11.0-download-archive?target_os=Windows&target_arch=x86_64
@Ayke I may have installed the CUDA toolkit for Ubuntu by accident. Is there a way to confirm whether I have the WSL version or the wrong one, and to delete it?
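One rough way to check is to look for copies of libcuda.so* outside /usr/lib/wsl/lib and to see which nvcc comes first on PATH. This is a sketch under assumptions: the search directories below are common Ubuntu library locations chosen for illustration, not an exhaustive list, and find_libcuda_copies is a hypothetical helper.

```python
import shutil
from pathlib import Path

WSL_LIB_DIR = "/usr/lib/wsl/lib"
# Illustrative search roots (assumptions); adjust for your distribution.
SEARCH_DIRS = ["/usr/lib/wsl/lib", "/usr/lib/x86_64-linux-gnu", "/usr/local/cuda/lib64"]


def find_libcuda_copies(search_dirs=SEARCH_DIRS, wsl_dir=WSL_LIB_DIR):
    """Return (path, is_wsl_copy) for every libcuda.so* found in search_dirs."""
    wsl_root = Path(wsl_dir).resolve()
    found = []
    for d in search_dirs:
        root = Path(d)
        if not root.is_dir():
            continue  # skip directories that do not exist on this system
        for p in sorted(root.glob("libcuda.so*")):
            found.append((str(p), root.resolve() == wsl_root))
    return found


if __name__ == "__main__":
    print("nvcc on PATH:", shutil.which("nvcc"))
    for path, is_wsl in find_libcuda_copies():
        label = "WSL driver copy: " if is_wsl else "Other copy:      "
        print(label + path)
```

Any copy reported outside /usr/lib/wsl/lib is worth a closer look, since it may have come from a Linux driver package installed inside the distribution.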
Nope, I tried installing the Windows CUDA toolkit and it's still not working, unless the 12.9 toolkit is too recent for torch.
python3 -c "import torch; print(torch.version.cuda); print(torch.cuda.is_available())"
12.1
False
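The one-liner above shows a torch build compiled against CUDA 12.1 that still cannot see a GPU. A slightly fuller report can help when comparing setups. This is a sketch that only reads attributes PyTorch exposes; torch_cuda_report is a hypothetical helper name, and it returns None when torch is not installed.

```python
def torch_cuda_report():
    """Return basic CUDA facts from PyTorch, or None if torch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch": torch.__version__,             # installed PyTorch version
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(torch_cuda_report())
```

If built_with_cuda is set but cuda_available is False, the installed wheel has CUDA support and the failure is more likely on the driver or WSL side.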