WSLg/Cuda suddenly broken due to nvidia-smi unable to find GPU
Version
10.0.22000.1098
WSL Version
- [X] WSL 2
- ~~WSL 1~~
Kernel Version
5.15.68.1
Distro Version
Ubuntu 22.04
Other Software
WSL version: 0.70.5.0
WSLg version: 1.0.45
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Nvidia Driver: 526.47, Game Ready Driver, released 10/27/2022
Repro Steps
- Open wsl terminal
- Execute command
nvidia-smi
Expected Behavior
The nvidia-smi utility dumps diagnostic details about the GPU.
nvidia-smi.exe on Windows is able to display the expected output:
┖[~]> nvidia-smi
Tue Nov 1 10:18:07 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 526.47 Driver Version: 526.47 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... WDDM | 00000000:0B:00.0 Off | N/A |
| 0% 37C P8 17W / 350W | 1619MiB / 12288MiB | 8% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
Actual Behavior
nvidia-smi on wsl/ubuntu 22.04 outputs a generic error instead:
dattebayo@<NGP'd>:~/dev$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Failed to properly shut down NVML: Driver Not Loaded
Diagnostic Logs
I'll admit I'm kinda dumb when it comes to doing the linux diagnostics, which is part of what brought me here. Here's what I've been able to gather from various Googlings and such though:
dpkg -l | grep nvidia
ii libnvidia-compute-495:amd64 510.85.02-0ubuntu0.22.04.1 amd64 Transitional package for libnvidia-compute-510
ii libnvidia-compute-510:amd64 510.85.02-0ubuntu0.22.04.1 amd64 NVIDIA libcompute package
rc libnvidia-compute-520:amd64 520.56.06-0ubuntu0.20.04.1 amd64 NVIDIA libcompute package
ii libnvidia-ml-dev:amd64 11.5.50~11.5.1-1ubuntu1 amd64 NVIDIA Management Library (NVML) development files
ii nvidia-cuda-dev:amd64 11.5.1-1ubuntu1 amd64 NVIDIA CUDA development files
ii nvidia-cuda-gdb 11.5.114~11.5.1-1ubuntu1 amd64 NVIDIA CUDA Debugger (GDB)
rc nvidia-cuda-toolkit 11.5.1-1ubuntu1 amd64 NVIDIA CUDA development toolkit
ii nvidia-cuda-toolkit-doc 11.5.1-1ubuntu1 all NVIDIA CUDA and OpenCL documentation
ii nvidia-opencl-dev:amd64 11.5.1-1ubuntu1 amd64 NVIDIA OpenCL development files
ii nvidia-profiler 11.5.114~11.5.1-1ubuntu1 amd64 NVIDIA Profiler for CUDA and OpenCL
ii nvidia-visual-profiler 11.5.114~11.5.1-1ubuntu1 amd64 NVIDIA Visual Profiler for CUDA and OpenCL
lsmod | grep nvidia
No output
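In the dpkg listing above, the first column is the package state: "ii" means the package is installed, while "rc" means it was removed and only its configuration files remain. A minimal sketch for pulling out the packages that are still installed; the helper name `installed_packages` is made up for illustration and the sample lines are copied from the listing:

```python
def installed_packages(dpkg_output):
    """Return (name, version) for every line whose dpkg state is 'ii'."""
    installed = []
    for line in dpkg_output.splitlines():
        parts = line.split()
        # Columns of `dpkg -l`: state, name, version, architecture, description
        if len(parts) >= 3 and parts[0] == "ii":
            installed.append((parts[1], parts[2]))
    return installed


sample = """\
ii libnvidia-compute-510:amd64 510.85.02-0ubuntu0.22.04.1 amd64 NVIDIA libcompute package
rc libnvidia-compute-520:amd64 520.56.06-0ubuntu0.20.04.1 amd64 NVIDIA libcompute package
rc nvidia-cuda-toolkit 11.5.1-1ubuntu1 amd64 NVIDIA CUDA development toolkit
"""

for name, version in installed_packages(sample):
    print(name, version)
```

Feeding it the real output of `dpkg -l | grep nvidia` would show which NVIDIA packages are actually present inside the distro.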
(Truncated) DxDiag Output
------------------
System Information
------------------
Time of this report: 11/1/2022, 10:23:47
Machine name: NGP'd
Machine Id: {534C5435-EB65-464A-801F-79979E08B34E}
Operating System: Windows 11 Pro 64-bit (10.0, Build 22000) (22000.co_release.210604-1628)
Language: English (Regional Setting: English)
System Manufacturer: ASUS
System Model: System Product Name
BIOS: 3601 (type: UEFI)
Processor: AMD Ryzen 9 5950X 16-Core Processor (32 CPUs), ~3.4GHz
Memory: 131072MB RAM
Available OS Memory: 130980MB RAM
Page File: 53817MB used, 96617MB available
Windows Dir: C:\Windows
DirectX Version: DirectX 12
DX Setup Parameters: Not found
User DPI Setting: 144 DPI (150 percent)
System DPI Setting: 96 DPI (100 percent)
DWM DPI Scaling: Disabled
Miracast: Available, no HDCP
Microsoft Graphics Hybrid: Not Supported
DirectX Database Version: 1.2.2
DxDiag Version: 10.00.22000.0653 64bit Unicode
...
---------------
Display Devices
---------------
Card name: NVIDIA GeForce RTX 3080 Ti
Manufacturer: NVIDIA
Chip type: NVIDIA GeForce RTX 3080 Ti
DAC type: Integrated RAMDAC
Device Type: Full Device (POST)
Device Key: Enum\PCI\VEN_10DE&DEV_2208&SUBSYS_261219DA&REV_A1
Device Status: 0180200A [DN_DRIVER_LOADED|DN_STARTED|DN_DISABLEABLE|DN_NT_ENUMERATOR|DN_NT_DRIVER]
Device Problem Code: No Problem
Driver Problem Code: Unknown
Display Memory: Unknown
Dedicated Memory: n/a
Shared Memory: n/a
Current Mode: Unknown
HDR Support: Unknown
Display Topology: Unknown
Display Color Space: Unknown
Color Primaries: Unknown
Display Luminance: Unknown
Driver Name: C:\Windows\System32\DriverStore\FileRepository\nv_dispi.inf_amd64_ade64cd54ec2f9ed\nvldumdx.dll,C:\Windows\System32\DriverStore\FileRepository\nv_dispi.inf_amd64_ade64cd54ec2f9ed\nvldumdx.dll,C:\Windows\System32\DriverStore\FileRepository\nv_dispi.inf_amd64_ade64cd54ec2f9ed\nvldumdx.dll,C:\Windows\System32\DriverStore\FileRepository\nv_dispi.inf_amd64_ade64cd54ec2f9ed\nvldumdx.dll
Driver File Version: 31.00.0015.2647 (English)
Driver Version: 31.0.15.2647
DDI Version: unknown
Feature Levels: Unknown
Driver Model: WDDM 3.0
Hardware Scheduling: DriverSupportState:Stable Enabled:True
Graphics Preemption: Pixel
Compute Preemption: Dispatch
Miracast: Not Supported by Graphics driver
Detachable GPU: No
Hybrid Graphics GPU: Discrete
Power P-states: Not Supported
Virtualization: Paravirtualization
Block List: No Blocks
Catalog Attributes: Universal:False Declarative:True
Driver Attributes: Final Retail
Driver Date/Size: 10/24/2022 5:00:00 PM, 772488 bytes
WHQL Logo'd: Yes
WHQL Date Stamp: Unknown
Device Identifier: Unknown
Vendor ID: 0x10DE
Device ID: 0x2208
SubSys ID: 0x261219DA
Revision ID: 0x00A1
Driver Strong Name: oem52.inf:0f066de3b91c4385:Section071:31.0.15.2647:pci\ven_10de&dev_2208
Rank Of Driver: 00CF2001
Video Accel: Unknown
DXVA2 Modes: Unknown
Deinterlace Caps: n/a
D3D9 Overlay: Unknown
DXVA-HD: Unknown
DDraw Status: Enabled
D3D Status: Not Available
AGP Status: Enabled
MPO MaxPlanes: Unknown
MPO Caps: Unknown
MPO Stretch: Unknown
MPO Media Hints: Unknown
MPO Formats: Unknown
PanelFitter Caps: Unknown
PanelFitter Stretch: Unknown
....
Taking some shots in the dark here (mainly because I'm really motivated to fix this 😅)
Looking at dmesg trace pops something potentially interesting? Idk if these ioctls are expected to fail...
[ 3.894321] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 3.894829] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 3.895274] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 3.895634] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
These messages come immediately after some BAR assignment operations and a log warning about libcuda not being a symlink.
Something else I noticed is that dmesg goes quiet for a real long time, and then later there's more spew from dxg:
[ 49.226483] hv_balloon: Max. dynamic memory size: 65488 MB
[ 3305.059848] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 3305.060250] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 3305.060464] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 3305.060744] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 3489.978602] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 3489.979056] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 3489.979465] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 3489.979955] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 3573.200318] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 3573.200674] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 3573.200914] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 3573.201269] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 3593.582798] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 3593.583127] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 3593.583354] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 3593.583633] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
<EOF>
It looks like my issue might be related to #8937 (though possibly not the same failure mode)?
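For anyone else reading these dxg lines: assuming the trailing numbers are negated Linux errno values (the usual kernel convention, not something confirmed in this thread), they can be decoded with Python's standard `errno` module. On Linux, 22 is EINVAL and 2 is ENOENT. A rough sketch, with `decode_dxg_errors` being a made-up helper name:

```python
import errno
import os
import re

# Matches lines such as:
# [ 3.894321] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
DXG_LINE = re.compile(r"dxgk: (?P<func>\w+): Ioctl failed: (?P<code>-?\d+)")


def decode_dxg_errors(dmesg_text):
    """Return (function, errno name, description) for each dxg ioctl failure."""
    decoded = []
    for line in dmesg_text.splitlines():
        match = DXG_LINE.search(line)
        if match is None:
            continue
        # Assumption: the logged value is a negated errno number.
        code = abs(int(match.group("code")))
        name = errno.errorcode.get(code, "UNKNOWN")
        decoded.append((match.group("func"), name, os.strerror(code)))
    return decoded


sample = """\
[ 3.895274] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 3.895634] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
"""

for func, name, text in decode_dxg_errors(sample):
    print(f"{func}: {name} ({text})")
```

Piping `dmesg` into this gives a quick summary of which ioctls fail and with which error class.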
The error messages are most likely benign. How was the nvidia-smi utility installed? I installed it using "sudo apt install nvidia-utils-520" and it works for me just fine with the same host driver version. Iouri
@devttebayo it is not sound practice to add any nvidia-utils packages or drivers on WSL side. The nvidia-smi located at /usr/lib/wsl/lib/nvidia-smi is exported from the Windows side, being part of the Windows Nvidia driver. It is located at c:\windows\System32\lxss\lib\ along with some Nvidia libraries used by WSL.
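A quick way to check which nvidia-smi a shell actually picks up is to resolve it on PATH and compare its directory with /usr/lib/wsl/lib. This is only a sketch: the helper name `comes_from_windows_driver` is made up, and the directory is the one mentioned in the comment above.

```python
import os
import shutil

# Directory named in this thread for the copy exported by the Windows driver.
WSL_DRIVER_LIB_DIR = "/usr/lib/wsl/lib"


def comes_from_windows_driver(binary_path, wsl_lib_dir=WSL_DRIVER_LIB_DIR):
    """True if binary_path sits directly inside the WSL driver library directory."""
    if binary_path is None:
        return False
    parent = os.path.dirname(os.path.abspath(binary_path))
    return parent == os.path.normpath(wsl_lib_dir)


found = shutil.which("nvidia-smi")  # None when nothing is on PATH
print("nvidia-smi on PATH:", found)
print("from Windows driver package:", comes_from_windows_driver(found))
```

If the second line prints False while nvidia-smi is found, a copy installed inside the distro is probably shadowing the one from the Windows driver.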
@elsaco Thanks for explaining that, I should have known that it wouldn't be wise to add utils intended for native hw to a virtualized guest. Being honest, I can't remember at which point I installed them (or if it was a side effect of careless debug copy+paste...)
More embarrassing, it appears I somehow lost the repro? I'm not exactly sure how though, seeing as I rebooted both WSL and my host PC a few times prior to opening this issue. Ah well, thanks for the helpful pointers! I think I'm going to go ahead and close this out for now. Sorry for the noise!
Reopening this because it looks like I hit a repro again.
Currently in a state where WSL is unable to detect my GPU and running a wsl.exe -d Ubuntu --shutdown didn't resolve the issue. I verified that I don't have the nvidia-utils package installed either.
Going to hope my PC doesn't reboot and lose the repro in case someone has ideas of next steps I could take to investigate.
I am encountering this issue as well. I start WSL via the Task Scheduler on login, and nvidia-smi reports an error connecting with the driver. If I manually shutdown WSL and restart it, then nvidia-smi successfully contacts the driver and all works fine. It seems that something is broken on the first WSL launch.
I have the same issue as described in #9134, so you are not alone.
I haven't installed any external nvidia libraries in WSL either and I can run nvidia-smi.exe in WSL successfully but running the nvidia-smi located in /usr/lib/wsl/lib/nvidia-smi and /mnt/c/Windows/System32/lxss/lib/nvidia-smi both produce the same error as you have above.
Manually shutting down and restarting doesn't seem to yield any results either.
So, I was able to get this to work on my end (possibly temporarily) after trying a few things. I'm not sure what exactly got it to work but I did the following.
- Roll-back nvidia driver to 522.06 and install CUDA 11.8 on windows (it was still failing after this)
- Install nvidia-settings in WSL (it was still failing after this)
- wsl.exe -d Ubuntu --shutdown (tried this a couple times and it was still failing)
- wsl.exe -d Ubuntu --shutdown + wsl.exe --terminate Ubuntu (after this, it started working again)
I think the last step is what got it to work, let me know if you can reproduce it.
~~I fixed my problem by reinstalling my graphics driver.~~
Nevermind... Initially, after reinstalling the graphics driver and rebooting, there was no issue. After rebooting again, however, the issue reappears. nvidia-smi works on Windows but not on WSL. Manually shutting down WSL and restarting it fixes the problem.
Just updated to the latest 526.86 driver (released today) and ran the shutdown + terminate combo @anubhavashok called out above with no luck.
Was able to verify nvidia-smi in Windows is still working correctly.
So this is a strange development... I updated to WSL 0.70.8 and I'm now in a strange state where nvidia-smi works in some WSL windows but not others?
What I mean is:
- Launch 'Ubuntu on Windows' app from Start Menu, this loads a standalone terminal for the Ubuntu instance
- In this standalone terminal, run nvidia-smi. At this point I observe the expected output
- Launch my WSL Ubuntu on Windows terminal as a tab in an existing Windows Terminal instance
- Run nvidia-smi in the WSL Windows Terminal tab. Observe the NVML Driver load error??
What's super strange to me is I can have the two terminals open side by side and run nvidia-smi repeatedly with the same results in each terminal. I guess this is a workaround for me, but I have no idea why it works?
I have had the same problem since September. I can run CUDA in Docker in WSL 2, but not in kali-linux. I found that:
- If I start from Windows Terminal (Admin): nvidia-smi fails like this
- If I start from Windows Terminal (without Admin): nvidia-smi succeeds
@cq01 Just tried this to verify and that's 100% the difference in my setup above - my Windows Terminal always launches as Admin (and nvidia-smi fails 100% of the time)
Re-launching without Admin rights gets nvidia-smi working. At least I have a workaround I understand now :)
I have the same problem. I think it is related to WSLg: after I installed WSLg, gedit stopped working, and after I uninstalled WSLg it worked again.
I am a rookie, so I don't know why.
environment: Win32NT 10.0.22621.0 Microsoft Windows NT 10.0.22621.0
WSL version: 0.70.4.0
Kernel version: 5.15.68.1
WSLg version: 1.0.45
MSRDC version: 1.2.3575
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22621.819
wang@wang:~$ glxinfo -B
name of display: :0
display: :0 screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
Vendor: Mesa/X.org (0xffffffff)
Device: llvmpipe (LLVM 14.0.6, 256 bits) (0xffffffff) <========= here
Version: 22.2.3
Accelerated: no
Video memory: 7873MB
Unified memory: no
Preferred profile: core (0x1)
Max core profile version: 4.5
Max compat profile version: 4.5
Max GLES1 profile version: 1.1
Max GLES[23] profile version: 3.2
OpenGL vendor string: Mesa/X.org
OpenGL renderer string: llvmpipe (LLVM 14.0.6, 256 bits)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 22.2.3 - kisak-mesa PPA
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL version string: 4.5 (Compatibility Profile) Mesa 22.2.3 - kisak-mesa PPA
OpenGL shading language version string: 4.50
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 22.2.3 - kisak-mesa PPA
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
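The llvmpipe device and "Accelerated: no" in the output above mean Mesa fell back to software rendering. A small sketch that pulls those two fields out of `glxinfo -B` text; the helper name `renderer_summary` is hypothetical, and it assumes the usual one-field-per-line output:

```python
def renderer_summary(glxinfo_text):
    """Extract the Device and Accelerated fields from `glxinfo -B` output."""
    fields = {}
    for line in glxinfo_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in ("Device", "Accelerated"):
            fields[key] = value.strip()
    device = fields.get("Device", "")
    accelerated = fields.get("Accelerated", "").lower() == "yes"
    # llvmpipe is Mesa's CPU rasterizer, so seeing it means no GPU is in use.
    software = "llvmpipe" in device.lower() or not accelerated
    return {"device": device, "accelerated": accelerated, "software_rendering": software}


sample = """\
Extended renderer info (GLX_MESA_query_renderer):
Vendor: Mesa/X.org (0xffffffff)
Device: llvmpipe (LLVM 14.0.6, 256 bits) (0xffffffff)
Accelerated: no
"""

print(renderer_summary(sample))
```

Running this in both a working and a failing terminal makes the difference easy to compare at a glance.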
@cq01 Thanks, bro. This thing cost me a whole night. It's ridiculous, what a headache.
In my case, nvidia-smi only worked when executed from Windows Terminal as Admin.
This is the same behavior I am observing as well. In addition, when running WSL from an elevated Windows Terminal session, I have to use sudo when running a GPU test, or I get errors:
sudo docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
When running WSL from a non-elevated Windows Terminal session, I no longer need sudo to use --gpus. Would love to know why this is.
@anubhavashok I just tried reinstalling 522.06 and CUDA 11.8, then ran all the shutdown and terminate steps. It still produces:
Failed to initialize NVML: Unknown Error
Just trying to link all the relevant threads here:
https://github.com/canonical/microk8s/issues/3024
https://github.com/microsoft/WSL/issues/9254
https://github.com/microsoft/WSL/issues/8174
https://github.com/microsoft/WSL/issues/9134
This cannot be a coincidence.
@fzhan
nvidia-smi needs to be from the Windows driver package. It is mapped to /usr/lib/wsl/lib/nvidia-smi.
There is an issue when nvidia-smi and other CUDA applications run from WSL windows that differ in whether they were started as Administrator. For example, if you start WSL as Administrator the very first time after boot, nvidia-smi works. If you then start another WSL window as non-Administrator, it fails. The opposite is also true: if you start WSL the very first time as non-Administrator, nvidia-smi works. If you then start another WSL window as Administrator, nvidia-smi fails. This is under investigation. It might be related to your case.
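The rule described above can be written down as a one-line predicate. This is only a toy model of the reported behavior, not a description of WSL internals, and the function name is made up:

```python
def gpu_expected_to_work(first_session_elevated, this_session_elevated):
    """Model of the report: the GPU is visible only in sessions whose elevation
    matches that of the first WSL session started after boot."""
    return first_session_elevated == this_session_elevated


# Print the full truth table for the two elevation states.
for first in (True, False):
    for current in (True, False):
        status = "works" if gpu_expected_to_work(first, current) else "fails"
        print(f"first launch admin={first}, this window admin={current}: {status}")
```

Under this model, always launching WSL at the same elevation level is a workaround until the underlying problem is fixed.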
I have nvidia-smi installed only on the Windows side, and I installed WSL as Administrator; that is, the user is literally "Administrator" and is an Administrator-level account. I also tried creating a non-Admin account and redoing the entire WSL setup under that account; it still fails.
I've noticed a couple of similarities in these issues:
- Latest Windows 11 with WSL up to date.
- Latest NVIDIA driver, or at least one after 520.
- CUDA 11.8 or 12 (the latest).
- wsl-ubuntu set up with the instructions listed on the NVIDIA website, though some are using Ubuntu 22.04 where the instructions target 20.04.
@cq01 My situation is the same as yours, it's amazing!
@cq01 Thanks. I'm using Windows Terminal Preview; I turned admin mode off and restarted the terminal, and yeah, it's OK now.
@CharlesSL may I know your version of Windows?
@fzhan win11 insider build 25267
same problem
I think this is a regression bug.
@CharlesSL cool, thanks. I have the issue with the latest Win 11, freshly installed, not upgraded.
In my case nvidia-smi works fine
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.65 Driver Version: 527.56 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 On | N/A |
| N/A 53C P8 7W / 74W | 110MiB / 4096MiB | 7% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 29 G /Xwayland N/A |
+-----------------------------------------------------------------------------+
But PyTorch cannot allocate GPU memory:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.16 GiB (GPU 0; 4.00 GiB total capacity; 2.55 GiB already allocated; 0 bytes free; 2.59 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
and at exactly that moment these lines appear in dmesg:
[ 619.372029] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 1095.227415] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 1095.255181] misc dxg: dxgk: dxgkio_reserve_gpu_va: Ioctl failed: -75
[ 1100.182822] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 1105.528132] misc dxg: dxgk: dxgkio_make_resident: Ioctl failed: -12
[ 1105.658121] misc dxg: dxgk: dxgkio_make_resident: Ioctl failed: -12
[ 1105.747456] misc dxg: dxgk: dxgkio_make_resident: Ioctl failed: -12
[ 1105.835194] misc dxg: dxgk: dxgkio_make_resident: Ioctl failed: -12
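The numbers in the PyTorch message above can be checked with a little arithmetic. Assuming the tensors that are already allocated have to stay alive, the request plus the existing allocation exceeds the card's total capacity, and the gap between reserved and allocated memory is tiny, so fragmentation does not look like the main cause. A sketch that parses the message; the helper name `oom_report` is made up, and the regular expression only targets this exact wording:

```python
import re

OOM_PATTERN = re.compile(
    r"Tried to allocate (?P<request>[\d.]+) GiB .*?"
    r"(?P<total>[\d.]+) GiB total capacity; "
    r"(?P<allocated>[\d.]+) GiB already allocated; .*?"
    r"(?P<reserved>[\d.]+) GiB reserved"
)


def oom_report(message):
    """Compute how far a failed allocation overshoots the GPU's capacity."""
    match = OOM_PATTERN.search(message)
    if match is None:
        return None
    v = {k: float(x) for k, x in match.groupdict().items()}
    needed = v["request"] + v["allocated"]
    return {
        "needed_gib": needed,
        "fits": needed <= v["total"],
        "shortfall_gib": max(0.0, needed - v["total"]),
        # Memory held by the caching allocator but not in use by tensors.
        "cached_slack_gib": v["reserved"] - v["allocated"],
    }


msg = ("CUDA out of memory. Tried to allocate 3.16 GiB (GPU 0; 4.00 GiB total "
       "capacity; 2.55 GiB already allocated; 0 bytes free; 2.59 GiB reserved "
       "in total by PyTorch)")
print(oom_report(msg))
```

If that assumption holds, this looks like a plain capacity problem on a 4 GiB card rather than something specific to WSL.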
The wslg problems started right after upgrading to Store version of wsl.
Run under administrator terminal:
$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Failed to properly shut down NVML: Driver Not Loaded
$ glxinfo -B
name of display: :0
display: :0 screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
Vendor: Mesa/X.org (0xffffffff)
Device: llvmpipe (LLVM 13.0.1, 256 bits) (0xffffffff)
Version: 22.0.5
Accelerated: no
Video memory: 31950MB
Unified memory: no
Preferred profile: core (0x1)
Max core profile version: 4.5
Max compat profile version: 4.5
Max GLES1 profile version: 1.1
Max GLES[23] profile version: 3.2
OpenGL vendor string: Mesa/X.org
OpenGL renderer string: llvmpipe (LLVM 13.0.1, 256 bits)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 22.0.5
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL version string: 4.5 (Compatibility Profile) Mesa 22.0.5
OpenGL shading language version string: 4.50
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 22.0.5
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
Running under Non Admin user, nvidia-smi runs but segfaults at end, and applications trying to use gpu after have various error and exit problems.
Output from Standard Terminal:
$ nvidia-smi
Wed Jan 4 22:21:33 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.65 Driver Version: 527.56 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 Off | N/A |
| N/A 42C P8 10W / 115W | 33MiB / 6144MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 23 G /Xwayland N/A |
| 0 N/A N/A 23 G /Xwayland N/A |
| 0 N/A N/A 26 G /Xwayland N/A |
+-----------------------------------------------------------------------------+
$ glxinfo -B
name of display: :0
display: :0 screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
Vendor: Microsoft Corporation (0xffffffff)
Device: D3D12 (NVIDIA GeForce RTX 3060 Laptop GPU) (0xffffffff)
Version: 22.0.5
Accelerated: yes
Video memory: 38627MB
Unified memory: no
Preferred profile: core (0x1)
Max core profile version: 3.3
Max compat profile version: 3.3
Max GLES1 profile version: 1.1
Max GLES[23] profile version: 3.1
OpenGL vendor string: Microsoft Corporation
OpenGL renderer string: D3D12 (NVIDIA GeForce RTX 3060 Laptop GPU)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 22.0.5
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL version string: 3.3 (Compatibility Profile) Mesa 22.0.5
OpenGL shading language version string: 3.30
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL ES profile version string: OpenGL ES 3.1 Mesa 22.0.5
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10
Segmentation fault
System Version Info:
wsl --version
WSL version: 1.0.3.0
Kernel version: 5.15.79.1
WSLg version: 1.0.47
MSRDC version: 1.2.3575
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22000.1335
Everything worked before installing the Store version of WSL.
similar problem also existed in snapd
https://github.com/ubuntu/WSL/issues/318