GPU access blocked by the operating system
Thanks for this script. I've been trying to get this to work since yesterday, but things are not playing ball so far.
I've used both your version of the script and https://gist.github.com/Nislaco/ce7ec314bdf0cf519ff0fb2fffc55107 from https://github.com/seflerZ/oneclick-gpu-pv/issues/7
The dkms module compiles fine, and dmesg | grep dx shows:
[ 2.404359] dxgkrnl: loading out-of-tree module taints kernel.
[ 2.404372] dxgkrnl: module verification failed: signature and/or required key missing - tainting kernel
[ 2.410266] hv_vmbus: registering driver dxgkrnl
and nvidia-smi shows
Failed to initialize NVML: GPU access blocked by the operating system
Failed to properly shut down NVML: GPU access blocked by the operating system
The host is running Windows Server 2025 with an Nvidia Tesla P40 on the 539.19 Grid driver, as the other drivers do not want to switch into WDDM mode.
The first VM is running Ubuntu 24.04 with kernel 6.8.0-52-generic; secure boot is disabled and the dkms module was compiled using Nislaco's script.
The second VM is running Ubuntu 22.04 with kernel 5.15.0-130-generic; secure boot is disabled and the dkms module was compiled using your archive.
I also set the below options on the VMs:
-CheckpointType Disabled -LowMemoryMappedIoSpace 3GB -HighMemoryMappedIoSpace 32GB -GuestControlledCacheTypes $true -AutomaticStopAction ShutDown
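(In full, that's a single Set-VM call on the host, run as admin; "ubuntu-gpu" below is just a placeholder name.)
# VM name is a placeholder; substitute your own
Set-VM -Name "ubuntu-gpu" -CheckpointType Disabled -LowMemoryMappedIoSpace 3GB -HighMemoryMappedIoSpace 32GB -GuestControlledCacheTypes $true -AutomaticStopAction ShutDown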
Host's output of nvidia-smi is
PS C:\Tools> nvidia-smi.exe
Tue Jan 28 13:15:31 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 539.19 Driver Version: 539.19 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla P40 WDDM | 00000000:84:00.0 Off | 0 |
| N/A 38C P0 47W / 250W | 391MiB / 23040MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 4276 C+G C:\Windows\System32\LogonUI.exe N/A |
| 0 N/A N/A 4284 C+G C:\Windows\System32\dwm.exe N/A |
| 0 N/A N/A 14380 C+G C:\Windows\explorer.exe N/A |
| 0 N/A N/A 14880 C+G C:\Windows\System32\dwm.exe N/A |
| 0 N/A N/A 20796 C+G ...__8wekyb3d8bbwe\WindowsTerminal.exe N/A |
| 0 N/A N/A 30472 C+G C:\Windows\System32\WUDFHost.exe N/A |
+---------------------------------------------------------------------------------------+
I compared this with a working WSL2 instance from another machine, and dmesg | grep dx shows slightly different output:
> dmesg | grep dx
[ 0.343507] hv_vmbus: registering driver dxgkrnl
[ 1.342853] misc dxg: dxgk: dxgkio_is_feature_enabled: Ioctl failed: -22
[ 1.354558] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 1.354973] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 1.355402] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 1.355907] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.339741] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.712171] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.712679] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.713168] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.713681] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.714260] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.714686] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.715257] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.715789] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.716249] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.716670] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.717074] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.820033] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
Do you have any thoughts on what I could be missing? I also suspect this driver might be too old, but to the best of my knowledge it is the latest one from the Grid series that supports the P40.
Paravirtualization works fine with my Windows 11 guest VM.
You would still need to move the driver files over from the host to the guest. The following should take care of that; run it as a script from PowerShell as admin:
$username="username"
$ip="10.0.0.XX"
# Create a destination folder.
ssh ${username}@${ip} "mkdir -p ~/wsl/drivers; mkdir -p ~/wsl/lib;"
# Copy driver files
# https://github.com/brokeDude2901/dxgkrnl_ubuntu/blob/main/README.md#3-copy-windows-host-gpu-driver-to-ubuntu-vm
(Get-CimInstance -ClassName Win32_VideoController -Property *).InstalledDisplayDrivers | Select-String "C:\\Windows\\System32\\DriverStore\\FileRepository\\[a-zA-Z0-9\\._]+\\" | foreach {
$l=$_.Matches.Value.Substring(0, $_.Matches.Value.Length-1) # trim the trailing backslash from the matched DriverStore path
scp -r $l ${username}@${ip}:~/wsl/drivers/
}
scp -r C:\Windows\System32\lxss\lib ${username}@${ip}:~/wsl/
This part can be run in the guest directly:
sudo mv ~/wsl /usr/lib/wsl;
sudo chmod -R 555 /usr/lib/wsl;
sudo chown -R root:root /usr/lib/wsl;
sudo sh -c 'echo "/usr/lib/wsl/lib" > /etc/ld.so.conf.d/ld.wsl.conf' ;
sudo ldconfig ;
sudo ln -s /usr/lib/wsl/lib/libd3d12core.so /usr/lib/wsl/lib/libD3D12Core.so ;
or, running as root:
mv ~/wsl /usr/lib/wsl
chmod -R 555 /usr/lib/wsl
chown -R root:root /usr/lib/wsl
sh -c 'echo "/usr/lib/wsl/lib" > /etc/ld.so.conf.d/ld.wsl.conf'
ldconfig
ln -s /usr/lib/wsl/lib/libd3d12core.so /usr/lib/wsl/lib/libD3D12Core.so
If you have multiple Nvidia GPUs, you need to pass all of them through to a guest for nvidia-smi to work correctly (a rough sketch follows below).
Apologies on my end as my gists don't mention this and just cover the dkms module part of the setup.
You would also need to uninstall the dkms module and remove/re-add the driver files between driver version changes.
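A rough sketch of the multi-GPU point above, run in an elevated PowerShell on the host ("ubuntu-gpu" is a placeholder VM name):
# Partition every partitionable GPU into the same guest; VM name is a placeholder
$vm = "ubuntu-gpu"
Get-VMHostPartitionableGpu | ForEach-Object {
    Add-VMGpuPartitionAdapter -VMName $vm -InstancePath $_.Name
}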
Hey, thanks for getting back to me.
I forgot to mention this in my initial issue, but of course I copied the files over.
It did stump me at first that libd3d12core.so was not there.
I read somewhere that WSL2 needs to be enabled on the host; perhaps that's why some of the files are missing?
That still doesn't quite make sense, though, because on the local machine where WSL2 works well with GPU-PV, the C:\Windows\System32\lxss\lib directory does not contain libd3d12core.so either, even though it is present inside WSL.
You might want to check out https://github.com/staralt/dxgkrnl-dkms; that is what my scripts were based on, along with the info here.
With either, I used files from my Windows hosts and copied them over to the Linux guest. I did not use files from WSL, but I do have WSL enabled.
I am only using CUDA from the CLI and have not set up a desktop environment.
Yeah, it's not working even with the custom kernel from https://github.com/brokeDude2901/dxgkrnl_ubuntu/blob/main/README.md#4-install-custom-dxgkrnl-kernel
I think it's a driver issue. I will try another driver on the host, but I'll have to figure out how to switch it to WDDM mode, because I get error code 43 on the datacenter/studio driver when I follow the steps at https://linustechtips.com/topic/1496913-can-i-enable-wddm-on-a-tesla-p40/.
I reproduced this in a brand new Ubuntu 24.04 VM using the steps from https://github.com/staralt/dxgkrnl-dkms, on a different machine with working GPU-PV in WSL2, and it's not working in the VM either :(
I can confirm the same issues you are having on my end, with various Nvidia cards on separate hosts, using my scripts.
However, using seflerZ's scripts on an Ubuntu 22.04.1 Server running kernel 5.15.0-131-generic works correctly.
https://gist.github.com/Nislaco/ce7ec314bdf0cf519ff0fb2fffc55107
ssh user@ip "sudo -S mkdir -p $(echo /usr/lib/wsl/drivers/)"
scp -r /usr/lib/wsl/lib user@ip\:~
scp -r /usr/lib/wsl/drivers user@ip\:~
ssh user@ip "sudo -S mv ~/lib/* /usr/lib;sudo -S ln -s /lib/libd3d12core.so /lib/libD3D12Core.so;sudo -S mv ~/drivers/* /usr/lib/wsl/drivers"
Building the module and moving the files over from an Ubuntu WSL instance, using the same method as seflerZ's Ubuntu script, worked correctly.
Got it working now as well on my workstation, both using your script and staralt's dkms scripts. It even works on both 22.04 and 24.04; I tried each script on each Ubuntu release for testing's sake.
The difference is that I sourced the files from WSL2 this time, not from the host OS like most of the instructions say.
This gives me the extra missing .so files which are not present under C:\Windows\System32\lxss\lib; I suspect those are the files that make it work.
Files in WSL2
libcuda.so
libcuda.so.1
libcuda.so.1.1
libcudadebugger.so.1
libd3d12.so
libd3d12core.so
libdxcore.so
libnvcuvid.so
libnvcuvid.so.1
libnvdxdlkernels.so
libnvidia-encode.so
libnvidia-encode.so.1
libnvidia-ml.so.1
libnvidia-opticalflow.so
libnvidia-opticalflow.so.1
libnvoptix.so.1
libnvoptix_loader.so.1 -> libnvoptix.so.1
libnvwgf2umx.so
nvidia-smi
Files in lxss dir
libcuda.so
libcuda.so.1
libcuda.so.1.1
libcudadebugger.so.1
libnvcuvid.so
libnvcuvid.so.1
libnvdxdlkernels.so
libnvidia-encode.so
libnvidia-encode.so.1
libnvidia-ml.so.1
libnvidia-opticalflow.so
libnvidia-opticalflow.so.1
libnvoptix.so.1
libnvwgf2umx.so
nvidia-smi
In WSL2, I ran tar -czf drivers.tar /usr/lib/wsl/lib /usr/lib/wsl/drivers/nv_dispi.inf_amd64_adf5a840df867035/ and transferred the file to the VM, then extracted it with tar -xvf drivers.tar -C /
From here there seem to be two ways to go... leave /usr/lib/wsl/lib in place and do
sh -c 'echo "/usr/lib/wsl/lib" > /etc/ld.so.conf.d/ld.wsl.conf'
ldconfig
sed -i '/^PATH=/ {/usr\/lib\/wsl\/lib/! s|"$|:/usr/lib/wsl/lib"|}' /etc/environment
source /etc/environment
or
mv /usr/lib/wsl/lib/* /usr/lib/
Not sure if ln -s /lib/libd3d12core.so /lib/libD3D12Core.so is needed or not. As of right now, things work without it.
It even works with Docker on Ubuntu 22.04:
root@ubuntu-gpu-2204-906834886:~# sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu
de44b265507a: Pull complete
Digest: sha256:80dd3c3b9c6cecb9f1667e9290b3bc61b78c2678c02cbdae5f0fea92cc6734ab
Status: Downloaded newer image for ubuntu:latest
Thu Jan 30 00:20:56 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.72 Driver Version: 566.14 CUDA Version: 12.7 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3090 On | 00000000:01:00.0 On | N/A |
| 0% 46C P8 43W / 390W | 1841MiB / 24576MiB | 13% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
root@ubuntu-gpu-2204-906834886:~#
Thanks for your help.
I guess the instructions for driver installation need to be updated to save others some time.
Could you try to reproduce on your side that sourcing the driver files from lxss is the culprit?
Thanks for the update and patience! You are correct that it does come down to missing files and path issues. This did work previously with just the files from LXSS, but they might have relocated them at some point.
C:\Program Files\WSL\lib should have these files; it is provided by wsl.exe --update.
Sounds good. I've compiled all those steps into PowerShell and Bash scripts that take care of copying the files, invoking the dkms compilation script, and installing Docker and the Nvidia Container Toolkit.
I just need to reboot my Hyper-V host so I can reinstall the GPU driver, and then I'll test it all out there.
The scripts are below:
https://github.com/mateuszdrab/hyperv-vm-provisioning/blob/master/Copy-HostGPUDriverToUbuntu.ps1 https://github.com/mateuszdrab/hyperv-vm-provisioning/blob/master/install-gpu.sh
Glad to hear it is working again and thank you for bringing this up.
This did previously work without any issues, other than an occasional Ioctl error in the console and dmesg when running some workloads.
I can confirm this is working across 2 different hosts with 3 different Nvidia cards, after adjusting paths to include LXSS and WSL\lib.
Oh no, on the hypervisor I get:
root@ubuntu-2404-gpu-2125067429:~# nvidia-smi
Thu Jan 30 12:00:00 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 551.78 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
Segmentation fault (core dumped)
First, the driver on the host is one of the ones that needs manual switching from TCC mode to WDDM mode via the registry; I will try the older Grid driver, which works in WDDM straight away. Second, WSL isn't installed on this host; I'll try installing it (I used the missing files from my other machine).
The above does work in a Windows VM, though.
Just as I thought, the Grid 16.9 driver (539.19) works fine, but it is too old to work with Frigate's ffmpeg, and newer versions of the Grid driver (17.0 and above) drop support for the P40. It might still be OK for CUDA-only workloads; I'm not sure whether hardware-accelerated decoding would work with GPU-PV anyway.
root@ubuntu-2404-gpu-1797075406:~# nvidia-smi
Thu Jan 30 12:28:22 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.230.02 Driver Version: 539.19 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla P40 On | 00000000:84:00.0 Off | 0 |
| N/A 33C P8 13W / 250W | 424MiB / 23040MiB | 2% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
One thing to try is switching out libcuda_loader.so, as GPU-PV might not need to use the WSL stubbed version.
Check whether you are using the stubbed version of libcuda.so.1.1/libcuda_loader.so (156K) or the non-stubbed version of libcuda.so.1.1 (19M).
The non-stubbed file is in the ~/wsl/drivers folder if you copied it over from host to guest. https://forums.developer.nvidia.com/t/wsl2-libcuda-so-and-libcuda-so-1-should-be-symlink/236301
I swapped out the 156K version for the 19M one and made the symlinks. Workloads are working fine after the changes on my end.
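For anyone following along, a minimal sketch of that swap, assuming the full-size library sits in the copied driver repository folder (as noted above) and the libs live under /usr/lib/wsl/lib; adjust the paths if you moved everything to /usr/lib instead:
# Replace the ~156K stub with the full ~19M library from the copied driver folder
sudo cp /usr/lib/wsl/drivers/nv_dispi.inf_amd64_*/libcuda.so.1.1 /usr/lib/wsl/lib/libcuda.so.1.1
# Recreate the libcuda symlink chain and refresh the linker cache
sudo ln -sf libcuda.so.1.1 /usr/lib/wsl/lib/libcuda.so.1
sudo ln -sf libcuda.so.1 /usr/lib/wsl/lib/libcuda.so
sudo ldconfig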
Does this make the VM handle graphics, with no remove/re-add GPU trickery required?
No, unfortunately the issue of needing to disable and re-enable graphics passthrough for Moonlight on a Linux guest is still present.
Depending on what you need, QEMU is fairly mature on modern Linux and does use Hyper-V extensions via WHPX on Windows hosts.
From a Windows host with OpenGL, or a Windows Hyper-V guest with GPU-PV, QEMU can use DX12 for OpenGL either with SDL or its own headless D-Bus/Spice solutions.
@mateuszdrab Did you get your problem solved? Did it fail on both kernel 5.15 and 6.x?
Hey @seflerZ
Things seem to be working fine now that the missing .so libraries are copied over from C:\Program Files\WSL\lib
I've not tried @Nislaco's steps from https://github.com/seflerZ/oneclick-gpu-pv/issues/8#issuecomment-2635633914 yet, because I don't want to tinker with the setup while it's working. I'm definitely going to have to revisit it at some point, because I'm currently stuck with the CUDA 12.2 driver and the newer datacenter drivers would be a better solution, if only nvidia-smi wouldn't seg fault (I wonder whether that has something to do with the registry-based tweak I have to apply on the host to switch the datacenter driver to WDDM mode). The Grid driver is in WDDM mode out of the box.
I have things working on 6.x, but I think everything was also fine on 5.15; I just stopped trying 5.x kernels as soon as I identified that the kernel version was not the cause.
I wonder if the segmentation fault was due to this: https://github.com/microsoft/WSL/issues/11277
Apparently it was fixed in driver version 565.90. The problem is that when I tried the datacenter drivers in the versions below, they wouldn't switch to WDDM mode, so I can't even add the GPU to the VM using Add-VMGpuPartitionAdapter, because Get-VMHostPartitionableGpu reports nothing while the driver is in TCC mode (a quick check for this is sketched after the table).
| Driver | Version | CUDA | Status |
|---|---|---|---|
| 551.61_grid | Grid 17.0 | 12.4 | Not Compatible ** |
| 553.62_grid | Grid 17.5 | 12.4 | Not Compatible ** |
| 539.19_grid | Grid 16.9 | 12.2 | Too Old CUDA*; works on Ubuntu |
| 539.19-data-center | | 12.2 | Too Old* |
| 572.13-data-center | | 12.8 | WDDM switch fails |
| 566.03-data-center | | 12.7 | WDDM switch fails |
| 551.78-data-center | | 12.2 | WDDM switch works; fails on Ubuntu with seg fault |
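For reference, a quick host-side check of the TCC/WDDM state before touching the VM at all; if this prints nothing, the driver is still in TCC mode and there is nothing for Add-VMGpuPartitionAdapter to attach:
# Run in an elevated PowerShell on the Hyper-V host
Get-VMHostPartitionableGpu | Select-Object -ExpandProperty Name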