GPU access blocked by the operating system
Thanks for this script. I've been trying to get this to work since yesterday, but things are not playing ball so far.
I've used both your version of the script and https://gist.github.com/Nislaco/ce7ec314bdf0cf519ff0fb2fffc55107 from https://github.com/seflerZ/oneclick-gpu-pv/issues/7
The dkms module compiles fine, and dmesg | grep dx shows:
[ 2.404359] dxgkrnl: loading out-of-tree module taints kernel.
[ 2.404372] dxgkrnl: module verification failed: signature and/or required key missing - tainting kernel
[ 2.410266] hv_vmbus: registering driver dxgkrnl
and nvidia-smi shows
Failed to initialize NVML: GPU access blocked by the operating system
Failed to properly shut down NVML: GPU access blocked by the operating system
The host is running Windows Server 2025 with an Nvidia Tesla P40 on the 539.19 Grid driver, as the other drivers do not want to switch into WDDM mode.
The first VM is running Ubuntu 24.04 with kernel 6.8.0-52-generic; secure boot is disabled and the dkms module was compiled using Nislaco's script.
The second VM is running Ubuntu 22.04 with kernel 5.15.0-130-generic; secure boot is disabled and the dkms module was compiled using your archive.
I also set the below options on the VMs:
-CheckpointType Disabled -LowMemoryMappedIoSpace 3GB -HighMemoryMappedIoSpace 32GB -GuestControlledCacheTypes $true -AutomaticStopAction ShutDown
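(In full, that's a single Set-VM call on the host, run as admin; "ubuntu-gpu" below is just a placeholder name.)
# VM name is a placeholder; substitute your own
Set-VM -Name "ubuntu-gpu" -CheckpointType Disabled -LowMemoryMappedIoSpace 3GB -HighMemoryMappedIoSpace 32GB -GuestControlledCacheTypes $true -AutomaticStopAction ShutDown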
Host's output of nvidia-smi is
PS C:\Tools> nvidia-smi.exe
Tue Jan 28 13:15:31 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 539.19 Driver Version: 539.19 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla P40 WDDM | 00000000:84:00.0 Off | 0 |
| N/A 38C P0 47W / 250W | 391MiB / 23040MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 4276 C+G C:\Windows\System32\LogonUI.exe N/A |
| 0 N/A N/A 4284 C+G C:\Windows\System32\dwm.exe N/A |
| 0 N/A N/A 14380 C+G C:\Windows\explorer.exe N/A |
| 0 N/A N/A 14880 C+G C:\Windows\System32\dwm.exe N/A |
| 0 N/A N/A 20796 C+G ...__8wekyb3d8bbwe\WindowsTerminal.exe N/A |
| 0 N/A N/A 30472 C+G C:\Windows\System32\WUDFHost.exe N/A |
+---------------------------------------------------------------------------------------+
I compared this with a working WSL2 instance from another machine, and dmesg | grep dx shows slightly different output:
> dmesg | grep dx
[ 0.343507] hv_vmbus: registering driver dxgkrnl
[ 1.342853] misc dxg: dxgk: dxgkio_is_feature_enabled: Ioctl failed: -22
[ 1.354558] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 1.354973] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 1.355402] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[ 1.355907] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.339741] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.712171] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.712679] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.713168] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.713681] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.714260] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.714686] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.715257] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.715789] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.716249] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.716670] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.717074] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[ 2.820033] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
Do you have any thoughts on what I could be missing? I also suspect this driver might be too old, but to the best of my knowledge it is the latest one from the Grid series that supports the P40.
Paravirtualization works fine with my Windows 11 guest VM.
You would still need to move the driver files over from the host to the guest. The following should take care of that; run it as a script from PowerShell as admin:
$username="username"
$ip="10.0.0.XX"
# Create a destination folder.
ssh ${username}@${ip} "mkdir -p ~/wsl/drivers; mkdir -p ~/wsl/lib;"
# Copy driver files
# https://github.com/brokeDude2901/dxgkrnl_ubuntu/blob/main/README.md#3-copy-windows-host-gpu-driver-to-ubuntu-vm
(Get-CimInstance -ClassName Win32_VideoController -Property *).InstalledDisplayDrivers | Select-String "C:\\Windows\\System32\\DriverStore\\FileRepository\\[a-zA-Z0-9\\._]+\\" | foreach {
$l=$_.Matches.Value.Substring(0, $_.Matches.Value.Length-1) # trim the trailing backslash from the matched DriverStore path
scp -r $l ${username}@${ip}:~/wsl/drivers/
}
scp -r C:\Windows\System32\lxss\lib ${username}@${ip}:~/wsl/
This part can be run in the guest directly:
sudo mv ~/wsl /usr/lib/wsl;
sudo chmod -R 555 /usr/lib/wsl;
sudo chown -R root:root /usr/lib/wsl;
sudo sh -c 'echo "/usr/lib/wsl/lib" > /etc/ld.so.conf.d/ld.wsl.conf' ;
sudo ldconfig ;
sudo ln -s /usr/lib/wsl/lib/libd3d12core.so /usr/lib/wsl/lib/libD3D12Core.so ;
or, running as root:
mv ~/wsl /usr/lib/wsl
chmod -R 555 /usr/lib/wsl
chown -R root:root /usr/lib/wsl
sh -c 'echo "/usr/lib/wsl/lib" > /etc/ld.so.conf.d/ld.wsl.conf'
ldconfig
ln -s /usr/lib/wsl/lib/libd3d12core.so /usr/lib/wsl/lib/libD3D12Core.so
If you have multiple Nvidia GPUs, you need to pass all of them through to a guest for nvidia-smi to work correctly (a rough sketch follows below).
Apologies on my end as my gists don't mention this and just cover the dkms module part of the setup.
You would also need to uninstall the dkms module and remove/re-add the driver files between driver version changes.
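A rough sketch of the multi-GPU point above, run in an elevated PowerShell on the host ("ubuntu-gpu" is a placeholder VM name):
# Partition every partitionable GPU into the same guest; VM name is a placeholder
$vm = "ubuntu-gpu"
Get-VMHostPartitionableGpu | ForEach-Object {
    Add-VMGpuPartitionAdapter -VMName $vm -InstancePath $_.Name
}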
Hey, thanks for getting back to me.
I forgot to mention this in my initial issue, but of course I copied the files over.
It did stump me at first that libd3d12core.so was not there.
I read somewhere that WSL2 needs to be enabled on the host; perhaps that's why some of the files are missing?
That still doesn't quite make sense, though, because on the local machine where WSL2 works well with GPU-PV, the C:\Windows\System32\lxss\lib directory does not contain libd3d12core.so either, even though it is present inside WSL.
You might want to check out https://github.com/staralt/dxgkrnl-dkms; that is what my scripts were based on, along with the info here.
With either, I used files from my Windows hosts and copied them over to the Linux guest. I did not use files from WSL, but I do have WSL enabled.
I am only using CUDA from the CLI and have not set up a desktop environment.
Yeah, it's not working even with the custom kernel from https://github.com/brokeDude2901/dxgkrnl_ubuntu/blob/main/README.md#4-install-custom-dxgkrnl-kernel
I think it's a driver issue. I will try another driver on the host, but I'll have to figure out how to switch it to WDDM mode, because I get error code 43 on the datacenter/studio driver when I follow the steps at https://linustechtips.com/topic/1496913-can-i-enable-wddm-on-a-tesla-p40/.
I reproduced this in a brand new Ubuntu 24.04 VM using the steps from https://github.com/staralt/dxgkrnl-dkms, on a different machine with working GPU-PV in WSL2, and it's not working in the VM either :(
I can confirm the same issues you are having on my end, with various Nvidia cards on separate hosts, using my scripts.
However, using seflerZ's scripts on an Ubuntu 22.04.1 Server running kernel 5.15.0-131-generic works correctly.
https://gist.github.com/Nislaco/ce7ec314bdf0cf519ff0fb2fffc55107
ssh user@ip "sudo -S mkdir -p $(echo /usr/lib/wsl/drivers/)"
scp -r /usr/lib/wsl/lib user@ip\:~
scp -r /usr/lib/wsl/drivers user@ip\:~
ssh user@ip "sudo -S mv ~/lib/* /usr/lib;sudo -S ln -s /lib/libd3d12core.so /lib/libD3D12Core.so;sudo -S mv ~/drivers/* /usr/lib/wsl/drivers"
Building the module and moving the files over from an Ubuntu WSL instance, using the same method as seflerZ's Ubuntu script, worked correctly.
Got it working now as well on my workstation, both using your script and staralt's dkms scripts. It even works on both 22.04 and 24.04; I tried each script on each Ubuntu release for testing's sake.
The difference is that I sourced the files from WSL2 this time, not from the host OS like most of the instructions say.
This gives me the extra missing .so files which are not present under C:\Windows\System32\lxss\lib; I suspect those are the files that make it work.
Files in WSL2
libcuda.so
libcuda.so.1
libcuda.so.1.1
libcudadebugger.so.1
libd3d12.so
libd3d12core.so
libdxcore.so
libnvcuvid.so
libnvcuvid.so.1
libnvdxdlkernels.so
libnvidia-encode.so
libnvidia-encode.so.1
libnvidia-ml.so.1
libnvidia-opticalflow.so
libnvidia-opticalflow.so.1
libnvoptix.so.1
libnvoptix_loader.so.1 -> libnvoptix.so.1
libnvwgf2umx.so
nvidia-smi
Files in lxss dir
libcuda.so
libcuda.so.1
libcuda.so.1.1
libcudadebugger.so.1
libnvcuvid.so
libnvcuvid.so.1
libnvdxdlkernels.so
libnvidia-encode.so
libnvidia-encode.so.1
libnvidia-ml.so.1
libnvidia-opticalflow.so
libnvidia-opticalflow.so.1
libnvoptix.so.1
libnvwgf2umx.so
nvidia-smi
In WSL2, I ran tar -czf drivers.tar /usr/lib/wsl/lib /usr/lib/wsl/drivers/nv_dispi.inf_amd64_adf5a840df867035/ and transferred the file to the VM, then extracted it with tar -xvf drivers.tar -C /
From here there seem to be two ways to go... leave /usr/lib/wsl/lib in place and do
sh -c 'echo "/usr/lib/wsl/lib" > /etc/ld.so.conf.d/ld.wsl.conf'
ldconfig
sed -i '/^PATH=/ {/usr\/lib\/wsl\/lib/! s|"$|:/usr/lib/wsl/lib"|}' /etc/environment
source /etc/environment
or
mv /usr/lib/wsl/lib/* /usr/lib/
Not sure if ln -s /lib/libd3d12core.so /lib/libD3D12Core.so is needed or not. As of right now, things work without it.
It even works with Docker on Ubuntu 22.04:
root@ubuntu-gpu-2204-906834886:~# sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu
de44b265507a: Pull complete
Digest: sha256:80dd3c3b9c6cecb9f1667e9290b3bc61b78c2678c02cbdae5f0fea92cc6734ab
Status: Downloaded newer image for ubuntu:latest
Thu Jan 30 00:20:56 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.72 Driver Version: 566.14 CUDA Version: 12.7 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3090 On | 00000000:01:00.0 On | N/A |
| 0% 46C P8 43W / 390W | 1841MiB / 24576MiB | 13% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
root@ubuntu-gpu-2204-906834886:~#
Thanks for your help.
I guess the instructions for driver installation need to be updated to save others some time.
Could you try to reproduce on your side that sourcing the driver files from lxss is the culprit?
Thanks for the update and patience! You are correct that it does come down to missing files and path issues. This did work previously with just the files from LXSS, but they might have relocated them at some point.
C:\Program Files\WSL\lib should have these files; it is provided by wsl.exe --update.
Sounds good. I've compiled all those steps into PowerShell and Bash scripts that take care of copying the files, invoking the dkms compilation script, and installing Docker and the Nvidia Container Toolkit.
I just need to reboot my Hyper-V host so I can reinstall the GPU driver, and then I'll test it all out there.
The scripts are below:
https://github.com/mateuszdrab/hyperv-vm-provisioning/blob/master/Copy-HostGPUDriverToUbuntu.ps1 https://github.com/mateuszdrab/hyperv-vm-provisioning/blob/master/install-gpu.sh
Glad to hear it is working again and thank you for bringing this up.
This did previously work without any issues, other than an occasional Ioctl error in the console and dmesg when running some workloads.
I can confirm this is working across 2 different hosts with 3 different Nvidia cards, after adjusting paths to include LXSS and WSL\lib.
Oh no, on the hypervisor I get:
root@ubuntu-2404-gpu-2125067429:~# nvidia-smi
Thu Jan 30 12:00:00 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 551.78 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
Segmentation fault (core dumped)
First, the driver on the host is one of the ones that needs manual switching from TCC mode to WDDM mode via the registry; I will try the older Grid driver, which works in WDDM straight away. Second, WSL isn't installed on this host; I'll try installing it (I used the missing files from my other machine).
The above does work in a Windows VM, though.
Just as I thought, the Grid 16.9 driver (539.19) works fine, but it is too old to work with Frigate's ffmpeg, and newer versions of the Grid driver (17.0 and above) drop support for the P40. It might still be OK for CUDA-only workloads; I'm not sure whether hardware-accelerated decoding would work with GPU-PV anyway.
root@ubuntu-2404-gpu-1797075406:~# nvidia-smi
Thu Jan 30 12:28:22 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.230.02 Driver Version: 539.19 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla P40 On | 00000000:84:00.0 Off | 0 |
| N/A 33C P8 13W / 250W | 424MiB / 23040MiB | 2% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
One thing to try is switching out libcuda_loader.so, as GPU-PV might not need to use the WSL stubbed version.
Check whether you are using the stubbed version of libcuda.so.1.1/libcuda_loader.so (156K) or the non-stubbed version of libcuda.so.1.1 (19M).
The non-stubbed file is in the ~/wsl/drivers folder if you copied it over from host to guest. https://forums.developer.nvidia.com/t/wsl2-libcuda-so-and-libcuda-so-1-should-be-symlink/236301
I swapped out the 156K version for the 19M one and made the symlinks. Workloads are working fine after the changes on my end.
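For anyone following along, a minimal sketch of that swap, assuming the full-size library sits in the copied driver repository folder (as noted above) and the libs live under /usr/lib/wsl/lib; adjust the paths if you moved everything to /usr/lib instead:
# Replace the ~156K stub with the full ~19M library from the copied driver folder
sudo cp /usr/lib/wsl/drivers/nv_dispi.inf_amd64_*/libcuda.so.1.1 /usr/lib/wsl/lib/libcuda.so.1.1
# Recreate the libcuda symlink chain and refresh the linker cache
sudo ln -sf libcuda.so.1.1 /usr/lib/wsl/lib/libcuda.so.1
sudo ln -sf libcuda.so.1 /usr/lib/wsl/lib/libcuda.so
sudo ldconfig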
Does this make the VM handle graphics, with no remove/re-add GPU trickery required?
No, unfortunately the issue of needing to disable and re-enable graphics passthrough for Moonlight on a Linux guest is still present.
Depending on what you need, QEMU is fairly mature on modern Linux and does use Hyper-V extensions via WHPX on Windows hosts.
From a Windows host with OpenGL, or a Windows Hyper-V guest with GPU-PV, QEMU can use DX12 for OpenGL either with SDL or its own headless D-Bus/Spice solutions.
@mateuszdrab Did you get your problem solved? Did it fail on both kernel 5.15 and 6.x?
Hey @seflerZ
Things seem to be working fine now that the missing .so libraries are copied over from C:\Program Files\WSL\lib
I've not tried @Nislaco's steps from https://github.com/seflerZ/oneclick-gpu-pv/issues/8#issuecomment-2635633914 yet, because I don't want to tinker with the setup while it's working. I'm definitely going to have to revisit it at some point, because I'm currently stuck with the CUDA 12.2 driver and the newer datacenter drivers would be a better solution, if only nvidia-smi wouldn't seg fault (I wonder whether that has something to do with the registry-based tweak I have to apply on the host to switch the datacenter driver to WDDM mode). The Grid driver is in WDDM mode out of the box.
I have things working on 6.x, but I think everything was also fine on 5.15; I just stopped trying 5.x kernels as soon as I identified that the kernel version was not the cause.
I wonder if the segmentation fault was due to this: https://github.com/microsoft/WSL/issues/11277
Apparently it was fixed in driver version 565.90. The problem is that when I tried the datacenter drivers in the versions below, they wouldn't switch to WDDM mode, so I can't even add the GPU to the VM using Add-VMGpuPartitionAdapter, because Get-VMHostPartitionableGpu reports nothing while the driver is in TCC mode (a quick check for this is sketched after the table).
| Driver | Version | CUDA | Status |
|---|---|---|---|
| 551.61_grid | Grid 17.0 | 12.4 | Not Compatible ** |
| 553.62_grid | Grid 17.5 | 12.4 | Not Compatible ** |
| 539.19_grid | Grid 16.9 | 12.2 | Too Old CUDA*; works on Ubuntu |
| 539.19-data-center | | 12.2 | Too Old* |
| 572.13-data-center | | 12.8 | WDDM switch fails |
| 566.03-data-center | | 12.7 | WDDM switch fails |
| 551.78-data-center | | 12.2 | WDDM switch works; fails on Ubuntu with seg fault |
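For reference, a quick host-side check of the TCC/WDDM state before touching the VM at all; if this prints nothing, the driver is still in TCC mode and there is nothing for Add-VMGpuPartitionAdapter to attach:
# Run in an elevated PowerShell on the Hyper-V host
Get-VMHostPartitionableGpu | Select-Object -ExpandProperty Name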