Nvidia GPU passthrough fails with GPUs not consistently in WDDM mode

Open HMedbian opened this issue 6 months ago • 9 comments

Windows Version

Microsoft Windows [Version 10.0.26100.4061]

WSL Version

2.4.13.0

Are you using WSL 1 or WSL 2?

  • [x] WSL 2
  • [ ] WSL 1

Kernel Version

Linux version 5.15.167.4-microsoft-standard-WSL2

Distro Version

Ubuntu 22.04

Other Software

Nvidia GPU driver versions 572.61 and 576.52 were both tested

Repro Steps

Take a system with at least two Nvidia GPUs, one in TCC mode and the other in WDDM mode, and run nvidia-smi inside WSL. It prints:

Unable to determine the device handle for GPU0: 0000:C1:00.0: Unknown Error
Unable to determine the device handle for GPU1: 0000:E1:00.0: Unknown Error
No devices were found

(more detailed descriptions already posted in Nvidia Developer Forum: https://forums.developer.nvidia.com/t/wsl2-nvidia-smi-unable-to-determine-the-device-handle-for-gpu/334254 and https://forums.developer.nvidia.com/t/tesla-gpu-on-windows-server-2022-not-detected-in-wsl-and-docker-containers/333910 )
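
Before reproducing, it can help to confirm on the Windows host which driver model each GPU is actually in. The following is a minimal sketch, assuming `nvidia-smi` is on PATH on the Windows side and that its `driver_model.current` query field is available there. The helper names (`parse_driver_models`, `mixed_modes`, `query_host`) and the sample rows are hypothetical and are not taken from the affected machine.

```python
import csv
import io
import subprocess

# Query used on the Windows host; driver_model.current is a Windows-only field.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,pci.bus_id,driver_model.current",
    "--format=csv,noheader",
]


def parse_driver_models(csv_text):
    """Parse 'index, bus_id, model' CSV rows into a list of dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        index, bus_id, model = (f.strip() for f in fields)
        rows.append({"index": int(index), "bus_id": bus_id, "model": model.upper()})
    return rows


def mixed_modes(gpus):
    """Return True when both TCC and WDDM GPUs are present (the repro condition)."""
    models = {g["model"] for g in gpus}
    return "TCC" in models and "WDDM" in models


def query_host():
    """Run the query on the host and parse it (requires nvidia-smi on PATH)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_driver_models(out)


if __name__ == "__main__":
    # Hypothetical sample rows, for illustration only.
    sample = "0, 00000000:01:00.0, TCC\n1, 00000000:02:00.0, WDDM\n"
    gpus = parse_driver_models(sample)
    for gpu in gpus:
        print(gpu["index"], gpu["bus_id"], gpu["model"])
    print("mixed TCC/WDDM setup:", mixed_modes(gpus))
```

On a real host, `query_host()` would replace the sample string. A mixed result is the configuration this issue is about.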

Expected Behavior

The GPUs in TCC mode should be ignored, since TCC driver mode is currently not supported by WSL (https://github.com/microsoft/WSL/issues/9952), but the GPUs in WDDM mode should work normally.

Actual Behavior

No GPUs can be used in WSL, including the ones in WDDM mode.

Diagnostic Logs

No response

HMedbian avatar Jun 01 '25 22:06 HMedbian

Logs are required for review from WSL team

If this is a feature request, please reply with '/feature'. If this is a question, reply with '/question'. Otherwise please attach logs by following the instructions below; your issue will not be reviewed unless they are added. These logs will help us understand what is going on in your machine.

How to collect WSL logs

Download and execute collect-wsl-logs.ps1 in an administrative powershell prompt:

Invoke-WebRequest -UseBasicParsing "https://raw.githubusercontent.com/microsoft/WSL/master/diagnostics/collect-wsl-logs.ps1" -OutFile collect-wsl-logs.ps1
Set-ExecutionPolicy Bypass -Scope Process -Force
.\collect-wsl-logs.ps1

The script will output the path of the log file once done.

If this is a networking issue, please use collect-networking-logs.ps1, following the instructions here

Once completed please upload the output files to this Github issue.

Click here for more info on logging

If you choose to email these logs instead of attaching them to the bug, please send them to [email protected] with the number of the GitHub issue in the subject and a link to your comment in the GitHub issue in the message, then reply with '/emailed-logs'.

github-actions[bot] avatar Jun 01 '25 22:06 github-actions[bot]

The log file doesn't contain any WSL traces. Please make sure that you reproduced the issue while the log collection was running.

Diagnostic information
Detected appx version: 2.4.13.0
Found no WSL traces in the logs

github-actions[bot] avatar Jun 01 '25 23:06 github-actions[bot]

The log collection was running. The issue is difficult to capture using the provided log collection script because it’s about something not happening — specifically, the GPU isn’t available in WSL when it should be. I included a command in the issue description that shows an error on the command line, but that output doesn’t show up in the collected logs as the Nvidia tool terminates properly...

HMedbian avatar Jun 01 '25 23:06 HMedbian

@HMedbian: Can you capture a full WSL boot and a repro of the error? You can do that by running wsl --shutdown before running the script.

/logs

OneBlue avatar Jun 02 '25 22:06 OneBlue

@OneBlue: Thank you for taking a look. I have captured the logs again, this time running wsl --shutdown first: WslLogs-2025-06-03_01-04-10.zip

HMedbian avatar Jun 02 '25 23:06 HMedbian

Diagnostic information
Detected appx version: 2.4.13.0

github-actions[bot] avatar Jun 02 '25 23:06 github-actions[bot]

FYSA - on my system, "nvidia-smi -i 0" works on wsl

0 is my WDDM card - 4090
1 is my TCC card - P40

"nvidia-smi -i 1" returns: Unable to determine the device handle for GPU0000:05:00.0: Unknown Error

kernel is 5.15.167.4-microsoft-standard-WSL2

Seems like both of the failure examples linked have the TCC card as card 0.
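
The per-index check above can be repeated for every GPU index from inside WSL. This is a rough sketch, assuming `nvidia-smi` is available in the distro. Since the tool was reported earlier in this thread to terminate properly even when it prints the error, the sketch looks for the error text as well as the exit code. The function names and the index range are hypothetical.

```python
import shutil
import subprocess

ERROR_MARKER = "Unable to determine the device handle"


def run_nvidia_smi(index):
    """Run `nvidia-smi -i <index>` and return (exit_code, combined output)."""
    proc = subprocess.run(
        ["nvidia-smi", "-i", str(index)], capture_output=True, text=True
    )
    return proc.returncode, proc.stdout + proc.stderr


def probe_gpus(indices, runner=run_nvidia_smi):
    """Map each GPU index to True if nvidia-smi could open the device."""
    results = {}
    for index in indices:
        code, output = runner(index)
        # Treat either a non-zero exit code or the error text as a failure.
        failed = code != 0 or ERROR_MARKER in output
        results[index] = not failed
    return results


if __name__ == "__main__":
    if shutil.which("nvidia-smi"):
        # Assumes two GPUs (indices 0 and 1); adjust for the actual system.
        for index, ok in probe_gpus(range(2)).items():
            print(f"GPU {index}: {'usable' if ok else 'not usable'}")
    else:
        print("nvidia-smi not found on PATH")
```

If index 0 fails and a later index also fails even though it is a WDDM card, that would be consistent with the pattern described here. It would not prove the cause.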

chlimouj avatar Jun 05 '25 22:06 chlimouj

@chlimouj Thank you for your comment — that's indeed very interesting. I believe the NVIDIA driver orders GPUs based on their PCI bus IDs, so simply swapping the GPUs’ PCI slots might serve as a workaround for some users. Unfortunately, I can't try this myself due to space constraints in my PC case, but I’ll forward this information to the NVIDIA Developer Forum. Perhaps someone there can try it out.
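
To illustrate the ordering idea: if the driver really enumerates GPUs by PCI bus ID (an assumption in this thread, not something confirmed here), then sorting the bus IDs shows which card would end up as index 0. A minimal sketch with hypothetical helper names and made-up sample bus IDs:

```python
def pci_sort_key(bus_id):
    """Turn 'DDDD:BB:DD.F' (hexadecimal fields) into a sortable tuple of ints."""
    domain, bus, dev_fn = bus_id.split(":")
    device, function = dev_fn.split(".")
    return tuple(int(part, 16) for part in (domain, bus, device, function))


def enumeration_order(gpus):
    """Sort GPUs by PCI bus ID, assuming that is how indices are assigned."""
    return sorted(gpus, key=lambda g: pci_sort_key(g["bus_id"]))


def first_is_tcc(gpus):
    """Return True if the GPU that would be index 0 is in TCC mode."""
    ordered = enumeration_order(gpus)
    return bool(ordered) and ordered[0]["model"] == "TCC"


if __name__ == "__main__":
    # Hypothetical layout: the bus IDs and modes are illustrative only.
    gpus = [
        {"bus_id": "0000:0A:00.0", "model": "WDDM"},
        {"bus_id": "0000:05:00.0", "model": "TCC"},
    ]
    for index, gpu in enumerate(enumeration_order(gpus)):
        print(index, gpu["bus_id"], gpu["model"])
    print("TCC card would be index 0:", first_is_tcc(gpus))
```

The bus IDs are compared as hexadecimal numbers rather than as strings, so that a bus such as `0A` sorts after `05` and before `C1` regardless of letter case.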

HMedbian avatar Jun 05 '25 22:06 HMedbian