vGPU_Unlock-Expanded

XCP-ng and P2000

Open MrMEEE opened this issue 1 year ago • 14 comments

Hi guys

I'm trying to get my Quadro P2000 to work with XCP-ng 8.2..

I have gotten the drivers installed and detecting the card:

Mon Dec 9 23:03:56 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.05             Driver Version: 535.161.05   CUDA Version: N/A      |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro P2000                   Off | 00000000:01:00.0 Off |                  N/A |
| 71%   42C    P0              18W /  75W |     23MiB /  5120MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

but no vgpu support:

[23:01 jabba ~]# nvidia-smi vgpu -s
No supported devices in vGPU mode

nvidia-vgpud fails to start with return code 0

Pastebin is available here: https://pastebin.com/PK1qPSCH

Any ideas?

MrMEEE avatar Dec 09 '24 22:12 MrMEEE

Hey, can you try LD_PRELOAD="/path/to/uvgpu.so" nvidia-smi vgpu -s ?

timemaster5 avatar Feb 03 '25 20:02 timemaster5

I eventually bought a Tesla M60 instead..

Now I just need to be able to override the profiles, which the uvgpu doesn't have support for..

MrMEEE avatar Feb 03 '25 23:02 MrMEEE

Yep, same here. I don't know how to override them. :( I couldn't get any info from Snowman or others around vgpu-unlock-rs on how this works. There must be a way, but I haven't found it so far.

timemaster5 avatar Feb 04 '25 12:02 timemaster5

Reading this:

https://github.com/DualCoder/vgpu_unlock

It seems to me that the issue is that we need to build a new nvidia module to gain access to the override... But we don't have the kernel sources for the XenServer nvidia driver, which is different than the "normal" one...

MrMEEE avatar Feb 04 '25 14:02 MrMEEE

I don't see any "profile" related stuff in that code. So I assume it is somewhere in nvidia driver patches, which are somewhere else.

timemaster5 avatar Feb 04 '25 14:02 timemaster5

https://github.com/mbilker/vgpu_unlock-rs/blob/ba66a6c6eeb16eb8e2d2ec368d6605b974070d4b/src/lib.rs#L527

MrMEEE avatar Feb 05 '25 08:02 MrMEEE

https://github.com/mbilker/vgpu_unlock-rs/blob/ba66a6c6eeb16eb8e2d2ec368d6605b974070d4b/src/lib.rs#L576

MrMEEE avatar Feb 05 '25 08:02 MrMEEE

Nice! I was looking into it and we should have everything. There is a patch for the 4.19 kernel, which is the one in Xen. We can build vgpu-unlock-rs on Xen, and I am sure we can use the Nvidia Linux drivers instead of the Xen ones. I saw someone on Discord who could build the Nvidia kernel module but failed elsewhere. I was personally able to build vgpu-unlock-rs, but it didn't do anything, probably because of a missing patched kernel module. I was hoping that the profile override would work with an unpatched one. In parallel, I am trying to understand, from what you have posted, how to maybe apply an override by just LD_PRELOADing the right library. Not much progress though.

timemaster5 avatar Feb 06 '25 13:02 timemaster5

It was actually me that got the official drivers to compile on Xen.. the issue is that the original drivers depend on iommu/vt-d being enabled in the kernel, which it is NOT in the XCP-ng/XenServer kernel:

[15:00 jabba ~]# if compgen -G "/sys/kernel/iommu_groups/*/devices/*" > /dev/null; then     echo "AMD's IOMMU / Intel's VT-D is enabled in the BIOS/UEFI."; else     echo "AMD's IOMMU / Intel's VT-D is not enabled in the BIOS/UEFI"; fi
AMD's IOMMU / Intel's VT-D is not enabled in the BIOS/UEFI

So the driver loads correctly, but VGPUs won't work.. so Citrix are apparently doing something evil/differently.. and we don't have the source..

MrMEEE avatar Feb 06 '25 14:02 MrMEEE

aaaha :) great job BTW.. Got you now.

It must be something different, though. IOMMU is enabled in XCP-NG and has been used for PCI passthrough.

Also confirmed on my host:

[    8.027985] Using GPFN IOMMU mode, 1-to-1 offset is 0x3e00000000
[    8.038972] XEN-PV-IOMMU: Using software bounce buffering for IO on 32bit DMA devices (SWIOTLB)
[    8.656980] XEN-PV-IOMMU - completed setting up 1-1 mapping

I think it is the vfio kernel modules that are missing. We need these: vfio, vfio_iommu_type1, vfio_pci, vfio_virqfd
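A minimal sketch for checking whether those four modules are currently loaded. The function name `check_vfio_modules` is made up for illustration, and the check only reads /proc/modules, so anything compiled directly into the kernel would show up as "not loaded" here even though it is available.

```shell
# Report which of the vfio modules listed above are loaded.
# Reads /proc/modules by default; pass another file to check a saved copy.
# Returns non-zero if at least one module is not loaded.
check_vfio_modules() {
    local modules_file="${1:-/proc/modules}"
    local missing=0 mod
    for mod in vfio vfio_iommu_type1 vfio_pci vfio_virqfd; do
        # Each line of /proc/modules starts with the module name and a space.
        if grep -q "^${mod} " "$modules_file"; then
            echo "${mod}: loaded"
        else
            echo "${mod}: not loaded"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_vfio_modules` with no arguments checks the running dom0 kernel; a module that shows "not loaded" may still exist on disk, which `modinfo <name>` would reveal.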

For me, mdevctl didn't work, which uses some kernel nodes I didn't see on my machine.. As a last resort, I think we should be able to compile our kernel, or maybe only a module. The env should be available here: https://github.com/xcp-ng/xcp-ng-build-env

But yes, it would mean using the generic Nvidia driver, as the Xen Grid Nvidia one does something totally different than what has been already documented in vgpu-unlock community, IMHO.

timemaster5 avatar Feb 07 '25 21:02 timemaster5

Thanks..

Sorry, my bad.. It WAS vfio that was missing.. so apparently the XenServer guys are doing this another way..

But, yes.. a possibility would be to enable the vfio kernel modules...
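Before rebuilding anything, it may be worth checking how the running kernel was configured. The sketch below is only an illustration: `vfio_config_status` is a made-up name, the default path /boot/config-$(uname -r) is an assumption about where the dom0 kernel config lives, and CONFIG_VFIO_MDEV is added to the list as a guess because mdevctl came up earlier in the thread.

```shell
# Print the build setting of each vfio-related option in a kernel config file.
# "y" means built in, "m" means built as a module; anything else is
# reported as "not set".
vfio_config_status() {
    local config_file="${1:-/boot/config-$(uname -r)}"
    local opt value
    if [ ! -r "$config_file" ]; then
        echo "cannot read ${config_file}" >&2
        return 1
    fi
    for opt in CONFIG_VFIO CONFIG_VFIO_IOMMU_TYPE1 CONFIG_VFIO_PCI \
               CONFIG_VFIO_VIRQFD CONFIG_VFIO_MDEV; do
        # Anchor on "NAME=" so CONFIG_VFIO does not also match CONFIG_VFIO_PCI.
        value=$(sed -n "s/^${opt}=\(.*\)/\1/p" "$config_file" | head -n 1)
        echo "${opt}=${value:-not set}"
    done
}
```

If everything comes back as "not set", the modules would have to be built against the XCP-ng kernel sources; if some are "m", loading them with modprobe might already be enough.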

Right now, I think I need to set up some kind of testbench, as my homeserver has to be running for the sake of the marriage :)...

MrMEEE avatar Feb 11 '25 09:02 MrMEEE

If you can set up a testbench, I will be happy to help. My servers are also currently tied up with "production" stuff :)

And wish you the best of luck :)

timemaster5 avatar Feb 19 '25 10:02 timemaster5

Ok.. I now have a testbench.. it's not fast, but it has 16GB of RAM, an i5-8400 and a Quadro P2000..

Where do you think we should start?.. Should we move this to Discord or Matrix?

MrMEEE avatar Feb 28 '25 00:02 MrMEEE

Good job! I'll DM you on Discord I guess..

timemaster5 avatar Mar 03 '25 19:03 timemaster5