XCP-ng and P2000
Hi guys
I'm trying to get my Quadro P2000 to work with XCP-ng 8.2.
I have gotten the drivers installed and detecting the card:
Mon Dec 9 23:03:56 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.05             Driver Version: 535.161.05   CUDA Version: N/A      |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro P2000                   Off | 00000000:01:00.0 Off |                  N/A |
| 71%   42C    P0              18W /  75W |     23MiB /  5120MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
but no vgpu support:
[23:01 jabba ~]# nvidia-smi vgpu -s
No supported devices in vGPU mode
nvidia-vgpud fails to start with return code 0
Pastebin is available here: https://pastebin.com/PK1qPSCH
Any ideas?
Hey, can you try LD_PRELOAD="/path/to/uvgpu.so" nvidia-smi vgpu -s ?
I eventually bought a Tesla M60 instead..
Now I just need to be able to override the profiles, which the uvgpu doesn't have support for..
Yep, same here. I don't know how to override them. :( I couldn't get any info from Snowman or the others around vgpu-unlock-rs on how this works. There must be a way, but I haven't found it so far.
Reading this:
https://github.com/DualCoder/vgpu_unlock
It seems to me that the issue is that we need to build a new nvidia module to gain access to the override... But we don't have the kernel sources for the XenServer nvidia driver, which is different from the "normal" one...
I don't see any "profile" related stuff in that code. So I assume it is somewhere in nvidia driver patches, which are somewhere else.
https://github.com/mbilker/vgpu_unlock-rs/blob/ba66a6c6eeb16eb8e2d2ec368d6605b974070d4b/src/lib.rs#L527
https://github.com/mbilker/vgpu_unlock-rs/blob/ba66a6c6eeb16eb8e2d2ec368d6605b974070d4b/src/lib.rs#L576
Nice! I was looking into it and we should have everything. There is a patch for the 4.19 kernel, which is the one used in Xen. We can build vgpu-unlock-rs on Xen, and I am sure we can use the Nvidia Linux drivers instead of the Xen ones. I saw someone on Discord who could build the Nvidia kernel module and failed elsewhere. I was able to build vgpu-unlock-rs myself, but it didn't do anything, probably because of the missing patched kernel module. I was hoping that the profile override would work with an unpatched one. In parallel, I am trying to understand, from what you have posted, how to maybe apply an override by just LD_PRELOADing the right library. Not much progress though.
It was actually me that got the official drivers to compile on Xen.. the issue is that the original drivers depend on IOMMU/VT-d being enabled in the kernel, which it is NOT in the XCP-ng/XenServer kernel:
[15:00 jabba ~]# if compgen -G "/sys/kernel/iommu_groups/*/devices/*" > /dev/null; then echo "AMD's IOMMU / Intel's VT-D is enabled in the BIOS/UEFI."; else echo "AMD's IOMMU / Intel's VT-D is not enabled in the BIOS/UEFI"; fi
AMD's IOMMU / Intel's VT-D is not enabled in the BIOS/UEFI
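For anyone repeating this check, here is the same test as a small POSIX shell function. It is only a sketch: the function name iommu_groups_present and its optional path argument are made up for illustration (the argument exists so the function can be pointed at a test directory). Note that it only reports whether the running kernel exposes IOMMU groups in sysfs, which is not necessarily the same thing as the BIOS/UEFI setting.

```shell
# Succeeds if the running kernel exposes at least one device in an IOMMU group.
# The optional first argument (hypothetical, for testing) overrides the sysfs path.
iommu_groups_present() {
    root="${1:-/sys/kernel/iommu_groups}"
    for entry in "$root"/*/devices/*; do
        # An unmatched glob stays literal, so check that the path really exists.
        # Device entries are symlinks, so accept those as well.
        if [ -e "$entry" ] || [ -L "$entry" ]; then
            return 0
        fi
    done
    return 1
}

if iommu_groups_present; then
    echo "IOMMU groups are visible to this kernel."
else
    echo "No IOMMU groups are visible to this kernel."
fi
```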
So the driver loads correctly, but VGPUs won't work.. so Citrix are apparently doing something evil/differently.. and we don't have the source..
aaaha :) great job BTW.. Got you now.
It must be something different, though. IOMMU is enabled in XCP-NG and has been used for PCI passthrough.
Also confirmed on my host:
[ 8.027985] Using GPFN IOMMU mode, 1-to-1 offset is 0x3e00000000
[ 8.038972] XEN-PV-IOMMU: Using software bounce buffering for IO on 32bit DMA devices (SWIOTLB)
[ 8.656980] XEN-PV-IOMMU - completed setting up 1-1 mapping
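To pull the same kind of lines from another host for comparison, a filter along these lines should do. It is a sketch; iommu_lines is just an illustrative name, and dmesg may need root on some systems.

```shell
# Print only the IOMMU-related lines from a kernel log read on stdin.
iommu_lines() {
    grep -i 'iommu'
}

# Typical use on a host; "|| true" keeps the script going if nothing matches.
dmesg 2>/dev/null | iommu_lines || true
```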
I think it is the vfio kernel modules that are missing. We need these: vfio, vfio_iommu_type1, vfio_pci, vfio_virqfd
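A quick way to see which of those four modules are not loaded is sketched below. The function name missing_modules is made up for illustration, and it takes the module list file as an argument so it can be tested against a sample file. It only looks at currently loaded modules: a module could still exist on disk without being loaded (modinfo would tell you), or be built into the kernel.

```shell
# Print each requested module that does not appear in a /proc/modules-style file.
# Usage: missing_modules FILE MODULE...
missing_modules() {
    modules_file="$1"
    shift
    for mod in "$@"; do
        # /proc/modules has one module per line, with the name in the first field.
        if ! awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules_file"; then
            echo "$mod"
        fi
    done
}

missing_modules /proc/modules vfio vfio_iommu_type1 vfio_pci vfio_virqfd
```

An empty result means all four are loaded; any names printed are the ones to go looking for.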
mdevctl didn't work for me; it relies on some kernel nodes I didn't see on my machine.. As a last resort, I think we should be able to compile our own kernel, or maybe only a module. The build environment should be available here: https://github.com/xcp-ng/xcp-ng-build-env
But yes, it would mean using the generic Nvidia driver, as the Xen GRID Nvidia one does something totally different from what has already been documented in the vgpu-unlock community, IMHO.
Thanks..
Sorry, my bad.. It WAS vfio that was missing.. so apparently the XenServer guys are doing this another way..
But, yes.. a possibility would be to enable the vfio kernel modules...
Right now, I think I need to set up some kind of testbench, as my homeserver has to be running for the sake of the marriage :)...
If you can set up a testbench, I will be happy to help. My servers are currently also tied up with "production" stuff :)
And wish you the best of luck :)
Ok.. I now have a testbench.. it's not fast, but it has 16GB of RAM, an i5-8400 and a Quadro P2000..
Where do you think we should start?.. Should we move this to Discord or Matrix?
Good job! I'll DM you on Discord I guess..