
NVIDIA GPUs not listed when configured for PCI passthrough

gavin-cudo opened this issue on Sep 08 '22

Description: NVIDIA GPUs are not listed under PCI devices on a host configured for PCI passthrough.

To Reproduce: Configure a host with NVIDIA GPUs for PCI passthrough as per the documentation at https://docs.opennebula.io/6.4/open_cluster_deployment/kvm_node/pci_passthrough.html

Set the filter under /var/lib/one/remotes/etc/im/kvm-probes.d/pci.conf on the frontend to be:

:filter:
  - '*:*'
:short_address: []
:device_name: []

Expected behavior: All PCI devices, including the NVIDIA GPUs, are listed by onehost show <host_id>.

Actual behavior: All PCI devices are listed except the NVIDIA GPUs.

Details

  • Hypervisor: KVM
  • Version: 6.4.0 CE and 6.4.0 Enterprise

Additional context: The GPUs are listed fine on the host with:

lspci -nn -d 10de:*
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
02:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
02:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
81:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
81:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
82:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
82:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
83:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
83:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
84:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
84:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
c1:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
c1:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
c2:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
c2:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
c3:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
c3:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
c4:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:24b0] (rev a1)
c4:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)

The vfio driver is confirmed working, as seen below:

lspci -vs 01:00.0
01:00.0 VGA compatible controller: NVIDIA Corporation Device 2231 (rev a1) (prog-if 00 [VGA controller])
	Subsystem: NVIDIA Corporation Device 147e
	Flags: fast devsel, IRQ 255, NUMA node 0
	Memory at f4000000 (32-bit, non-prefetchable) [size=16M]
	Memory at 38060000000 (64-bit, prefetchable) [size=256M]
	Memory at 38070000000 (64-bit, prefetchable) [size=32M]
	I/O ports at 3000 [size=128]
	Expansion ROM at f5000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Legacy Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities: [100] Virtual Channel
	Capabilities: [250] Latency Tolerance Reporting
	Capabilities: [258] L1 PM Substates
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [420] Advanced Error Reporting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] Secondary PCI Express
	Capabilities: [bb0] Resizable BAR <?>
	Capabilities: [c1c] Physical Layer 16.0 GT/s <?>
	Capabilities: [d00] Lane Margining at the Receiver <?>
	Capabilities: [e00] Data Link Feature <?>
	Kernel driver in use: vfio-pci
	Kernel modules: nvidiafb, nouveau
cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-5.4.0-125-generic root=UUID=c01382e3-cd50-4103-a06c-576e6bafe9ce ro iommu=pt amd_iommu=on amdgpu.runpm=0 kvm_amd.sev=1 modprobe.blacklist=nouveau nouveau.modeset=0 nouveau.runpm=0 nvidia-drm.modeset=1 pcie_aspm=off radeon.modeset=0 radeon.runpm=0 vfio-pci vfio_iommu_type1.allow_unsafe_interrupts=1

The above configuration was known to be working on version 6.2.0.

gavin-cudo (Sep 08 '22)

After further investigation, commenting out lines 114 to 118 in https://github.com/OpenNebula/one/blob/master/src/im_mad/remotes/node-probes.d/pci.rb#L114 allows the GPUs to be listed (albeit with only the IDs showing, not the names).

The offending lines are:

# The main device cannot be used, skip it
if CONF[:nvidia_vendors].include?(dev[:vendor]) &&
   `ls /sys/class/mdev_bus | grep #{dev[:short_address]}`.empty?
  next
end

So this looks like a bug introduced in 6.4 when vGPU support was added in commit https://github.com/OpenNebula/one/commit/7f719598bdf727d25490d5dac2915b0396b51309
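
For context (my own reading of the probe, not the upstream fix): /sys/class/mdev_bus only exists when the NVIDIA mediated-device (vGPU) driver is set up, so on a passthrough-only host the grep finds nothing and every NVIDIA device gets skipped. A minimal sketch of the inverted check, skipping the physical function only when it really is an mdev parent (reusing dev[:short_address] and CONF[:nvidia_vendors] from the snippet above), would be:

# Sketch only: treat the device as vGPU-managed only if it shows up
# under /sys/class/mdev_bus; otherwise keep listing it for passthrough.
mdev_parents = Dir.exist?('/sys/class/mdev_bus') ? Dir.children('/sys/class/mdev_bus') : []

if CONF[:nvidia_vendors].include?(dev[:vendor]) &&
   mdev_parents.any? { |addr| addr.end_with?(dev[:short_address]) }
  # The physical function is owned by the vGPU driver, skip it here
  next
end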

gavin-cudo (Sep 14 '22)

@xorel can you review the changes on this branch? They fix a further issue where VMs fail because the UUID is still set in the host XML, meaning that vGPU is always enabled even on hosts that only support passthrough.

https://github.com/OpenNebula/one/pull/5982

JungleCatSW (Sep 28 '22)

@JungleCatSW @gavin-cudo Do you have a workaround to actually get PCI passthrough to work on 6.4? We found ourselves in the same boat with an invisible GPU until we commented out the filtering in pci.rb.

We face a problem similar to the one in this ON Forum post, with the full error message below (04:00.0 is our host PCI address):

Fri Sep 30 17:45:23 2022: DEPLOY: Directory '/sys/class/mdev_bus/0000:04:00.0' does not exist error: Failed to create domain from /var/lib/one/datastores/110/1814/deployment.0 error: device not found: mediated device 'f7cdd2bc-e0bc-51f5-bdf3-62261edc310c' not found Could not create domain from /var/lib/one/datastores/110/1814/deployment.0 ExitCode: 255

cirquit (Sep 30 '22)

@cirquit see https://github.com/OpenNebula/one/pull/5982

You just need to replace one line each in pci.rb and pci.conf.

JungleCatSW (Sep 30 '22)

@JungleCatSW Unfortunately, applying this change only fixes the invisibility of the PCI device, not the passthrough error when booting a new VM with a passthrough GPU (not a vGPU). It looks to me like #5968 and the passthrough problem are related, as ON currently wants a mediated device (vGPU) instead of a PCI passthrough device.

Did I maybe miss a configuration option in the official PCI passthrough documentation that enables vGPUs by default?

(formatted for clarity, taken from the GUI when spawning a new VM)

Driver Error
Tue Oct 4 10:08:29 2022: DEPLOY:
Directory '/sys/class/mdev_bus/0000:04:00.0' does not exist error:
Failed to create domain from /var/lib/one/datastores/110/1821/deployment.0 error: device not found:
mediated device 'f7cdd2bc-e0bc-51f5-bdf3-62261edc310c' not found
Could not create domain from /var/lib/one/datastores/110/1821/deployment.0
ExitCode: 255

cirquit (Oct 04 '22)

@cirquit We had the same issue. Once you have added a host with the old pci.rb and pci.conf, the UUID gets stored, so even after you correct them the PCI data just gets merged with the old, incorrect data.

try running: $ onehost show -x <hostid>

scroll up and look for the pci section to see if there is a UUID field:

      <PCI>
        ...
        <DEVICE><![CDATA[228b]]></DEVICE>
        <DEVICE_NAME><![CDATA[Device]]></DEVICE_NAME>
        <UUID><![CDATA[              Is this here ?????  ]]></UUID>
        <DOMAIN><![CDATA[0000]]></DOMAIN>
        ...
      </PCI>

The way ONE knows whether it is using passthrough or vGPU is whether the UUID field exists in the PCI section of the host. https://github.com/OpenNebula/one/blob/2024f62ea918c91f2c5cb8d9bf033ee4f75d34a8/src/vmm_mad/remotes/lib/kvm/opennebula_vm.rb#L218
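
A rough sketch of that decision, paraphrased from the linked opennebula_vm.rb logic (field names here are illustrative, not the exact upstream code): if the host-reported PCI entry carries a UUID, the deployment references a mediated device; otherwise it attaches the physical function as a plain hostdev.

# Paraphrased sketch: pick the hostdev flavour from the presence of a UUID.
def hostdev_xml(pci)
  if pci[:uuid] && !pci[:uuid].empty?
    # UUID present => vGPU: reference the mediated device by UUID
    "<hostdev mode='subsystem' type='mdev' model='vfio-pci'>" \
      "<source><address uuid='#{pci[:uuid]}'/></source>" \
      '</hostdev>'
  else
    # No UUID => plain PCI passthrough of the physical function
    "<hostdev mode='subsystem' type='pci' managed='yes'><source>" \
      "<address domain='0x#{pci[:domain]}' bus='0x#{pci[:bus]}' " \
      "slot='0x#{pci[:slot]}' function='0x#{pci[:function]}'/>" \
      '</source></hostdev>'
  end
end

That also explains the "mediated device ... not found" error above: with a stale UUID in the host data, libvirt is asked for an mdev that was never created.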

If you enroll a new host it should work, but to clear an existing host you have to:

  • delete it
  • set the PCI filter in pci.conf to no devices (:filter: '0:0' # no devices)
  • add it
  • delete it
  • set the PCI filter in pci.conf to all devices, or only NVIDIA devices (:filter: '*:*' # all devices)
  • add it again

you can use $ onehost show -x <hostid> each time you add it to check that the XML is correctly being removed and then refreshed

let me know if that works for you

JungleCatSW (Oct 04 '22)

@JungleCatSW Thanks for the detailed explanation!

It worked out exactly as you said. One interesting detail: when the PCI device was added as a vGPU, it did not follow the natural ordering of the device address (04:00.0) in the PCI tab or in onehost show <id> and was always at the end of the list. When added correctly, it follows the ordering.

For other people who find this issue and have problems with GPU PCI passthrough on KVM: make sure the owner and group of /dev/vfio/* match the user and group defined in /etc/libvirt/qemu.conf on the host; otherwise you will get a "permission denied" from the ON frontend when accessing the /dev/vfio directory. In my case, the fix was a chown oneadmin:oneadmin -R /dev/vfio.

Also, I needed to reduce the VM's memory size by ~2 GB compared to a VM without PCI passthrough, as qemu would otherwise hit an OOM. When that happened, the host and VM became unresponsive via SSH and only came back after a few hours, when (I presume) the qemu process was terminated by the OS.

cirquit (Oct 06 '22)

The problem should be solved with this patch https://github.com/OpenNebula/one/commit/3f300f3bf9ccd90fd082cb8b1ad85c0037911304

The source of the issue is that the probe forced the use of vGPU, preventing the physical GPU from being used for PCI passthrough. As @gavin-cudo commented, one of the problems resided here:

# The main device cannot be used, skip it
if CONF[:nvidia_vendors].include?(dev[:vendor]) &&
   `ls /sys/class/mdev_bus | grep #{dev[:short_address]}`.empty?
      next
end

However, removing those lines means that both GPUs and vGPUs can be used at the same time, which is not correct.

On the other hand, the configuration modification that @JungleCatSW proposed avoids adding the UUID to the GPU device when it acts as a physical GPU for PCI passthrough, but it does not properly manage the use of vGPUs since, as he indicated, OpenNebula uses this field in order to use the vGPU.

# The uuid is based on the address to get always the same
if CONF[:nvidia_vendors].include?(dev[:vendor])

With the patch I propose, GPUs and vGPUs should be listed correctly depending on whether GPU virtualization is enabled with the NVIDIA drivers (as indicated in the official documentation). The patch also ensures that the UUID is added only to vGPUs, leaving GPUs configured as regular PCI devices.
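
As an illustration of that behaviour (a sketch in the spirit of the patch, not the patch itself), the UUID would only be generated for devices that actually appear as mediated-device parents:

# Sketch: only vGPU-capable devices (present under /sys/class/mdev_bus)
# get a deterministic UUID; passthrough-only GPUs keep none.
require 'digest'

def mdev_parent?(short_address)
  Dir.exist?('/sys/class/mdev_bus') &&
    Dir.children('/sys/class/mdev_bus').any? { |a| a.end_with?(short_address) }
end

if CONF[:nvidia_vendors].include?(dev[:vendor]) && mdev_parent?(dev[:short_address])
  # Derive the UUID from the address so it is always the same
  # (illustrative scheme, not necessarily the one used upstream)
  hex = Digest::MD5.hexdigest(dev[:short_address])
  dev[:uuid] = [hex[0, 8], hex[8, 4], hex[12, 4], hex[16, 4], hex[20, 12]].join('-')
end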

vickmp (Oct 06 '22)

Hello, I had the same issue and wanted to add that applying the patch to pci.rb fixed the VM error for me. As I could not afford to undeploy the host, I removed the UUID attribute using the onedb update-body host --id 0 command.

thereiam (Oct 09 '22)