vfio-manage.sh can't bind multiple aux devices in nvidia-vfio-manager
My GPU is an NVIDIA Corporation TU104GL [Quadro RTX 4000], which has 3 auxiliary (aux) devices.
When I set up the GPU for KubeVirt VM passthrough, the script vfio-manage.sh does not bind all of the aux devices to the vfio-pci driver.
The bug is in https://github.com/NVIDIA/gpu-operator/blob/main/assets/state-vfio-manager/0400_configmap.yaml#L128
The function get_grapcs_aux_dev should not use if ls "/sys/bus/pci/devices/$aux_dev/" as its criterion, because that check only handles a single aux device. Instead, it should return an array of strings, one entry per aux device. The functions bind_device and unbind_device should then loop through this array, check each device, and perform the corresponding operation on it.
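The proposal above could be sketched roughly as follows. This is a minimal illustration, not the code shipped in the ConfigMap: the names get_aux_devs, current_driver, list_not_on_vfio and the SYSFS_PCI_DEVICES variable are hypothetical, and it assumes that every other PCI function in the same slot as the GPU counts as an aux device.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; function and variable names are NOT from vfio-manage.sh.

# Root of the PCI device tree. Overridable so the logic can be tried on a fake tree.
SYSFS_PCI_DEVICES="${SYSFS_PCI_DEVICES:-/sys/bus/pci/devices}"

# Print every sibling function of a GPU (same domain:bus:device,
# different function number), one address per line.
get_aux_devs() {
    local gpu="$1"          # e.g. 0000:52:00.0
    local slot="${gpu%.*}"  # e.g. 0000:52:00
    local path dev
    for path in "$SYSFS_PCI_DEVICES/$slot".*; do
        [ -e "$path" ] || continue   # glob matched nothing
        dev="${path##*/}"
        if [ "$dev" != "$gpu" ]; then
            echo "$dev"              # skip the GPU function itself
        fi
    done
}

# Print the name of the driver bound to a device, or nothing if it is unbound.
current_driver() {
    local link="$SYSFS_PCI_DEVICES/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    fi
}

# Loop over all aux devices and print those not yet bound to vfio-pci.
# A real bind_device/unbind_device would act on each entry here instead.
list_not_on_vfio() {
    local gpu="$1" dev
    # PCI addresses contain no whitespace, so word splitting is safe here.
    for dev in $(get_aux_devs "$gpu"); do
        if [ "$(current_driver "$dev")" != "vfio-pci" ]; then
            echo "$dev"
        fi
    done
}
```

On the machine described in this issue, calling get_aux_devs 0000:52:00.0 should list 0000:52:00.1, 0000:52:00.2 and 0000:52:00.3, so each of them gets its own driver check instead of only the first match.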
lspci -Dnnkv -d 10de:

0000:52:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104GL [Quadro RTX 4000] [10de:1eb1] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: NVIDIA Corporation Device [10de:12a0]
	Physical Slot: 8191-8
	Flags: fast devsel, IRQ 16, NUMA node 0, IOMMU group 35
	Memory at b3000000 (32-bit, non-prefetchable) [size=16M]
	Memory at 20ffe0000000 (64-bit, prefetchable) [size=256M]
	Memory at 20fff0000000 (64-bit, prefetchable) [size=32M]
	I/O ports at 7000 [size=128]
	Expansion ROM at b4000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Legacy Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [250] Latency Tolerance Reporting
	Capabilities: [258] L1 PM Substates
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [420] Advanced Error Reporting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] Secondary PCI Express
	Capabilities: [bb0] Physical Resizable BAR
	Kernel driver in use: vfio-pci
	Kernel modules: nouveau

0000:52:00.1 Audio device [0403]: NVIDIA Corporation TU104 HD Audio Controller [10de:10f8] (rev a1)
	Subsystem: NVIDIA Corporation Device [10de:12a0]
	Physical Slot: 8191-8
	Flags: bus master, fast devsel, latency 0, IRQ 17, NUMA node 0, IOMMU group 35
	Memory at b4080000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel

0000:52:00.2 USB controller [0c03]: NVIDIA Corporation TU104 USB 3.1 Host Controller [10de:1ad8] (rev a1) (prog-if 30 [XHCI])
	Subsystem: NVIDIA Corporation Device [10de:12a0]
	Physical Slot: 8191-8
	Flags: fast devsel, IRQ 64, NUMA node 0, IOMMU group 35
	Memory at 20fff2000000 (64-bit, prefetchable) [size=256K]
	Memory at 20fff2040000 (64-bit, prefetchable) [size=64K]
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [b4] Power Management version 3
	Capabilities: [100] Advanced Error Reporting
	Kernel driver in use: xhci_hcd

0000:52:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller [10de:1ad9] (rev a1)
	Subsystem: NVIDIA Corporation Device [10de:12a0]
	Physical Slot: 8191-8
	Flags: fast devsel, IRQ 255, NUMA node 0, IOMMU group 35
	Memory at b4084000 (32-bit, non-prefetchable) [disabled] [size=4K]
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [b4] Power Management version 3
	Capabilities: [100] Advanced Error Reporting