Add kernel config for NVIDIA DPU/ConnectX adapter

Open l8huang opened this issue 1 year ago • 11 comments

With an NVIDIA DPU or ConnectX network adapter, a VF can be passed through to the guest VM via VFIO in guest-kernel mode. The guest kernel then needs the adapter's driver to claim the passed-through device and create the network interface.
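
To make the driver requirement above concrete, here is a minimal sketch of a check that a guest kernel config has the mlx5 driver enabled. `CONFIG_MLX5_CORE` and `CONFIG_MLX5_CORE_EN` are upstream Linux options, but treating them as the exact set this PR enables is an assumption, and `has_mlx5_config` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every listed mlx5 option is set to
# y or m in the given kernel config file.
# ASSUMPTION: the option list below is illustrative, not necessarily the
# exact set added by this PR.
has_mlx5_config() {
    config_file="$1"
    for opt in CONFIG_MLX5_CORE CONFIG_MLX5_CORE_EN; do
        # Match "OPTION=y" or "OPTION=m" as a whole line.
        grep -Eq "^${opt}=(y|m)$" "$config_file" || return 1
    done
    return 0
}
```

Possible usage, with the config path left to the reader: `has_mlx5_config /path/to/.config && echo "mlx5 driver enabled"`.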

l8huang avatar May 10 '24 23:05 l8huang

FYI: in the guest VM:

root@localhost:/proc# lspci  -tv
-[0000:00]-+-00.0  Device 8086:29c0
           +-01.0  Device 1af4:1003
           +-02.0-[01]--
           +-03.0  Device 1af4:1004
           +-04.0  Device 1af4:1005
           +-05.0-[02]----00.0  Device 15b3:101e
           +-06.0-[03]--
           +-07.0  Device 1af4:1053
           +-08.0  Device 1af4:1009
           +-1f.0  Device 8086:2918
           +-1f.2  Device 8086:2922
           \-1f.3  Device 8086:2930

root@localhost:/proc# lspci -nn -k -s 0000:02:00.0
02:00.0 Class [0200]: Device [15b3:101e] (rev 01)
	Subsystem: Device [15b3:0063]
	Kernel driver in use: mlx5_core

root@localhost:/proc# ethtool -i eth0 
driver: mlx5_core
version: 6.1.62-nvidia-gpu
firmware-version: 24.35.3502 (MT_0000000542)
expansion-rom-version: 
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
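
The manual check above could also be scripted. The following is a small sketch that extracts the bound driver from `lspci -k` style output; `pci_driver_in_use` is a hypothetical helper name, and it assumes the "Kernel driver in use:" line format shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -k` style text on stdin and print the
# name of the kernel driver bound to the device, if any.
# ASSUMPTION: the driver line looks like "Kernel driver in use: <name>".
pci_driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}
```

Possible usage inside the guest: `lspci -nn -k -s 0000:02:00.0 | pci_driver_in_use`, which prints nothing if no driver has claimed the device.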

l8huang avatar May 10 '24 23:05 l8huang

@zvonkok PR updated, PTAL, thanks

l8huang avatar May 17 '24 18:05 l8huang

@l8huang You need to bump tools/packaging/kernel/kata_config_version
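
For readers unfamiliar with the step, a bump here means incrementing the number stored in that file. A minimal sketch, assuming the file holds a single decimal integer; `bump_config_version` is a hypothetical helper, not part of the repository:

```shell
#!/bin/sh
# Hypothetical helper: increment the integer stored in a version file.
# ASSUMPTION: the file contains exactly one decimal integer without
# leading zeros.
bump_config_version() {
    version_file="$1"
    current=$(cat "$version_file")
    # The file is read in full before it is overwritten.
    echo $((current + 1)) > "$version_file"
}
```

Possible usage from the repository root: `bump_config_version tools/packaging/kernel/kata_config_version`.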

zvonkok avatar May 22 '24 08:05 zvonkok

@l8huang You need to bump tools/packaging/kernel/kata_config_version

Thanks for the heads up, version updated.

l8huang avatar May 22 '24 21:05 l8huang

@lifupan @fidencio @GabyCT need another LGTM, could you please take a look? thanks

l8huang avatar Jun 04 '24 03:06 l8huang

-D : DPU/SmartNIC vendor, only NVIDIA. => -D : DPU/SmartNIC vendor, only Mellanox.

@amshinde Mellanox was acquired by NVIDIA, the products are named under NVIDIA now.

l8huang avatar Jun 06 '24 06:06 l8huang

-D : DPU/SmartNIC vendor, only NVIDIA. => -D : DPU/SmartNIC vendor, only Mellanox.

@amshinde Mellanox was acquired by NVIDIA, the products are named under NVIDIA now.

@l8huang I understand that, I suggested the rename to Mellanox to avoid confusion with Nvidia GPU, and since the kernel configs refer to Mellanox rather than Nvidia.

amshinde avatar Jun 06 '24 22:06 amshinde

According to https://en.wikipedia.org/wiki/Mellanox_Technologies:

The company was integrated into Nvidia's networking division in 2020 and Nvidia stopped using the brand name "Mellanox" for its new networking products.

If one googles Mellanox Technologies, the top results point to NVIDIA.

TBH, I don't see much room for confusion: the option explicitly says DPU/SmartNIC vendor. We should move beyond the legacy name; going forward, NVIDIA is the de facto vendor.

@zvonkok what do you think?

l8huang avatar Jun 07 '24 15:06 l8huang

@amshinde Would you mind merging this PR as it is? If any confusion arises later, I will address and amend it accordingly.

l8huang avatar Jun 13 '24 00:06 l8huang

@l8huang There is a merge conflict now, can you rebase this PR?

amshinde avatar Jun 25 '24 20:06 amshinde

@amshinde thanks for the heads up, just rebased.

l8huang avatar Jun 25 '24 23:06 l8huang

@amshinde could you please take a look again?

l8huang avatar Jul 01 '24 23:07 l8huang

@l8huang artifacts from your last push have expired (they are cached for 2 weeks IIRC). Please rebase.

gkurz avatar Jul 18 '24 09:07 gkurz

@gkurz thanks for the heads up, rebased.

l8huang avatar Jul 18 '24 18:07 l8huang