sriov-network-device-plugin

Add auxiliary network devices support

Open · DmytroLinkin opened this pull request 2 years ago · 1 comment

An auxiliary device pool is created when the auxNetDevice type is specified in the plugin config file. Auxiliary devices are discovered for each discovered network PCI device, which means that only auxiliary network devices with a parent PCI device are supported. A new AuxTypes selector allows filtering auxiliary devices by type. For example, given the following list of auxiliary devices:

foo.bar.0
fancy.bar.0
foo.bar.1
foo.baz.0

With the selector value ["bar"], the first three devices (those of type "bar") will be added to the pool.
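A minimal Go sketch of the AuxTypes filtering described above. It assumes auxiliary device names have the form `<module>.<type>.<index>`, so the type is the second-to-last dot-separated field. `auxDeviceType` and `filterByAuxTypes` are hypothetical helpers for illustration, not the plugin's selector code.

```go
package main

import (
	"fmt"
	"strings"
)

// auxDeviceType extracts the type from an auxiliary device name.
// Assumption: names look like "<module>.<type>.<index>" (e.g. "foo.bar.0"),
// so the type is the second-to-last dot-separated field.
func auxDeviceType(name string) (string, bool) {
	parts := strings.Split(name, ".")
	if len(parts) < 3 {
		return "", false
	}
	return parts[len(parts)-2], true
}

// filterByAuxTypes is a hypothetical stand-in for the AuxTypes selector:
// it keeps devices whose type is one of the selector values, in input order.
func filterByAuxTypes(devices, auxTypes []string) []string {
	wanted := make(map[string]bool, len(auxTypes))
	for _, t := range auxTypes {
		wanted[t] = true
	}
	var selected []string
	for _, dev := range devices {
		if t, ok := auxDeviceType(dev); ok && wanted[t] {
			selected = append(selected, dev)
		}
	}
	return selected
}

func main() {
	devices := []string{"foo.bar.0", "fancy.bar.0", "foo.bar.1", "foo.baz.0"}
	fmt.Println(filterByAuxTypes(devices, []string{"bar"}))
}
```

Running it prints `[foo.bar.0 fancy.bar.0 foo.bar.1]`, the first three devices, matching the example; `foo.baz.0` is dropped because its type is "baz".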

Auxiliary network devices accept the following selectors: vendors, devices, drivers, pfNames, rootDevices, linkTypes, isRdma.

NOTE: this patchset is built on top of #441.

DmytroLinkin · Aug 16 '22

Pull Request Test Coverage Report for Build 2910194077

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 214 of 288 (74.31%) changed or added relevant lines in 15 files are covered.
  • 106 unchanged lines in 9 files lost coverage.
  • Overall coverage increased (+2.7%) to 79.949%

Changes missing coverage (covered lines / changed or added lines):
  • pkg/infoprovider/genericInfoProvider.go: 2 / 3 (66.67%)
  • pkg/netdevice/netDeviceProvider.go: 2 / 3 (66.67%)
  • pkg/utils/rdma_provider.go: 0 / 3 (0.0%)
  • pkg/utils/testing.go: 1 / 4 (25.0%)
  • pkg/devices/rdma.go: 5 / 9 (55.56%)
  • pkg/factory/factory.go: 24 / 28 (85.71%)
  • pkg/auxnetdevice/auxNetDevice.go: 32 / 38 (84.21%)
  • pkg/utils/netlink_provider.go: 0 / 6 (0.0%)
  • pkg/utils/sriovnet_provider.go: 0 / 12 (0.0%)
  • pkg/utils/utils.go: 15 / 29 (51.72%)
  • Total: 214 / 288 (74.31%)
Files with coverage reduction (new missed lines, resulting file coverage):
  • pkg/utils/sriovnet_provider.go: 1 (28.57%)
  • pkg/accelerator/accelDevice.go: 2 (84.62%)
  • pkg/factory/factory.go: 8 (89.93%)
  • pkg/netdevice/netResourcePool.go: 8 (87.14%)
  • pkg/accelerator/accelDeviceProvider.go: 9 (87.5%)
  • pkg/netdevice/netDeviceProvider.go: 14 (87.18%)
  • pkg/resources/deviceSelectors.go: 15 (88.13%)
  • pkg/netdevice/pciNetDevice.go: 16 (69.84%)
  • pkg/utils/utils.go: 33 (82.91%)
  • Total: 106
Totals:
  • Change from base Build 2733786348: +2.7%
  • Covered Lines: 1870
  • Relevant Lines: 2339
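The percentages in this report are plain covered/relevant ratios. A small Go check reproduces two of them from the line counts above; `coveragePct` is a hypothetical helper, not part of Coveralls.

```go
package main

import "fmt"

// coveragePct returns covered/relevant expressed as a percentage.
func coveragePct(covered, relevant int) float64 {
	return 100 * float64(covered) / float64(relevant)
}

func main() {
	// Changed or added lines in this PR: 214 of 288.
	fmt.Printf("changed lines: %.2f%%\n", coveragePct(214, 288)) // 74.31%
	// Whole project after the PR: 1870 of 2339 relevant lines.
	fmt.Printf("overall: %.3f%%\n", coveragePct(1870, 2339)) // 79.949%
}
```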

💛 - Coveralls

coveralls · Aug 16 '22

@DmytroLinkin

I tested out your PR and added some comments.

I was able to successfully spin up a pod with SF + RDMA resources.

One issue I found is that the injected env variable (PCIDEVICE_NVIDIA_COM_MLNX_SF) shows the parent PF instead of the allocated SF.

/ # env
KUBERNETES_SERVICE_PORT=443
KUBERNETES_PORT=tcp://10.96.0.1:443
HOSTNAME=testpod1
SHLVL=1
HOME=/root
PCIDEVICE_NVIDIA_COM_MLNX_SF=0000:03:00.0
TERM=xterm
KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
KUBERNETES_SERVICE_HOST=10.96.0.1
PWD=/
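The variable name above appears to be derived from the resource name nvidia.com/mlnx_sf. A Go sketch, assuming the name is "PCIDEVICE_" plus the resource prefix and name, with dots turned into underscores and everything upper-cased, and assuming several allocated IDs would be comma-joined. `envVarName` and `envVar` are hypothetical helpers and "mlx5_core.sf.2" is a made-up SF device name; the point is that the value should identify the allocated SF, not the parent PF address 0000:03:00.0.

```go
package main

import (
	"fmt"
	"strings"
)

// envVarName derives the env variable name from a resource prefix and name.
// Assumption: inferred from the observed PCIDEVICE_NVIDIA_COM_MLNX_SF, not
// copied from the plugin source.
func envVarName(resourcePrefix, resourceName string) string {
	key := fmt.Sprintf("PCIDEVICE_%s_%s", resourcePrefix, resourceName)
	return strings.ToUpper(strings.ReplaceAll(key, ".", "_"))
}

// envVar builds one "NAME=value" entry from the IDs of the allocated devices.
// Assumption: multiple allocated IDs are joined with commas.
func envVar(resourcePrefix, resourceName string, deviceIDs []string) string {
	return envVarName(resourcePrefix, resourceName) + "=" + strings.Join(deviceIDs, ",")
}

func main() {
	// Hypothetical allocation of a single SF auxiliary device.
	fmt.Println(envVar("nvidia.com", "mlnx_sf", []string{"mlx5_core.sf.2"}))
	// PCIDEVICE_NVIDIA_COM_MLNX_SF=mlx5_core.sf.2
}
```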

adrianchiris · Jan 03 '23

It is genericInfoProvider that returns the device's PCI address. I changed it to return pciAddr/auxiliary, i.e. the PCI address for PCI devices and the auxiliary device ID for auxiliary devices.
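One way to picture this change: the info provider is constructed with the allocated device's own identifier instead of its parent PF's PCI address. This is a simplified Go sketch; the interface, the constructor and the sample IDs are hypothetical and do not reproduce the plugin's actual types.

```go
package main

import "fmt"

// deviceInfoProvider is a hypothetical, trimmed-down interface that only
// exposes the value injected into the container's device env variable.
type deviceInfoProvider interface {
	GetEnvVal() string
}

// genericInfoProvider stores whichever ID identifies the allocated device:
// a PCI address for PCI devices, an auxiliary device ID for auxiliary ones.
type genericInfoProvider struct {
	deviceID string
}

func newGenericInfoProvider(deviceID string) *genericInfoProvider {
	return &genericInfoProvider{deviceID: deviceID}
}

// GetEnvVal returns the device's own ID, never the parent PF's address.
func (g *genericInfoProvider) GetEnvVal() string { return g.deviceID }

func main() {
	providers := []deviceInfoProvider{
		newGenericInfoProvider("0000:03:00.2"),   // hypothetical VF PCI address
		newGenericInfoProvider("mlx5_core.sf.2"), // hypothetical SF auxiliary device
	}
	for _, p := range providers {
		fmt.Println(p.GetEnvVal())
	}
}
```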

DmytroLinkin · Jan 04 '23

/retest

adrianchiris · Jan 15 '23

Minor nit on the doc update.

Otherwise LGTM. I did some basic testing of the latest iteration on my dev setup for both SF and SR-IOV.

I have re-triggered Mellanox CI.

adrianchiris · Jan 15 '23

@SchSeba @Eoghan1232 PTAL

adrianchiris · Jan 15 '23

@SchSeba @Eoghan1232 PTAL, I'd like to try and get this merged soon.

I believe the PR is in a good state and have LGTM'd it.

adrianchiris · Jan 25 '23

Thx for working on this one, @DmytroLinkin!

adrianchiris · Jan 26 '23