sriov-network-device-plugin icon indicating copy to clipboard operation
sriov-network-device-plugin copied to clipboard

Add selectors `nicNames`

Open ian-howell opened this issue 2 years ago • 7 comments

This addresses #433 by adding the nicNames fields as selectors

(8/8/2022 Update): This no longer addresses macAddresses

ian-howell avatar Jun 27 '22 22:06 ian-howell

Pull Request Test Coverage Report for Build 4316249828

  • 54 of 118 (45.76%) changed or added relevant lines in 5 files are covered.
  • 2 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-1.8%) to 76.508%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/factory/factory.go 0 2 0.0%
pkg/netdevice/netDeviceProvider.go 35 37 94.59%
pkg/devices/gen_net.go 8 11 72.73%
pkg/utils/utils.go 0 57 0.0%
<!-- Total: 54 118
Files with Coverage Reduction New Missed Lines %
pkg/devices/gen_net.go 1 83.54%
pkg/netdevice/netDeviceProvider.go 1 85.0%
<!-- Total: 2
Totals Coverage Status
Change from base Build 4294051176: -1.8%
Covered Lines: 1941
Relevant Lines: 2537

💛 - Coveralls

coveralls avatar Jul 04 '22 10:07 coveralls

@ian-howell does PfName selector satisfy your use-case based on community discussion we had on jul 18th ?

if so we can close this PR and related issue

adrianchiris avatar Jul 25 '22 11:07 adrianchiris

Hey @adrianchiris, after some investigation, I've found that the pfNames selector isn't sufficient for our use cases. I need to be able to target the device as it is named within the VM. This change seems to satisfy that.

ian-howell avatar Aug 08 '22 14:08 ian-howell

@e0ne Thanks for the review! I think I've resolved all of your comments. Let me know if there's anything else I need to do.

ian-howell avatar Sep 20 '22 15:09 ian-howell

@ian-howell Is this for VF passthrough to VM or is it also applied to virtio interfaces?

zshi-redhat avatar Sep 21 '22 00:09 zshi-redhat

This is just for VF passthrough

ian-howell avatar Sep 21 '22 19:09 ian-howell

Gentle bump on this one

ian-howell avatar Oct 10 '22 15:10 ian-howell

I will review this PR sorry for the long wait

SchSeba avatar Oct 24 '22 15:10 SchSeba

I will review this PR sorry for the long wait

@SchSeba Any update on this?

ian-howell avatar Nov 29 '22 14:11 ian-howell

Thanks @SchSeba! I've addressed the documentation when you get another chance

ian-howell avatar Dec 19 '22 18:12 ian-howell

/lgtm

SchSeba avatar Dec 20 '22 10:12 SchSeba

@e0ne Could you give this a review when you get a moment? I think I just need one more approval for merge. Thanks in advance!

ian-howell avatar Jan 03 '23 14:01 ian-howell

Not trying to be pushy, but could I please get an update on this when someone finds the time?

ian-howell avatar Jan 09 '23 17:01 ian-howell

Hi Ian, appologies for the late response.

looking at this PR you essentially add a netdevice selector right ? it will select devices based on their netdev name.

this is problematic since sriov-network-device plugin is often used with CNI which means netdevice will be moved to the container network namespace.

if at that point device plugin is restarted, resources that are already allocated to running pods will not be discovered.

in VM, can device selection be done using PCI addresses ?

adrianchiris avatar Jan 10 '23 08:01 adrianchiris

Hey Adrian, thanks for looking over this.

Yes, the goal is to create a netdevice selector. At the time of instantiation of the device-plugin ConfigMap, all that we have is the interface name. This is because we are creating templates of kubevirt machines, and the concrete VM definitions won't come until afterward. We could specify PCI addresses, but due to libvirt limitations, we can't guarantee that what we specified will be what appears in the VM.

We did some testing to get a better understanding of the problem you've pointed out, and sure enough, the resources are lost when the device-plugin restarts. We will continue to look for a solution for this, but if you happen to have any ideas that might help us, I would be greatly appreciative.

ian-howell avatar Jan 17 '23 23:01 ian-howell

@adrianchiris I have a solution which appears to work. Basically, we iterate over the child namespaces, switching as we go, and search within for an interface whose alias matches the interface as it were named in the parent namespace. I've verified that this works on our hardware.

When you get a second, I would appreciate it if you could take a second look at this and let me know what you think. Thanks!

ian-howell avatar Mar 02 '23 22:03 ian-howell

@adrianchiris Just a gentle bump on this one. Please let me know if there's anything wrong with the latest approach

ian-howell avatar Mar 20 '23 16:03 ian-howell

Hey @ian-howell , couple quick questions:

When you say iterate over namespaces, are these pod namespaces or host namespaces? SRIOV NDP already operates in the host namespace.

NDP should not be iterating over pod namespaces.

What is the difference between pfName and ifaceName ? only that it is looking for an alias? What would happen if there is more than one alias with the same name? since they could be named similar but in different namespaces.

Eoghan1232 avatar Mar 22 '23 08:03 Eoghan1232

@Eoghan1232

NDP should not be iterating over pod namespaces.

I'm curious why not? I don't think what we're trying to achieve is possible without doing so.

ian-howell avatar Mar 22 '23 15:03 ian-howell

@Eoghan1232

NDP should not be iterating over pod namespaces.

I'm curious why not? I don't think what we're trying to achieve is possible without doing so.

NDP scope should be limited to host network namespace for device discovery on the host.

if NDP starts to snoop over other net namespaces, it is kind of a breach of trust. The network isolation is there for a reason.

This could pose further issues if NDP starts messing with pod network namespaces

Can you explain the use case a bit more? from my understanding, you are essentially creating VM's with kubevirt, which is a K8s pod, and then passing the PF to the pod, then want NDP to manage the VF pools from the host side? i.e. control VF's inside VM from the host.

Eoghan1232 avatar Mar 22 '23 15:03 Eoghan1232

The use case is a bit more involved. We're creating a VM with kubevirt, and within that VM, we're spinning up an inner k8s cluster. So the "PF" in this case is technically a VF within the VM. The only information about that function we have to work with is its name as viewed from within the VM

What I want to do is schedule a pod to the inner k8s cluster, and pass the kubevirt pod's VF into the inner pod.

With our testing, we've found that pfNames is insufficient for this purpose. The pciAddresses selector works, but due to ordering, we can't know the PCI address of the kubevirt pod's VF until after it's running, which is too late.

ian-howell avatar Mar 22 '23 20:03 ian-howell

Hi @ian-howell , we discussed this again in a meeting today.

would you be able to join a community meeting on the 4th April - link to the meeting doc, there is a zoom link at the top. The meeting takes place at 14:00 UTC.

It would be easier to have a discussion over a call on this :)

Eoghan1232 avatar Mar 27 '23 15:03 Eoghan1232

would you be able to join a community meeting on the 4th April - link to the meeting doc, there is a zoom link at the top. The meeting takes place at 14:00 UTC.

I will be present.

ian-howell avatar Mar 28 '23 13:03 ian-howell

Would it be possible/acceptable to implement this function: https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin/blob/abf6548c33286a1766586704c26c425e0206f1f1/pkg/resources/pool_stub.go#L74-L78

If I'm understanding correctly, this could be used to periodically refresh the device list, which would allow my original implementation (without the namespace-switching) to work. It also looks like this is functionality that may be useful to others as well. I believe a good implementation could partially satisfy https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin/issues/276 and provide boilerplate for https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin/issues/362

ian-howell avatar Mar 29 '23 20:03 ian-howell

Hi @ian-howell , apologies, I gave the wrong time, it's 13:00 UTC, our doc must be outdated, due to timezone change. the meeting is in progress if you would still like to join.

Eoghan1232 avatar Apr 04 '23 13:04 Eoghan1232

Hi @ian-howell ,

we discussed the new proposal you mentioned above to use the Probe func, to solve the use case. We need to see what the original intention by Kuberentes was for this probe method and perhaps see what other device plugins have done. you can also see the discuss notes in the google doc that was linked previously. link again

would you be able to join the Tuesday April 18th 13:00 UTC ? :)

Eoghan1232 avatar Apr 04 '23 13:04 Eoghan1232

Hi @Eoghan1232, looks like I might have just missed the meeting. Thanks for keeping this on the agenda, and yes, I will plan on attending the meeting on the 18th.

ian-howell avatar Apr 04 '23 14:04 ian-howell

Hi @ian-howell I contact the kubevirt team.

I wanted to ask you if you can try to add apci config to the VM definition

interfaces:
      - masquerade: {}
        name: default
      - bridge: {}
        name: br10
        acpiIndex: 101

Then check `cat /sys/bus/pci/devices/<PCI>/acpi_index

if this exist on your node we can implement a new acpi selector that will always work :)

SchSeba avatar Apr 19 '23 13:04 SchSeba

second option we can use today kubevirt already support to add tags for the VM

- bridge: {}
        name: br10
        tag: blue

and they are using the same API as OpenStack so I think with a minimal change in the sriov-operator we can support also tags in the virtual plugin we already have

https://github.com/k8snetworkplumbingwg/sriov-network-operator/blob/master/pkg/utils/utils_virtual.go#L123:6

https://kubevirt.io/user-guide/virtual_machines/startup_scripts/#device-role-tagging

SchSeba avatar Apr 19 '23 13:04 SchSeba

@ian-howell any updates ?

were the alternatives working for you ? (acpi index or vm tags) ?

adrianchiris avatar May 17 '23 08:05 adrianchiris