sriov-network-device-plugin
sriov-network-device-plugin copied to clipboard
Add selectors `nicNames`
This addresses #433 by adding the nicNames
fields as selectors
(8/8/2022 Update): This no longer addresses macAddresses
Pull Request Test Coverage Report for Build 4316249828
- 54 of 118 (45.76%) changed or added relevant lines in 5 files are covered.
- 2 unchanged lines in 2 files lost coverage.
- Overall coverage decreased (-1.8%) to 76.508%
Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
---|---|---|---|
pkg/factory/factory.go | 0 | 2 | 0.0% |
pkg/netdevice/netDeviceProvider.go | 35 | 37 | 94.59% |
pkg/devices/gen_net.go | 8 | 11 | 72.73% |
pkg/utils/utils.go | 0 | 57 | 0.0% |
<!-- | Total: | 54 | 118 |
Files with Coverage Reduction | New Missed Lines | % |
---|---|---|
pkg/devices/gen_net.go | 1 | 83.54% |
pkg/netdevice/netDeviceProvider.go | 1 | 85.0% |
<!-- | Total: | 2 |
Totals | |
---|---|
Change from base Build 4294051176: | -1.8% |
Covered Lines: | 1941 |
Relevant Lines: | 2537 |
💛 - Coveralls
@ian-howell does PfName selector satisfy your use-case based on community discussion we had on jul 18th ?
if so we can close this PR and related issue
Hey @adrianchiris, after some investigation, I've found that the pfNames
selector isn't sufficient for our use cases. I need to be able to target the device as it is named within the VM. This change seems to satisfy that.
@e0ne Thanks for the review! I think I've resolved all of your comments. Let me know if there's anything else I need to do.
@ian-howell Is this for VF passthrough to VM or is it also applied to virtio interfaces?
This is just for VF passthrough
Gentle bump on this one
I will review this PR sorry for the long wait
I will review this PR sorry for the long wait
@SchSeba Any update on this?
Thanks @SchSeba! I've addressed the documentation when you get another chance
/lgtm
@e0ne Could you give this a review when you get a moment? I think I just need one more approval for merge. Thanks in advance!
Not trying to be pushy, but could I please get an update on this when someone finds the time?
Hi Ian, appologies for the late response.
looking at this PR you essentially add a netdevice selector right ? it will select devices based on their netdev name.
this is problematic since sriov-network-device plugin is often used with CNI which means netdevice will be moved to the container network namespace.
if at that point device plugin is restarted, resources that are already allocated to running pods will not be discovered.
in VM, can device selection be done using PCI addresses ?
Hey Adrian, thanks for looking over this.
Yes, the goal is to create a netdevice selector. At the time of instantiation of the device-plugin ConfigMap, all that we have is the interface name. This is because we are creating templates of kubevirt machines, and the concrete VM definitions won't come until afterward. We could specify PCI addresses, but due to libvirt limitations, we can't guarantee that what we specified will be what appears in the VM.
We did some testing to get a better understanding of the problem you've pointed out, and sure enough, the resources are lost when the device-plugin restarts. We will continue to look for a solution for this, but if you happen to have any ideas that might help us, I would be greatly appreciative.
@adrianchiris I have a solution which appears to work. Basically, we iterate over the child namespaces, switching as we go, and search within for an interface whose alias matches the interface as it were named in the parent namespace. I've verified that this works on our hardware.
When you get a second, I would appreciate it if you could take a second look at this and let me know what you think. Thanks!
@adrianchiris Just a gentle bump on this one. Please let me know if there's anything wrong with the latest approach
Hey @ian-howell , couple quick questions:
When you say iterate over namespaces, are these pod namespaces or host namespaces? SRIOV NDP already operates in the host namespace.
NDP should not be iterating over pod namespaces.
What is the difference between pfName and ifaceName ? only that it is looking for an alias? What would happen if there is more than one alias with the same name? since they could be named similar but in different namespaces.
@Eoghan1232
NDP should not be iterating over pod namespaces.
I'm curious why not? I don't think what we're trying to achieve is possible without doing so.
@Eoghan1232
NDP should not be iterating over pod namespaces.
I'm curious why not? I don't think what we're trying to achieve is possible without doing so.
NDP scope should be limited to host network namespace for device discovery on the host.
if NDP starts to snoop over other net namespaces, it is kind of a breach of trust. The network isolation is there for a reason.
This could pose further issues if NDP starts messing with pod network namespaces
Can you explain the use case a bit more? from my understanding, you are essentially creating VM's with kubevirt, which is a K8s pod, and then passing the PF to the pod, then want NDP to manage the VF pools from the host side? i.e. control VF's inside VM from the host.
The use case is a bit more involved. We're creating a VM with kubevirt, and within that VM, we're spinning up an inner k8s cluster. So the "PF" in this case is technically a VF within the VM. The only information about that function we have to work with is its name as viewed from within the VM
What I want to do is schedule a pod to the inner k8s cluster, and pass the kubevirt pod's VF into the inner pod.
With our testing, we've found that pfNames
is insufficient for this purpose. The pciAddresses
selector works, but due to ordering, we can't know the PCI address of the kubevirt pod's VF until after it's running, which is too late.
Hi @ian-howell , we discussed this again in a meeting today.
would you be able to join a community meeting on the 4th April - link to the meeting doc, there is a zoom link at the top. The meeting takes place at 14:00 UTC.
It would be easier to have a discussion over a call on this :)
would you be able to join a community meeting on the 4th April - link to the meeting doc, there is a zoom link at the top. The meeting takes place at 14:00 UTC.
I will be present.
Would it be possible/acceptable to implement this function: https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin/blob/abf6548c33286a1766586704c26c425e0206f1f1/pkg/resources/pool_stub.go#L74-L78
If I'm understanding correctly, this could be used to periodically refresh the device list, which would allow my original implementation (without the namespace-switching) to work. It also looks like this is functionality that may be useful to others as well. I believe a good implementation could partially satisfy https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin/issues/276 and provide boilerplate for https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin/issues/362
Hi @ian-howell , apologies, I gave the wrong time, it's 13:00 UTC, our doc must be outdated, due to timezone change. the meeting is in progress if you would still like to join.
Hi @ian-howell ,
we discussed the new proposal you mentioned above to use the Probe func, to solve the use case. We need to see what the original intention by Kuberentes was for this probe method and perhaps see what other device plugins have done. you can also see the discuss notes in the google doc that was linked previously. link again
would you be able to join the Tuesday April 18th 13:00 UTC ? :)
Hi @Eoghan1232, looks like I might have just missed the meeting. Thanks for keeping this on the agenda, and yes, I will plan on attending the meeting on the 18th.
Hi @ian-howell I contact the kubevirt team.
I wanted to ask you if you can try to add apci config to the VM definition
interfaces:
- masquerade: {}
name: default
- bridge: {}
name: br10
acpiIndex: 101
Then check `cat /sys/bus/pci/devices/<PCI>/acpi_index
if this exist on your node we can implement a new acpi selector that will always work :)
second option we can use today kubevirt already support to add tags for the VM
- bridge: {}
name: br10
tag: blue
and they are using the same API as OpenStack so I think with a minimal change in the sriov-operator we can support also tags in the virtual plugin we already have
https://github.com/k8snetworkplumbingwg/sriov-network-operator/blob/master/pkg/utils/utils_virtual.go#L123:6
https://kubevirt.io/user-guide/virtual_machines/startup_scripts/#device-role-tagging
@ian-howell any updates ?
were the alternatives working for you ? (acpi index or vm tags) ?