sriov-network-device-plugin
API preferred allocation: GetPreferredAllocation
- update the kubelet dependency to 1.19.0, which supports the GetPreferredAllocation API
- add a cli param called --allocate-policy with two options: empty or the concentrated policy
- introduce an Allocator interface and include it in each resource server (see the sketch below)
- add the GetPreferredAllocation API function call in the resource server
Fixes #255
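For illustration, here is a minimal sketch of how an Allocator interface could be wired into the resource server's GetPreferredAllocation handler, assuming the v1beta1 device plugin API types that ship with kubelet 1.19; the Allocator signature and struct names are illustrative, not necessarily this PR's exact code:

```go
package server

import (
	"context"

	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

// Allocator picks the preferred device IDs for one container request
// (illustrative interface, mirroring the one this PR describes).
type Allocator interface {
	Allocate(available, mustInclude []string, size int) []string
}

// resourceServer is a stripped-down stand-in for the plugin's
// per-resource-pool gRPC server.
type resourceServer struct {
	allocator Allocator
}

// GetPreferredAllocation implements the device plugin API hook added in
// kubelet 1.19: for each container request it returns the subset of the
// available devices this plugin would prefer kubelet to allocate.
func (rs *resourceServer) GetPreferredAllocation(
	ctx context.Context,
	req *pluginapi.PreferredAllocationRequest,
) (*pluginapi.PreferredAllocationResponse, error) {
	resp := &pluginapi.PreferredAllocationResponse{}
	for _, cr := range req.ContainerRequests {
		preferred := rs.allocator.Allocate(
			cr.AvailableDeviceIDs,
			cr.MustIncludeDeviceIDs,
			int(cr.AllocationSize),
		)
		resp.ContainerResponses = append(resp.ContainerResponses,
			&pluginapi.ContainerPreferredAllocationResponse{
				DeviceIDs: preferred,
			})
	}
	return resp, nil
}
```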
Concentrate, as used in this PR, may not be a good name for the allocation policy (use one PF's resources until exhausted); feel free to propose other names and I will update the patch accordingly.
/cc @ahalim-intel @martinkennelly @adrianchiris PTAL.
This PR was tested against a 1.19.0 kubernetes deployment.
@zshi-redhat Does it make sense to have a different allocating policy per resource pool?
@amorenoz good question! I considered making allocate-policy a per-resource configuration instead of a cli-param, and I think it makes sense. The reason I didn't implement it that way is that I'm not sure whether users prefer all resources to be allocated with the same policy or would like different policies per resource.
Maybe @sseetharaman6 can comment?
We could have both: A default policy (cli) and a per-resource one that would override the former.
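If both existed, the precedence check would be tiny; a sketch with hypothetical names:

```go
// effectivePolicy returns the per-resource policy when one is set and
// falls back to the global CLI default otherwise (hypothetical names).
func effectivePolicy(resourcePolicy, cliDefault string) string {
	if resourcePolicy != "" {
		return resourcePolicy
	}
	return cliDefault
}
```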
Thanks for this feature. So, the current implementation of ConcentrateAllocator is based on the order of PCI addresses. What other allocation policies should we implement?
There were two allocation policies I'd like to implement initially:
- concentrated (allocating VF from one PF until it's exhausted if multiple PFs are available in one resource pool)
- distributed (VF allocation is evenly distributed among PFs if multiple PFs are available in one resource pool)
For the first one, we have an issue associated with it. For the second one, I'm not sure it's really needed at the moment; I'd like to see if anyone has a requirement for the distributed policy. Another reason I didn't implement the distributed policy is that kubelet already chooses devices to allocate at random, so we can probably rely on that (although it's not guaranteed to be evenly distributed) instead of implementing our own.
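For illustration, a packed allocator can be approximated by preferring the lowest PCI addresses first, under the assumption that VFs of the same PF sort adjacently by PCI address; this is a sketch, not necessarily this PR's ConcentrateAllocator:

```go
import "sort"

// packedAllocator prefers devices in PCI-address order, so the VFs of
// one PF are exhausted before the next PF is touched (assumes VFs of
// the same PF have adjacent PCI addresses).
type packedAllocator struct{}

func (packedAllocator) Allocate(available, mustInclude []string, size int) []string {
	if size > len(available) {
		return nil // request cannot be satisfied
	}
	// Start from the devices kubelet says must be included.
	preferred := append([]string(nil), mustInclude...)
	chosen := make(map[string]bool, len(mustInclude))
	for _, id := range mustInclude {
		chosen[id] = true
	}
	// Fill the remainder with the lowest PCI addresses first.
	sorted := append([]string(nil), available...)
	sort.Strings(sorted)
	for _, id := range sorted {
		if len(preferred) >= size {
			break
		}
		if !chosen[id] {
			preferred = append(preferred, id)
		}
	}
	return preferred
}
```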
I think the main usage of this API is to recommend device affinity to kubelet when allocating more than one device. I'd like to see if there is any use case in the networking or accelerator area that requires this kind of device affinity. One possible example might be: if
- an accelerator device contains both a network function and accelerator function
- multiple such devices are available on one node
- network functions from multiple devices are exposed as a single resource pool A
- accelerator functions from multiple devices are exposed as a single resource pool B
we probably want to allocate the network function (pool A) and the accelerator function (pool B) from the same device for affinity purposes.
Some policy naming ideas that came to mind :)
- Concentrate -> Packed: VFs are preferred to be allocated in a packed manner under the same PF
- Distributed -> Balanced: VFs are preferred to be allocated in a balanced manner across PFs of the same resource
Renamed concentrated to packed. Distributed is not yet included in this PR; I will use the new Balanced name when implementing it.
Thanks for this PR @zshi-redhat.
In the original RFE I filed, I only have the need for a common allocatePolicy. However, a per-resource allocate-policy is definitely nice to have. I can see how it will be helpful, but maybe instead of a cli-param, this could come from /etc/pcidp/config.json as well?
Also I wanted to clarify that GetPreferredAllocation is only compatible with v1.19.0 and later kubernetes deployments... so in a cluster running an earlier version, this would be a no-op.
Changed allocatePolicy to a per-resource-pool config.
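A sketch of what the per-resource setting could look like in /etc/pcidp/config.json; the selectors follow the plugin's existing config format, while the allocatePolicy key reflects this discussion and its exact name and values may differ in the final patch:

```json
{
    "resourceList": [
        {
            "resourceName": "intel_sriov_netdevice",
            "selectors": {
                "vendors": ["8086"],
                "devices": ["154c"],
                "drivers": ["vfio-pci"]
            },
            "allocatePolicy": "packed"
        }
    ]
}
```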
Yes, GetPreferredAllocation only works with v1.19.0 and onwards; did you see any error when running this sriovdp with an older k8s version?
I tested this patch on Kubernetes v1.19.3, with multiple VFs and the packed option. It worked as expected both enabled and disabled. Thanks again @zshi-redhat for this.
@zshi-redhat Do you think adding documentation for this would be good too, to advertise this feature?
Yes, I will add docs for the API change. Thanks Martin!
@zshi-redhat do you envision we also add the Balanced functionality to this patch? If you need help, let me know.
Resolved merge conflict and updated the README doc.
@martinkennelly I thought about adding the balanced functionality; the issue I can see is that there seems to be no reliable way to balance device allocation from the device plugin's perspective, unless the device plugin records all devices allocated and deallocated by kubelet. However, I'm unsure about keeping such a record in the device plugin because there is no deallocate call from kubelet.
@adrianchiris as we discussed in the resource mgmt meeting (considering adding the balanced policy later with a solid use case), are you good with the current packed allocation policy?
I wonder if we can get away with no user-facing API changes to resourceList (for now), as we support only one policy,
and just have a cli flag that turns on "get preferred allocation" support globally, like a feature gate (which can later be enabled by default).
Unless I missed it, I did not find a compelling reason in the conversation to justify (IMHO) having this configurable per resource at this time.
There is some difference between the packed policy and no policy: packed always uses the resources "in order" while no policy is random. I prefer not to change the default behaviour from random to packed.
As for configuring it per resource: looking at the history, this PR initially implemented the policy as a cli param and was later changed to a per-resource-pool configuration per review suggestion. The use case is that a user may want to allocate one resource in order but not the others, because we support a resourceList containing multiple resource pools in one device plugin instance.
@zshi-redhat could you rebase this? I assume this PR is still desired.
Rebased.
This pull request fixes 1 alert when merging 78530f49dd8268703b2d8dc6da867163d8a5ab1c into 1a8c9df70cd40ad097fc7e0dbfc542f7d58f7e3b - view on LGTM.com
fixed alerts:
- 1 for Useless assignment to local variable
@adrianchiris Could you find time to review this? It's rebased.
I'm planning to migrate our in-house CNI plugin to the SR-IOV CNI plugin together with this device plugin. While evaluating this device plugin, I found that it doesn't support allocating VFs from different PFs (a balanced policy). The balanced policy would allow us to assign multiple PFs to a Pod to achieve higher bandwidth when the communication stack supports a multi-rail network configuration.
I have two questions,
- Do you agree with adding the GetPreferredAllocation API to the SR-IOV device plugin? If so, I'd like to prepare a new PR based on this work by rearranging and rebasing the commits. The new PR would be much smaller and easier to review.
- Do you agree with adding a Balanced policy to the SR-IOV device plugin? (see the sketch below)
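As a rough stateless sketch, a balanced allocator could work from only the AvailableDeviceIDs passed into each GetPreferredAllocation call: group the VFs by parent PF and round-robin across the groups. pfOf is a hypothetical helper (in practice it could follow the sysfs physfn link); mustInclude handling is omitted for brevity:

```go
import "sort"

// balancedAllocator spreads preferred VFs across PFs. pfOf is a
// hypothetical helper mapping a VF PCI address to its parent PF
// (e.g. by resolving /sys/bus/pci/devices/<vf>/physfn).
type balancedAllocator struct {
	pfOf func(vfID string) string
}

func (b balancedAllocator) Allocate(available, mustInclude []string, size int) []string {
	// Group the available VFs by their parent PF.
	groups := map[string][]string{}
	var pfs []string
	for _, id := range available {
		pf := b.pfOf(id)
		if _, ok := groups[pf]; !ok {
			pfs = append(pfs, pf)
		}
		groups[pf] = append(groups[pf], id)
	}
	sort.Strings(pfs) // deterministic PF order
	// Round-robin across PFs until the request size is met.
	var preferred []string
	for len(preferred) < size {
		progress := false
		for _, pf := range pfs {
			if len(groups[pf]) == 0 {
				continue
			}
			preferred = append(preferred, groups[pf][0])
			groups[pf] = groups[pf][1:]
			progress = true
			if len(preferred) == size {
				break
			}
		}
		if !progress {
			return nil // not enough devices available
		}
	}
	return preferred
}
```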
@adrianchiris @martinkennelly @zshi-redhat Hi guys! I would appreciate it if you could comment on https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin/pull/267#issuecomment-1217524548 if possible 😊
replaced by #443