
API preferred allocation: GetPreferredAllocation

Open zshi-redhat opened this issue 5 years ago • 27 comments

  1. update the kubelet dependency to 1.19.0, which supports the GetPreferredAllocation API
  2. add a CLI param called --allocate-policy with two options: empty or concentrated policy
  3. introduce an Allocator interface and include it in each resource server
  4. add a GetPreferredAllocation API function call in the resource server (a minimal sketch follows)
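
For illustration, here is a minimal Go sketch of items 3 and 4, written against the v1.19 device plugin API (k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1); the Allocator shape and the resourceServer name are assumptions for this sketch, not the PR's exact code:

```go
package sriovdp

import (
	"context"

	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

// Allocator picks preferred device IDs for one container request
// (interface name and shape assumed for illustration).
type Allocator interface {
	Allocate(req *pluginapi.ContainerPreferredAllocationRequest) []string
}

// resourceServer stands in for the plugin's per-pool gRPC server.
type resourceServer struct {
	allocator Allocator // nil means no policy configured
}

// GetPreferredAllocation answers kubelet's allocation-hint request.
// Returning empty DeviceIDs leaves device selection to kubelet.
func (rs *resourceServer) GetPreferredAllocation(ctx context.Context,
	req *pluginapi.PreferredAllocationRequest) (*pluginapi.PreferredAllocationResponse, error) {
	resp := &pluginapi.PreferredAllocationResponse{}
	for _, cReq := range req.ContainerRequests {
		cResp := &pluginapi.ContainerPreferredAllocationResponse{}
		if rs.allocator != nil {
			cResp.DeviceIDs = rs.allocator.Allocate(cReq)
		}
		resp.ContainerResponses = append(resp.ContainerResponses, cResp)
	}
	return resp, nil
}
```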

Fixes #255

zshi-redhat avatar Sep 08 '20 04:09 zshi-redhat

Concentrate, as used in the PR, may not be a good name for the allocation policy (use one PF's resources until exhausted). Feel free to propose other names; I will update the patch accordingly.

zshi-redhat avatar Sep 08 '20 04:09 zshi-redhat

/cc @ahalim-intel @martinkennelly @adrianchiris PTAL.

zshi-redhat avatar Sep 10 '20 01:09 zshi-redhat

This PR was tested against a 1.19.0 Kubernetes deployment.

zshi-redhat avatar Sep 10 '20 01:09 zshi-redhat

@zshi-redhat Does it make sense to have a different allocation policy per resource pool?

amorenoz avatar Sep 10 '20 07:09 amorenoz

@zshi-redhat Does it make sense to have a different allocation policy per resource pool?

@amorenoz Good question! I considered making allocate-policy a per-resource configuration instead of a CLI param, and I think it makes sense. The reason I didn't implement it that way is that I'm not sure whether users prefer all resources to be allocated with the same policy or would like different policies for different resources.

zshi-redhat avatar Sep 10 '20 07:09 zshi-redhat

@zshi-redhat Does it make sense to have a different allocation policy per resource pool?

@amorenoz Good question! I considered making allocate-policy a per-resource configuration instead of a CLI param, and I think it makes sense. The reason I didn't implement it that way is that I'm not sure whether users prefer all resources to be allocated with the same policy or would like different policies for different resources.

Maybe @sseetharaman6 can comment?

We could have both: a default policy (CLI) and a per-resource one that overrides it.

amorenoz avatar Sep 10 '20 07:09 amorenoz

Thanks for this feature. So, the current implementation of ConcentrateAllocator is based on the order of PCI addresses. What other allocation policies should we implement?

ahalimx86 avatar Sep 10 '20 16:09 ahalimx86

Thanks for this feature. So, the current implementation of ConcentrateAllocator is based on the order of PCI addresses. What other allocation policies should we implement?

There were two allocation policies I'd like to implement initially:

  1. concentrated (allocating VFs from one PF until it's exhausted, if multiple PFs are available in one resource pool)
  2. distributed (VF allocation is evenly distributed among PFs, if multiple PFs are available in one resource pool)

For the first one, we have an issue associated with it. For the second one, I'm not sure it's really needed at the moment; I'd like to see whether anyone has a requirement for the distributed policy. Another reason I didn't implement the distributed policy is that kubelet randomly chooses which device to allocate, so we can probably just rely on that (although it's not guaranteed to be evenly distributed) instead of implementing our own.
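
For concreteness, a minimal sketch of what a packed preferred allocation could look like, assuming (per the discussion above) that device IDs are VF PCI addresses preferred in sorted order; this is illustrative, not the PR's actual ConcentrateAllocator:

```go
// Needs "sort" plus the pluginapi import from the earlier sketch.
func packedAllocate(req *pluginapi.ContainerPreferredAllocationRequest) []string {
	// Devices kubelet insists on are taken first.
	picked := append([]string{}, req.MustIncludeDeviceIDs...)
	seen := make(map[string]bool, len(picked))
	for _, id := range picked {
		seen[id] = true
	}
	// Fill the remainder in PCI-address order, so one PF's VFs are
	// preferred before moving on to the next.
	avail := append([]string{}, req.AvailableDeviceIDs...)
	sort.Strings(avail)
	for _, id := range avail {
		if len(picked) >= int(req.AllocationSize) {
			break
		}
		if !seen[id] {
			picked = append(picked, id)
			seen[id] = true
		}
	}
	return picked
}
```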

I think the main usage of this API is to recommend device affinity to kubelet when allocating more than one device. I'd like to see whether there is any use case in the networking or accelerator area that requires this kind of device affinity. One possible example: if

  1. an accelerator device contains both a network function and an accelerator function
  2. multiple such devices are available on one node
  3. network functions from multiple devices are exposed as a single resource pool A
  4. accelerator functions from multiple devices are exposed as a single resource pool B

we probably want to allocate the network function (pool A) and the accelerator function (pool B) from the same device for affinity purposes.

zshi-redhat avatar Sep 11 '20 01:09 zshi-redhat

Concentrate, as used in the PR, may not be a good name for the allocation policy (use one PF's resources until exhausted). Feel free to propose other names; I will update the patch accordingly.

Some policy naming ideas that came to mind :)

  • Concentrate -> Packed: VFs are preferred to be allocated in a packed manner under the same PF
  • Distributed -> Balanced: VFs are preferred to be allocated in a balanced manner across PFs of the same resource

adrianchiris avatar Sep 13 '20 10:09 adrianchiris

Concentrate, as used in the PR, may not be a good name for the allocation policy (use one PF's resources until exhausted). Feel free to propose other names; I will update the patch accordingly.

Some policy naming ideas that came to mind :)

  • Concentrate -> Packed: VFs are preferred to be allocated in a packed manner under the same PF
  • Distributed -> Balanced: VFs are preferred to be allocated in a balanced manner across PFs of the same resource

Renamed concentrated to packed. Distributed is not yet included in this PR; I will use the new Balanced name when implementing it.

zshi-redhat avatar Sep 14 '20 02:09 zshi-redhat

Thanks for this PR @zshi-redhat. In the original RFE I filed, I only have the need for a common allocatePolicy. However, a per-resource allocate-policy is definitely nice to have. I can see how it will be helpful, but maybe instead of a CLI param, this could come from /etc/pcidp/config.json as well? Also, I wanted to clarify that GetPreferredAllocation is only compatible with v1.19.0 and later Kubernetes deployments, so in a cluster running an earlier version this would be a no-op.

sseetharaman6 avatar Sep 24 '20 19:09 sseetharaman6

Thanks for this PR @zshi-redhat. In the original RFE I filed, I only have the need for a common allocatePolicy. However, a per-resource allocate-policy is definitely nice to have. I can see how it will be helpful, but maybe instead of a CLI param, this could come from /etc/pcidp/config.json as well?

Changed allocatePolicy to a per-resource-pool config.
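
For example, a per-resource-pool policy in /etc/pcidp/config.json could look like the sketch below; the allocatePolicy key and "packed" value follow this PR's proposal, while the selectors are just a plausible example (the final merged syntax may differ):

```json
{
    "resourceList": [
        {
            "resourceName": "intel_sriov_netdevice",
            "allocatePolicy": "packed",
            "selectors": {
                "vendors": ["8086"],
                "devices": ["154c"],
                "drivers": ["iavf"]
            }
        }
    ]
}
```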

Also, I wanted to clarify that GetPreferredAllocation is only compatible with v1.19.0 and later Kubernetes deployments, so in a cluster running an earlier version this would be a no-op.

Yes, it only works with v1.19.0 and onwards. Did you see any errors when running this sriovdp with an older k8s version?
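
The no-op behaviour on older clusters follows from the handshake: kubelet only calls GetPreferredAllocation if the plugin advertises it via GetDevicePluginOptions, and pre-1.19 kubelets ignore the option. A minimal sketch of the advertisement (reusing the imports from the earlier sketch):

```go
// Pre-1.19 kubelets don't know this option and never call
// GetPreferredAllocation, so the feature silently degrades to a no-op.
func (rs *resourceServer) GetDevicePluginOptions(ctx context.Context,
	_ *pluginapi.Empty) (*pluginapi.DevicePluginOptions, error) {
	return &pluginapi.DevicePluginOptions{
		GetPreferredAllocationAvailable: true,
	}, nil
}
```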

zshi-redhat avatar Sep 28 '20 04:09 zshi-redhat

I tested this patch on Kubernetes 1.19.3, with multiple VFs and the packed option. It worked as expected both enabled and disabled. Thanks again @zshi-redhat for this.

martinkennelly avatar Oct 22 '20 15:10 martinkennelly

@zshi-redhat Do you think it would be good to add documentation for this too, to advertise the feature?

martinkennelly avatar Nov 23 '20 12:11 martinkennelly

@zshi-redhat Do you think it would be good to add documentation for this too, to advertise the feature?

Yes, I will add docs for the API change. Thanks Martin!

zshi-redhat avatar Nov 24 '20 10:11 zshi-redhat

@zshi-redhat do you envision also adding the Balanced functionality to this patch? If you need help, let me know.

martinkennelly avatar Dec 15 '20 16:12 martinkennelly

Resolved merge conflict and updated the README.

zshi-redhat avatar Dec 16 '20 05:12 zshi-redhat

@zshi-redhat do you envision also adding the Balanced functionality to this patch? If you need help, let me know.

@martinkennelly I thought about adding balanced functionality. The issue I see is that there seems to be no reliable way to balance device allocation from the device plugin's perspective, short of having the device plugin record all devices allocated/deallocated by kubelet. However, I'm unsure about keeping such a record in the device plugin because there is no deallocate call from kubelet.
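
One hedged illustration of the trade-off: without such records, a balanced allocator can at best spread a single request's VFs across PFs (assuming the plugin knows each VF's parent PF), as in the hypothetical sketch below; it cannot balance overall usage across requests, which is exactly the record-keeping gap described above.

```go
// Hypothetical "balanced" selection: spread one container's allocation
// round-robin across PFs. pfOf maps a VF device ID to its parent PF, a
// relationship the plugin already discovers. Needs "sort" plus the
// pluginapi import from the earlier sketches.
func balancedAllocate(req *pluginapi.ContainerPreferredAllocationRequest,
	pfOf map[string]string) []string {
	mustHave := make(map[string]bool, len(req.MustIncludeDeviceIDs))
	for _, id := range req.MustIncludeDeviceIDs {
		mustHave[id] = true
	}
	// Bucket the remaining available VFs by parent PF.
	byPF := map[string][]string{}
	var pfs []string
	for _, id := range req.AvailableDeviceIDs {
		if mustHave[id] {
			continue // already guaranteed a slot below
		}
		pf := pfOf[id]
		if _, ok := byPF[pf]; !ok {
			pfs = append(pfs, pf)
		}
		byPF[pf] = append(byPF[pf], id)
	}
	sort.Strings(pfs) // deterministic PF visiting order

	picked := append([]string{}, req.MustIncludeDeviceIDs...)
	for len(picked) < int(req.AllocationSize) {
		progressed := false
		for _, pf := range pfs {
			if len(picked) >= int(req.AllocationSize) {
				break
			}
			if vfs := byPF[pf]; len(vfs) > 0 {
				picked = append(picked, vfs[0])
				byPF[pf] = vfs[1:]
				progressed = true
			}
		}
		if !progressed {
			break // not enough available devices to satisfy the request
		}
	}
	return picked
}
```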

zshi-redhat avatar Dec 16 '20 05:12 zshi-redhat

@adrianchiris as we discussed in the resource mgmt meeting (considering adding a balanced policy later with a solid use case), are you good with the current packed allocation policy?

zshi-redhat avatar Jan 29 '21 01:01 zshi-redhat

I wonder if we can get away with no user-facing API changes to resourceList (for now), as we support only one policy, and just have a CLI flag that turns on "get preferred allocation" support globally, like a feature gate (which can later be enabled by default).

Unless I missed it, I did not find a compelling reason in the conversation to justify (IMHO) making this configurable per resource at this time.

adrianchiris avatar Jan 31 '21 09:01 adrianchiris

I wonder if we can get away with no user-facing API changes to resourceList (for now), as we support only one policy, and just have a CLI flag that turns on "get preferred allocation" support globally, like a feature gate (which can later be enabled by default).

There is some difference between the packed policy and no policy: packed always uses the resources "in order", while no policy is random. I prefer not to change the default behaviour from random to packed.

Unless I missed it, I did not find a compelling reason in the conversation to justify (IMHO) making this configurable per resource at this time.

Looking at the history, this PR initially implemented the policy as a CLI option and later changed it to a per-resource-pool configuration per the suggestion. I think the use case is the same: a user may want to allocate one resource in order but not the others, because we support a resourceList containing multiple resource pools in one device plugin instance.

zshi-redhat avatar Feb 01 '21 01:02 zshi-redhat

@zshi-redhat could you rebase this? I assume this PR is still desired.

adrianchiris avatar Mar 31 '21 11:03 adrianchiris

@zshi-redhat could you rebase this? I assume this PR is still desired.

Rebased.

zshi-redhat avatar Apr 01 '21 08:04 zshi-redhat

This pull request fixes 1 alert when merging 78530f49dd8268703b2d8dc6da867163d8a5ab1c into 1a8c9df70cd40ad097fc7e0dbfc542f7d58f7e3b - view on LGTM.com

fixed alerts:

  • 1 for Useless assignment to local variable

lgtm-com[bot] avatar Apr 12 '21 01:04 lgtm-com[bot]

@adrianchiris Could you find time to review this? It's rebased.

martinkennelly avatar Jul 22 '21 09:07 martinkennelly

I'm planning to migrate our in-house CNI plugin to the SR-IOV CNI plugin together with this device plugin. While evaluating this device plugin, I found it doesn't support allocating VFs from different PFs (a balanced policy). The balanced policy would allow us to assign multiple PFs to a Pod to achieve higher bandwidth when the communication stack supports a multi-rail network configuration.

I have two questions:

  • Do you agree with adding the GetPreferredAllocation API to the SR-IOV device plugin? If so, I'd like to prepare a new PR based on this work by rearranging and rebasing the commits. The new PR would be much smaller and easier to review.
  • Do you agree with adding a Balanced Policy to the SR-IOV device plugin?

wattmto avatar Aug 17 '22 06:08 wattmto

@adrianchiris @martinkennelly @zshi-redhat Hi guys! I would appreciate it if you could comment on https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin/pull/267#issuecomment-1217524548 if possible 😊

superbrothers avatar Aug 25 '22 09:08 superbrothers

replaced by #443

adrianchiris avatar Nov 21 '23 15:11 adrianchiris