sriov-network-device-plugin
Allow mixing RDMA and non-RDMA devices in one pool
What would you like to be added?
Currently, RDMA devices must use the `isRdma: true` selector to have their RDMA resources mounted into a pod. This selector (to my understanding) prevents non-RDMA devices from being included in a resource pool with RDMA devices. So, from the user perspective, this selector has the dual responsibility of ensuring RDMA resources are added to containers using RDMA devices and filtering devices that are part of a resource pool based on their RDMA capabilities.
I propose adding the ability to mix RDMA and non-RDMA devices into the same resource pool.
What is the use case for this feature / enhancement?
An example use case is mixing Intel and Mellanox VFs for DPDK applications. DPDK allows applications to use both of those VFs without any application code changes. Therefore, DPDK applications don't need to care what kind of VF they run on, and should be able to request a DPDK-compatible VF. Instead, due to the boolean nature of the `isRdma` selector, the application must request either an Intel VF or a Mellanox VF.
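For context, under the current selector semantics the two device types have to live in separate pools. A sketch of such a config (resource names are illustrative; `8086` and `15b3` are the Intel and Mellanox PCI vendor IDs) might look like:

```json
{
  "resourceList": [
    {
      "resourceName": "intel_dpdk_vfs",
      "selectors": {
        "vendors": ["8086"],
        "drivers": ["vfio-pci"]
      }
    },
    {
      "resourceName": "mlnx_dpdk_vfs",
      "selectors": {
        "vendors": ["15b3"],
        "isRdma": true
      }
    }
  ]
}
```

A DPDK pod must then pick one of the two resource names up front, even though either VF would work for it.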
Implementation
I've put together a patch (upstreaming is awaiting internal review) that adds an `auto` value to the `isRdma` selector. When `isRdma` is set to `auto`, RDMA and non-RDMA devices can be added to the same resource pool. When an RDMA-capable device (like a Mellanox VF) is requested, the RDMA resources are attached to the pod. When a non-RDMA device is allocated to a pod, only the usual resources are attached. I've tested the patch against an Intel X710 VF and a Mellanox ConnectX-5 VF without noticeable issues.
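Under the proposed `auto` value, the same two device types could be advertised as a single pool. A hypothetical config sketch (resource name and selector values are illustrative, not from the actual patch):

```json
{
  "resourceList": [
    {
      "resourceName": "dpdk_vfs",
      "selectors": {
        "vendors": ["8086", "15b3"],
        "isRdma": "auto"
      }
    }
  ]
}
```

Pods would simply request `dpdk_vfs`; if the plugin happens to allocate a Mellanox VF, its RDMA resources get mounted, and otherwise nothing extra is attached.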
I am not sure how much backward compatibility this project aims to provide. By moving `isRdma` from a `bool` to a `string`, I am not sure whether some existing configurations that contain `isRdma: true` or `isRdma: false` will need to be updated. If backwards compatibility is a higher concern, then perhaps it would be better to add an additional flag rather than change the type of the existing one.
Questions
Is the use case presented a valid use case? Is the implementation approach mentioned acceptable, or would another way be preferred?
Hi @nlbrown2, thanks for the proposal! I raised this issue in the community meeting today, we can discuss further here.
I have a concern with changing the type from `bool` to `string`: as you mentioned, it will no longer be backwards compatible, and it will break any current configs since we'd be changing a value type.
This change will probably require crd changes in network operator.
I guess if the user requires the current implementation, they can simply set it to `"true"` or `"false"`.
Hi @Eoghan1232 , thanks for your response! It sounds like the community is in favor of implementing this idea, which I am excited to hear!
I am not very familiar with how Go parses JSON, but my suspicion is that existing configs which use `isRdma: true` or `isRdma: false` will have to be updated. If it is possible to avoid requiring this, then retaining compatibility for old configs should also be possible.
Another option for backwards compatibility is to instead create a new configuration flag, such as `AutoMountRdma: bool`. When this new flag is set, RDMA-capable devices will have their resources mounted even if `isRdma` is `false`. `AutoMountRdma` can default to `false` to retain behavioral compatibility with old configs. However, I am wary of introducing a new configuration option when a relevant existing option exists. Plus, the interaction between `isRdma` and `AutoMountRdma` won't be very intuitive without reading the relevant documentation.
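A sketch of what that alternative could look like in a selector block (`AutoMountRdma` is a hypothetical flag name from this proposal, not an existing option):

```json
{
  "resourceName": "dpdk_vfs",
  "selectors": {
    "vendors": ["8086", "15b3"],
    "isRdma": false,
    "AutoMountRdma": true
  }
}
```

Note that here `isRdma: false` would no longer mean "exclude RDMA devices from the pool", which is exactly the unintuitive interaction mentioned above.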
If we have the freedom to require users to update their config files, then in the PR we can start a discussion around what values `isRdma` should take. We aren't bound to supporting `"true"` and `"false"`, and those values may not make much sense for a non-boolean variable. There will certainly be values that behave the same as `true` and `false` do currently; maybe `"yes"` and `"no"`?
For the network operator changes, is there any special procedure for implementing a change like this which needs to impact both repositories?
Hi @nlbrown2, would you be able to join a community meeting to discuss this further?
The next meeting is held on 4th October, 9am EST.
Hi,
I am happy to join that meeting. Look forward to discussing with you all there!
Hi @nlbrown2, any update on this? What was the outcome of the community meeting?
Hi @SchSeba , @Eoghan1232 and I had a short discussion, which we then took offline. Eoghan had proposed some workarounds to enable this functionality, as he was working on a patch that would also impact the RDMA implementation. My project's requirements had changed such that this feature wasn't essential, so I wasn't able to test out Eoghan's proposal.
He had proposed using an admission controller to modify the pod spec based upon a per-node label. While it fit the use case I had in mind, I feel the more general solution will require changes to the SR-IOV Net Device Plugin.
I am still interested in enabling this functionality, but I'm not sure what Eoghan's status is, or its impact on this idea.
ok thanks for the update!