intel-device-plugins-for-kubernetes
Use initcontainer to configure DLB devices
Some DLB usage scenarios (e.g. DPDK or libdlb using VFs) require additional setup, such as enabling VFs and binding them. This should be done in the initcontainer, similar to how it's done in the QAT initcontainer.
This should be done in the initcontainer
As part of this item, plan for the image publishing parts.
I met the case that the DLB device number (/dev/dlbN) is random in the container. I use the first DLB device as the default and support configuring it, but it is impossible to know which number will be used before configuring. Could we pin it in the initContainer as well? I really do not want to detect it with ls.
@daixiang0 Let's discuss this with the following example:
Let's say we have 8 PF devices: /dev/dlb0 - /dev/dlb7. Pods request them as dlb.intel.com/pf resources, work with them and release them after their jobs are done. The DLB plugin allocates devices in order (dlb0, dlb1, dlb2 ... dlb7). At some point all devices are allocated to workloads. The workload that uses /dev/dlb5 terminates and dlb5 becomes allocatable again. When a new workload requests dlb.intel.com/pf, the /dev/dlb5 device will be allocated to it. How would you propose to pin the devices in this case? Should the plugin mark dlb5 as allocatable only if a pod requests dlb5? How should a pod request dlb5? What if dlb5 is already allocated, but dlb4 and dlb7 are allocatable? Should a pod wait until dlb5 is allocatable again?
I met the case that the DLB device number (/dev/dlbN) is random in the container
The container app cannot assume it gets some specific N. We checked this with the DLB team earlier and they confirmed it's possible to specify which /dev/dlbN the DLB lib uses with a parameter.
I really do not want to detect it with ls
If the lib init is not able to auto-discover available devices, then discovery must be done beforehand and the result passed with the cmdline parameter.
So for now we cannot get a fake N in the container, right? In most cases, users do not care about the actual N, since all they want is a single device. I know that keeps the actual N easy to debug, could we optimize it?
For now, I need to add detection logic to ensure the app works.
I know that keeps the actual N easy to debug, could we optimize it?
I think the best thing to do is to set the provided devices to env variables for the container, something like:
DLB0=/dev/dlbX
DLBx=/dev/dlbY
...
This is similar to what we do with QAT. See: https://github.com/intel/intel-device-plugins-for-kubernetes/blob/3c948cc106914c8e7149b6b5e74e05490f4e8a7f/demo/crypto-perf/run-dpdk-test#L13
Would that work?
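From the application side, consuming such variables could look like the sketch below. It assumes the naming from the proposal above (DLB0, DLB1, ... each holding a /dev/dlbN path); the plugin does not set these variables today, and the function name dlb_devices_from_env is made up for illustration.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Collects device paths from the environment variables <prefix>0, <prefix>1, ...
// and stops at the first variable that is unset or empty.
// ASSUMPTION: the DLB0/DLB1/... naming comes from the proposal in this thread;
// it is not something the DLB plugin currently provides.
std::vector<std::string> dlb_devices_from_env(const std::string& prefix = "DLB") {
    std::vector<std::string> devices;
    for (int i = 0;; ++i) {
        const std::string name = prefix + std::to_string(i);
        const char* value = std::getenv(name.c_str());
        if (value == nullptr || *value == '\0') {
            break;  // no more devices were handed to this container
        }
        devices.emplace_back(value);
    }
    return devices;
}
```

An application that only needs one device would take the first element of the returned vector and pass it to whatever device-selection parameter its DLB library exposes, without caring about the actual N.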
Another way is to make one VF device and request it. This way you'll always get the same N.
@mythi would it make sense to sort this output to make N a bit more predictable?
There's no predictability in how kubelet picks devices in Allocate because the registered device IDs are stored in Go maps and the order of elements in Go maps is random.
Is there a C++ example for detecting it? I cannot use a bash script in an initContainer since it is already a sidecar.
Are you asking how the proposal I made would work or how to detect devices in /dev/dlbX?
I am asking how to detect the /dev/dlbX devices using C++ code.
Before going into this question, let's try to conclude what the device plugin can do to make it as easy as possible. Can you comment on the proposal I made? (btw, maybe start with a new issue first because this conversation is not related to this issue)
I agree.
Is it possible to expose an ENV so that applications can easily detect the device?
Yes (see the QAT example)
Could you give a link?
The plugin does not support it yet.
Will it come in Sep. 2022?
Most likely not, unless you are able to contribute the changes. The first step is to submit an issue describing the ask so we can get it into our Q4 planning.
There are ways to work around the gap you are observing, so not having it should not be a blocker.
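One possible workaround, given only as a sketch, is for the application to scan its own /dev for DLB nodes at startup instead of assuming a fixed N. The code below uses C++17 std::filesystem and assumes the nodes are named dlb followed by a decimal index, as in the /dev/dlbN paths discussed above. The function names are made up for illustration and are not part of libdlb or the device plugin.

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Returns true if name is "dlb" followed by one or more decimal digits.
bool is_dlb_node_name(const std::string& name) {
    const std::string prefix = "dlb";
    if (name.size() <= prefix.size() ||
        name.compare(0, prefix.size(), prefix) != 0) {
        return false;
    }
    return std::all_of(name.begin() + prefix.size(), name.end(),
                       [](unsigned char c) { return std::isdigit(c) != 0; });
}

// Lists entries in dev_dir whose names look like dlbN, ordered by N.
// ASSUMPTION: node names follow the /dev/dlbN pattern without leading zeros.
// Returns an empty vector if the directory cannot be read.
std::vector<std::string> find_dlb_devices(const fs::path& dev_dir = "/dev") {
    std::vector<std::string> found;
    std::error_code ec;
    for (fs::directory_iterator it(dev_dir, ec), end; !ec && it != end;
         it.increment(ec)) {
        if (is_dlb_node_name(it->path().filename().string())) {
            found.push_back(it->path().string());
        }
    }
    // All paths share the same prefix, so a shorter path means fewer digits
    // and therefore a smaller index; equal lengths compare lexically.
    std::sort(found.begin(), found.end(),
              [](const std::string& a, const std::string& b) {
                  return a.size() != b.size() ? a.size() < b.size() : a < b;
              });
    return found;
}
```

Inside a container that was allocated a single dlb.intel.com/pf resource, the returned vector would be expected to hold one path, which can then be passed to the device-selection parameter of the DLB library.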