intel-device-plugins-for-kubernetes

Use initcontainer to configure DLB devices

Open bart0sh opened this issue 2 years ago • 20 comments

Some of the DLB usage scenarios (e.g. DPDK or libdlb using VFs) require additional setup, such as enabling VFs, binding them, etc. This should be done in the initcontainer, similarly to how it's done in the QAT initcontainer.

bart0sh, Jun 14 '22
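As an illustration of one such setup step, enabling VFs, here is a minimal C++ sketch. It assumes the DLB PF exposes the standard Linux PCI SR-IOV sysfs attribute `sriov_numvfs` in its device directory (e.g. `/sys/bus/pci/devices/<BDF>/`, where the address is a placeholder); the function names are hypothetical, and this is not how the QAT initcontainer is written.

```cpp
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Writes `value` to a sysfs attribute; returns false on any I/O error.
static bool write_attr(const fs::path& attr, int value) {
    std::ofstream out(attr);
    if (!out) return false;
    out << value;
    out.flush();  // the actual write(2) happens here; sysfs errors surface now
    return static_cast<bool>(out);
}

// Enables `num_vfs` VFs on the PF whose sysfs directory is `pf_dir`
// (e.g. /sys/bus/pci/devices/<BDF>; the address is a placeholder).
// The kernel rejects changing a non-zero VF count straight to another
// non-zero value, so the count is reset to 0 first.
bool enable_vfs(const fs::path& pf_dir, int num_vfs) {
    const fs::path attr = pf_dir / "sriov_numvfs";
    if (!fs::exists(attr)) return false;  // not an SR-IOV capable device
    return write_attr(attr, 0) && write_attr(attr, num_vfs);
}

// Reads back the currently configured VF count, or -1 if it cannot be read.
int read_numvfs(const fs::path& pf_dir) {
    std::ifstream in(pf_dir / "sriov_numvfs");
    int value = -1;
    if (!(in >> value)) return -1;
    return value;
}
```

Binding the new VFs to a driver, which the DPDK scenarios also need, would be a separate step not shown here.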

This should be done in the initcontainer

As part of this item, plan for the image publishing parts too.

mythi, Jun 15 '22

I ran into a case where the DLB device number (/dev/dlbN) is random in the container. I use the first DLB device by default and make it configurable, but there is no way to know in advance which number will be assigned. Could we pin it in the initContainer as well? I really do not want to detect it with ls.

daixiang0, Jul 18 '22

@daixiang0 Let's discuss this with the following example:

Let's say we have 8 PF devices: /dev/dlb0 - /dev/dlb7. Pods request them as dlb.intel.com/pf resources, work with them and release them after their jobs are done. The DLB plugin allocates devices in order (dlb0, dlb1, dlb2 ... dlb7). At some point all devices are allocated to workloads. The workload that uses /dev/dlb5 terminates and dlb5 becomes allocatable again. When a new workload requests dlb.intel.com/pf, /dev/dlb5 will be allocated to it. How would you propose to pin the devices in this case? Should the plugin mark dlb5 as allocatable only if a pod requests dlb5? How should a pod request dlb5? What if dlb5 is already allocated, but dlb4 and dlb7 are allocatable? Should the pod wait until dlb5 is allocatable again?

bart0sh, Jul 18 '22
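The scenario above can be modeled in a few lines of C++. This is a hypothetical model (the `DlbPool` name is made up) that assumes the lowest-index-first order described in the comment; it only illustrates why a pod cannot know N in advance, not how the plugin or kubelet actually picks devices.

```cpp
#include <optional>
#include <set>

// Hypothetical model of the allocation order described above: free device
// indices are kept sorted and the lowest one is handed out first.
class DlbPool {
public:
    explicit DlbPool(int count) {
        for (int i = 0; i < count; ++i) free_.insert(i);
    }

    // Returns N of the allocated /dev/dlbN, or nullopt if no device is free.
    std::optional<int> allocate() {
        if (free_.empty()) return std::nullopt;
        int idx = *free_.begin();
        free_.erase(free_.begin());
        return idx;
    }

    // Marks a device as allocatable again when its workload terminates.
    void release(int idx) { free_.insert(idx); }

private:
    std::set<int> free_;
};
```

After `DlbPool pool(8)` is fully allocated, `pool.release(5)` followed by `pool.allocate()` yields 5: the new workload gets whichever device happens to be free.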

I ran into a case where the DLB device number (/dev/dlbN) is random in the container

The container app cannot assume it gets some specific N. We checked this with the DLB team earlier and they confirmed it's possible to specify which /dev/dlbN the DLB lib uses with a parameter.

I really do not want to detect it with ls

If the library initialization is not able to auto-discover the available devices, then discovery must be done beforehand and the result passed via the command-line parameter.

mythi, Jul 18 '22
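When auto-discovery is not available, the application (or a wrapper around it) can scan for the devices itself before initializing the library. A minimal C++ sketch, assuming the devices appear as /dev/dlbN inside the container; the helper names are hypothetical, and because the library's device-selection parameter is not named in this thread, the sketch stops at building the device path.

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Returns N if `name` is "dlb" followed only by digits (e.g. "dlb5"), else nullopt.
// At most 9 digits are accepted so std::stoi cannot overflow.
std::optional<int> parse_dlb_index(const std::string& name) {
    const std::string prefix = "dlb";
    if (name.size() <= prefix.size() || name.size() > prefix.size() + 9 ||
        name.compare(0, prefix.size(), prefix) != 0)
        return std::nullopt;
    const std::string digits = name.substr(prefix.size());
    if (!std::all_of(digits.begin(), digits.end(),
                     [](unsigned char c) { return std::isdigit(c) != 0; }))
        return std::nullopt;
    return std::stoi(digits);
}

// Lists the DLB device indices visible in `dev_dir` (normally /dev inside the
// container), sorted ascending. Returns an empty list if the directory is unreadable.
std::vector<int> find_dlb_devices(const fs::path& dev_dir = "/dev") {
    std::vector<int> indices;
    std::error_code ec;
    for (fs::directory_iterator it(dev_dir, ec), end; !ec && it != end; it.increment(ec)) {
        if (auto idx = parse_dlb_index(it->path().filename().string()))
            indices.push_back(*idx);
    }
    std::sort(indices.begin(), indices.end());
    return indices;
}

// Builds the device path to hand to the DLB library's device-selection
// parameter (the parameter itself is not named in this thread).
std::string dlb_device_path(int index) {
    return "/dev/dlb" + std::to_string(index);
}
```

With a single allocated device, `find_dlb_devices()` returns one index, and `dlb_device_path()` of that index is the value to pass on.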

So for now we cannot get a fake N in the container, right? In most cases, users do not care about the actual N, since all they want is a single device. I know keeping the actual N makes debugging easier, but could we improve this?

For now, I need to add detection logic to make sure the app works.

daixiang0, Jul 19 '22

I know keeping the actual N makes debugging easier, but could we improve this?

I think the best thing to do is to expose the allocated devices to the container as environment variables, something like:

DLB0=/dev/dlbX
DLBx=/dev/dlbY
...

This is similar to what we do with QAT. See: https://github.com/intel/intel-device-plugins-for-kubernetes/blob/3c948cc106914c8e7149b6b5e74e05490f4e8a7f/demo/crypto-perf/run-dpdk-test#L13

Would that work?

mythi, Jul 19 '22
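On the application side, variables like DLB0=/dev/dlbX would be easy to consume. A minimal C++ sketch, assuming the DLB0, DLB1, ... naming from the proposal (the names are illustrative; per this thread the plugin does not set them yet):

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Collects the device paths from DLB0, DLB1, ... (hypothetical variable names
// from the proposal), stopping at the first unset or empty variable.
std::vector<std::string> dlb_devices_from_env() {
    std::vector<std::string> devices;
    for (int i = 0;; ++i) {
        const std::string name = "DLB" + std::to_string(i);
        const char* value = std::getenv(name.c_str());
        if (value == nullptr || *value == '\0') break;
        devices.emplace_back(value);
    }
    return devices;
}
```

An app could fall back to scanning /dev when DLB0 is unset, so it keeps working with plugin versions that do not export the variables.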

Another way is to create a single VF device and request it. That way you'll always get the same N.

bart0sh, Jul 19 '22

@mythi would it make sense to sort this output to make N a bit more predictable?

bart0sh, Jul 19 '22

@mythi would it make sense to sort this output to make N a bit more predictable?

There's no predictability in how kubelet picks devices in Allocate because the registered device IDs are stored in Go maps and the order of elements in Go maps is random.

mythi, Jul 19 '22

Is there a C++ example of detecting it? I cannot use a bash script in the initContainer since it is already a sidecar.

daixiang0, Aug 08 '22

Is there a C++ example of detecting it? I cannot use a bash script in the initContainer since it is already a sidecar.

Are you asking how the proposal I made would work or how to detect devices in /dev/dlbX?

mythi, Aug 08 '22

I am asking how to detect the /dev/dlbX devices using C++ code.

daixiang0, Aug 09 '22

I am asking how to detect the /dev/dlbX devices using C++ code.

Before going into this question, let's first agree on what the device plugin can do to make this as easy as possible. Can you comment on the proposal I made? (BTW, maybe start a new issue first, because this conversation is not related to this issue.)

mythi, Aug 09 '22

I agree.

daixiang0, Aug 09 '22

Is it possible to expose an environment variable so that applications can easily detect the device?

daixiang0, Aug 09 '22

Yes (see the QAT example)

mythi, Aug 09 '22

Could you give a link?

daixiang0, Aug 10 '22

The plugin does not support it yet.

mythi, Aug 10 '22

The plugin does not support it yet.

Will it be available in September 2022?

daixiang0, Aug 15 '22

The plugin does not support it yet.

Will it be available in September 2022?

Most likely not, unless you are able to contribute the changes. The first step is to submit an issue describing the request so we can get it into our Q4 planning.

There are ways to work around the gap you are observing, so not having it should not be a blocker.

mythi, Aug 15 '22