intel-device-plugins-for-kubernetes icon indicating copy to clipboard operation
intel-device-plugins-for-kubernetes copied to clipboard

dlb: add initcontainer to plugin

Open hj-johannes-lee opened this issue 3 years ago • 15 comments

initcontainer enables vfs and configures vfs

  • the number of pfs and vfs per pf can be configurable by users add dlb-initcontainer kustomize overlay update CRD to have initImage implment operator to run initcontainer update e2e test to run initcontainer overlay pudate envtest to test initimage

Signed-off-by: Hyeongju Johannes Lee [email protected]


TO BE DISCUSSED & IMPROVED:

  1. Do we actually keep the variable "REMAINING_NUM_PFS"? Or find another way?
  2. How to have no 'default' values for variables ("REMAINING_NUM_PFS", "NUM_VFS_PER_PF) in the script, yaml, etc.
  3. This PR does not have any change of CRD for the variables. Do we include here, or later in another PR?

*There will be no more work for this PR until September.

hj-johannes-lee avatar Jul 28 '22 09:07 hj-johannes-lee

initcontainer enables 4 vfs from each pf and configures vfs (this might need to be improved, but for now, I and Mikko agreed to just set static number 4 for each pf.)

This most probably won't work in Simics. I was only able to create one VF per PF there.

bart0sh avatar Jul 28 '22 12:07 bart0sh

This most probably won't work in Simics. I was only able to create one VF per PF there.

:( What is the desirable thing to run with initcontainer image? Do we run with docker? If so, then running image with environment variable is an option...!

Or...

hj-johannes-lee avatar Jul 28 '22 13:07 hj-johannes-lee

What is the desirable thing to run with initcontainer image? Do we run with docker?

CI doesn't have docker as far as I know. It uses either CRI-O or Containerd.

bart0sh avatar Jul 28 '22 13:07 bart0sh

Codecov Report

Merging #1088 (619f5f0) into main (aefdc3f) will decrease coverage by 0.46%. The diff coverage is 21.42%.

:exclamation: Current head 619f5f0 differs from pull request most recent head 142b27b. Consider uploading reports for the commit 142b27b to get more accurate results

@@            Coverage Diff             @@
##             main    #1088      +/-   ##
==========================================
- Coverage   53.01%   52.54%   -0.47%     
==========================================
  Files          40       40              
  Lines        4350     4413      +63     
==========================================
+ Hits         2306     2319      +13     
- Misses       1917     1966      +49     
- Partials      127      128       +1     
Impacted Files Coverage Δ
pkg/controllers/reconciler.go 4.25% <ø> (ø)
pkg/controllers/dlb/controller.go 16.40% <21.42%> (-0.52%) :arrow_down:
pkg/deviceplugin/server.go 85.05% <0.00%> (+1.72%) :arrow_up:

:mega: Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

codecov-commenter avatar Aug 03 '22 01:08 codecov-commenter

e2e fails because pf is not available anymore after vfs are made. What decision would we need to make for this?

hj-johannes-lee avatar Aug 03 '22 11:08 hj-johannes-lee

e2e fails because pf is not available anymore after vfs are made. What decision would we need to make for this?

Why it's not available? e2e test from main works with VF and PF as far as I can see here: https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/test/e2e/dlb/dlb.go#L63:L68

bart0sh avatar Aug 03 '22 11:08 bart0sh

@bart0sh Do you have an idea on the default value for PF and VF? I just set it as 4 and 4, but if you think it's better to set with other numbers, let me change it.!

hj-johannes-lee avatar Aug 08 '22 18:08 hj-johannes-lee

Do you have an idea on the default value for PF and VF?

I'd not use default values for them if possible. It depends on the workload requirements I guess, so I don't see any reasons for setting default values.

bart0sh avatar Aug 08 '22 21:08 bart0sh

@mythi @hj-johannes-lee before merging it we need to ensure we can remove DLB configuration script from CI images.

bart0sh avatar Sep 14 '22 10:09 bart0sh

@mythi @hj-johannes-lee before merging it we need to ensure we can remove DLB configuration script from CI images.

@bart0sh Would the failure of e2e test of DLB be related to what you said before? Well, since you told many times, I am anyway aware that there should be some changes in the simics for dlb, but idk whether it is related to the failure of this e2e test.!

hj-johannes-lee avatar Sep 19 '22 20:09 hj-johannes-lee

@bart0sh Would the failure of e2e test of DLB be related to what you said before?

It's hard to say based on e2e test output. I'd suggest to get the initcontainer logs in the e2e code and include them into the output. This way you can see why did it fail.

bart0sh avatar Sep 19 '22 21:09 bart0sh

@bart0sh When I test in an spr machine, then e2e test successfully runs without any problem.! Thank you for your suggestion, but I already somewhat guess the cause of it is. Can I just confirm one thing before doing as you suggested? Are there two pfs inside the simics, and one of them is used for configuration of vf?

hj-johannes-lee avatar Sep 19 '22 21:09 hj-johannes-lee

Note that the test does not seem to fail due to missing dlb.intel.com resources but because the initcontainer does not run.

mythi avatar Sep 20 '22 04:09 mythi

What about testing on dev-side e2e where I can temporarily disable bmaas-activated part in dev-dedicated instance(s)

okartau avatar Sep 20 '22 04:09 okartau

What about testing on dev-side e2e where I can temporarily disable bmaas-activated part in dev-dedicated instance(s)

makes sense as we will anyway need to test without the setup script.

mythi avatar Sep 20 '22 06:09 mythi

e2e-dlb passed, shellchecked.

hj-johannes-lee avatar Sep 27 '22 14:09 hj-johannes-lee

@hj-johannes-lee Have I understood correctly that initcontainer reconfigures existing VF even if it's configured? Should it skip already configured VFs?

bart0sh avatar Sep 27 '22 20:09 bart0sh

@bart0sh No. If there is vf, then the pf used for configuring vf is not configured again.

dev=$(realpath /sys/bus/pci/drivers/dlb2/????:??:??.0 | head -1)

First, only first PF is the device that is considered to be configured.

if test -w "$SRIOV_NUMVFS_PATH" -a "$(cat "$SRIOV_NUMVFS_PATH")" -eq 0 ; then

Second, if the PF is already configured (if sriov_numvfs != 0), then we skip configuration.

hj-johannes-lee avatar Sep 28 '22 02:09 hj-johannes-lee

One thing stupid in the script is.. that

  1. We can set only one vf because there is "default" values for resources (num_atomic_inflights, num_dir_credits, etc).
  2. There is still 'for' loop that is meant for setting multiple vfs...

hj-johannes-lee avatar Sep 29 '22 13:09 hj-johannes-lee

@hj-johannes-lee before merging this PR I'd like to see it working in dev environment. Please send PR there and make sure all tests succeed.

bart0sh avatar Sep 30 '22 11:09 bart0sh

@bart0sh I think we can merge this

mythi avatar Oct 06 '22 17:10 mythi