intel-device-plugins-for-kubernetes
intel-device-plugins-for-kubernetes copied to clipboard
dlb: add initcontainer to plugin
initcontainer enables vfs and configures vfs
- the number of pfs and vfs per pf can be configurable by users add dlb-initcontainer kustomize overlay update CRD to have initImage implment operator to run initcontainer update e2e test to run initcontainer overlay pudate envtest to test initimage
Signed-off-by: Hyeongju Johannes Lee [email protected]
TO BE DISCUSSED & IMPROVED:
- Do we actually keep the variable "REMAINING_NUM_PFS"? Or find another way?
- How to have no 'default' values for variables ("REMAINING_NUM_PFS", "NUM_VFS_PER_PF) in the script, yaml, etc.
- This PR does not have any change of CRD for the variables. Do we include here, or later in another PR?
*There will be no more work for this PR until September.
initcontainer enables 4 vfs from each pf and configures vfs (this might need to be improved, but for now, I and Mikko agreed to just set static number 4 for each pf.)
This most probably won't work in Simics. I was only able to create one VF per PF there.
This most probably won't work in Simics. I was only able to create one VF per PF there.
:( What is the desirable thing to run with initcontainer image? Do we run with docker? If so, then running image with environment variable is an option...!
Or...
What is the desirable thing to run with initcontainer image? Do we run with docker?
CI doesn't have docker as far as I know. It uses either CRI-O or Containerd.
Codecov Report
Merging #1088 (619f5f0) into main (aefdc3f) will decrease coverage by
0.46%. The diff coverage is21.42%.
:exclamation: Current head 619f5f0 differs from pull request most recent head 142b27b. Consider uploading reports for the commit 142b27b to get more accurate results
@@ Coverage Diff @@
## main #1088 +/- ##
==========================================
- Coverage 53.01% 52.54% -0.47%
==========================================
Files 40 40
Lines 4350 4413 +63
==========================================
+ Hits 2306 2319 +13
- Misses 1917 1966 +49
- Partials 127 128 +1
| Impacted Files | Coverage Δ | |
|---|---|---|
| pkg/controllers/reconciler.go | 4.25% <ø> (ø) |
|
| pkg/controllers/dlb/controller.go | 16.40% <21.42%> (-0.52%) |
:arrow_down: |
| pkg/deviceplugin/server.go | 85.05% <0.00%> (+1.72%) |
:arrow_up: |
:mega: Codecov can now indicate which changes are the most critical in Pull Requests. Learn more
e2e fails because pf is not available anymore after vfs are made. What decision would we need to make for this?
e2e fails because pf is not available anymore after vfs are made. What decision would we need to make for this?
Why it's not available? e2e test from main works with VF and PF as far as I can see here: https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/test/e2e/dlb/dlb.go#L63:L68
@bart0sh Do you have an idea on the default value for PF and VF? I just set it as 4 and 4, but if you think it's better to set with other numbers, let me change it.!
Do you have an idea on the default value for PF and VF?
I'd not use default values for them if possible. It depends on the workload requirements I guess, so I don't see any reasons for setting default values.
@mythi @hj-johannes-lee before merging it we need to ensure we can remove DLB configuration script from CI images.
@mythi @hj-johannes-lee before merging it we need to ensure we can remove DLB configuration script from CI images.
@bart0sh Would the failure of e2e test of DLB be related to what you said before? Well, since you told many times, I am anyway aware that there should be some changes in the simics for dlb, but idk whether it is related to the failure of this e2e test.!
@bart0sh Would the failure of e2e test of DLB be related to what you said before?
It's hard to say based on e2e test output. I'd suggest to get the initcontainer logs in the e2e code and include them into the output. This way you can see why did it fail.
@bart0sh When I test in an spr machine, then e2e test successfully runs without any problem.! Thank you for your suggestion, but I already somewhat guess the cause of it is. Can I just confirm one thing before doing as you suggested? Are there two pfs inside the simics, and one of them is used for configuration of vf?
Note that the test does not seem to fail due to missing dlb.intel.com resources but because the initcontainer does not run.
What about testing on dev-side e2e where I can temporarily disable bmaas-activated part in dev-dedicated instance(s)
What about testing on dev-side e2e where I can temporarily disable bmaas-activated part in dev-dedicated instance(s)
makes sense as we will anyway need to test without the setup script.
e2e-dlb passed, shellchecked.
@hj-johannes-lee Have I understood correctly that initcontainer reconfigures existing VF even if it's configured? Should it skip already configured VFs?
@bart0sh No. If there is vf, then the pf used for configuring vf is not configured again.
dev=$(realpath /sys/bus/pci/drivers/dlb2/????:??:??.0 | head -1)
First, only first PF is the device that is considered to be configured.
if test -w "$SRIOV_NUMVFS_PATH" -a "$(cat "$SRIOV_NUMVFS_PATH")" -eq 0 ; then
Second, if the PF is already configured (if sriov_numvfs != 0), then we skip configuration.
One thing stupid in the script is.. that
- We can set only one vf because there is "default" values for resources (num_atomic_inflights, num_dir_credits, etc).
- There is still 'for' loop that is meant for setting multiple vfs...
@hj-johannes-lee before merging this PR I'd like to see it working in dev environment. Please send PR there and make sure all tests succeed.
@bart0sh I think we can merge this