intel-device-plugins-for-kubernetes
GPU plugin README installation section mixes too many things
Current GPU plugin README installation section: https://github.com/intel/intel-device-plugins-for-kubernetes/tree/main/cmd/gpu_plugin
It is a bit of a mess because it mixes many different things, in an odd order, which makes installation look more complicated than it actually is.
IMHO all of these 4 topics should be in separate (high-level) sections:
- Installing GPU plugin (from pre-built images)
  - Including fractionalization support (with link to [1])
- Verifying/testing GPU plugin installation
- [1] Explanation of fractionalization support
  - E.g. under new "Features" section, or also in a separate doc
- Building the images from sources (@mythi)
  - This section could actually be in a separate DEVEL.md document, along with "Deploy by hand" instructions
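
As an illustration of what a "Verifying/testing GPU plugin installation" section could boil down to, here is a minimal Python sketch. It assumes the plugin advertises the `gpu.intel.com/i915` extended resource and that the input is the parsed output of `kubectl get nodes -o json`; the function name `nodes_with_gpu` is made up for this example and is not part of the project.

```python
# Assumption: the GPU plugin registers this extended resource name on the node.
GPU_RESOURCE = "gpu.intel.com/i915"


def nodes_with_gpu(node_list, resource=GPU_RESOURCE):
    """Return {node name: allocatable count} for nodes advertising `resource`.

    `node_list` is the dict parsed from `kubectl get nodes -o json`.
    """
    result = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # Extended resource quantities are reported as integer strings.
        count = int(allocatable.get(resource, "0"))
        if count > 0:
            result[name] = count
    return result
```

It could be fed with `json.load(sys.stdin)` from a `kubectl get nodes -o json` pipe; an empty result would suggest that the plugin is not running or found no devices.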
An additional source of confusion is that the source building instructions are currently under the weirdly named "Deploying as a DaemonSet" subsection, which comes after the "Getting the source code" subsection. This is especially confusing as the pre-built images also use a DaemonSet; it's not something specific to source builds...
Top-3 issues:
- [x] 1. Move local build instructions to DEVEL.md (@mythi)
- [ ] 2. Update/move GPU plugin prerequisites section (@eero-t)
- [ ] 3. Harmonize installation instructions (TBD)
@eero-t thanks, these are good suggestions. One thing we need to keep in mind is that all our cmd/*_plugin/READMEs try to follow the same structure, so we'd need to come up with improvements that all the plugins could follow.
One more issue with the GPU plugin installation section is that it says nothing about installing the plugin with the operator (or Helm), e.g. there is no link to further documentation elsewhere in the project.
> @eero-t thanks, these are good suggestions. One thing we need to keep in mind is that all our cmd/*_plugin/READMEs try to follow the same structure, so we'd need to come up with improvements that all the plugins could follow.
@mythi I see, all of them list building the image under the "Deploying as DaemonSet" section. IMHO there's nothing really connecting those two specifically. Deployments that require building (modified sources) should go to documentation that is separate from the end-user deployment documentation (or at least to a separate top-level section). It's bad that there is such prominent documentation on installing things in a way that misses e.g. the use of NFD annotations.
The main divergence between the different plugin READMEs seems to be in how prerequisites are handled. E.g. QAT mentions them in the first sub-section under "Installation", but the GPU README mentions fractional resource requirements only under "Deploying as DaemonSet".
One more thing to add to the README / requirements is a note that the underlying host kernel needs to support the given HW (and that the support needs to be enabled), plus a note on how to install suitable driver packages into the workload (WL) containers (see #1109).
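
To make the host kernel requirement concrete, here is a rough Python sketch of a host-side check. It assumes the usual Linux sysfs layout (`/sys/class/drm/cardN/device/vendor`) and relies on `0x8086` being Intel's PCI vendor ID; the function name `intel_gpu_cards` is hypothetical. It only shows that a kernel DRM driver has bound to an Intel device, not that every HW feature is enabled.

```python
from pathlib import Path

INTEL_VENDOR_ID = "0x8086"  # Intel's PCI vendor ID


def intel_gpu_cards(sysfs_drm="/sys/class/drm"):
    """List DRM card names (e.g. "card0") whose PCI vendor is Intel."""
    base = Path(sysfs_drm)
    if not base.is_dir():
        return []
    cards = []
    for entry in sorted(base.glob("card[0-9]*")):
        # Skip connector entries such as "card0-HDMI-A-1".
        if "-" in entry.name:
            continue
        try:
            vendor = (entry / "device" / "vendor").read_text().strip()
        except OSError:
            continue
        if vendor == INTEL_VENDOR_ID:
            cards.append(entry.name)
    return cards
```

An empty list on a node that physically has an Intel GPU would hint that the running kernel lacks (enabled) support for that HW.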
When a given plugin feature or NFD label is specific to a given HW feature (SR-IOV, tiles...), it would be good to also mention the required HW series in the README (and drop the HW mentions from the root-level README, so that they're maintained in a single place).
@eero-t you added the prerequisites some time ago. Do you think this is still a valid issue? Or can we close this?
I'll need to update the docs a bit as the upstream kernel has progressed further, but this ticket can be closed.