GPU plugin README installation section mixes too many things

Open eero-t opened this issue 2 years ago • 4 comments

Current GPU plugin README installation section: https://github.com/intel/intel-device-plugins-for-kubernetes/tree/main/cmd/gpu_plugin

It is a bit of a mess because it mixes many different things in an odd order, which makes installation look more complicated than it actually is.

IMHO all of these 4 topics should be in separate (high-level) sections:

  • Installing GPU plugin (from pre-built images)
    • Including fractionalization support (with link to [1])
  • Verifying/testing GPU plugin installation
  • [1] Explanation of fractionalization support
  • Building the images from sources (@mythi)
    • This section could actually be in a separate DEVEL.md document, along with the "Deploy by hand" instructions
    • E.g. under new "Features" section, or also in separate doc

An additional confusion is that the source building instructions are currently under the weirdly named "Deploying as a DaemonSet" subsection, which comes after the "Getting the source code" subsection. This is especially confusing because pre-built images also use a DaemonSet; it is not something specific to source builds.

Top-3 issues:

  • [x] 1. Move local build instructions to DEVEL.md (@mythi)
  • [ ] 2. Update/move GPU plugin prerequisites section (@eero-t)
  • [ ] 3. Harmonize installation instructions (TBD)

eero-t, Jun 22 '22 09:06

@eero-t thanks, these are good suggestions. One thing we need to keep in mind is that all our cmd/*_plugin/READMEs try to follow the same structure, so we'd need to come up with improvements that all the plugins could follow.

mythi, Jun 22 '22 10:06

One more issue with the GPU plugin installation section is that it mentions nothing about using the operator (or Helm) for installing it, e.g. with a link to further documentation elsewhere in the project.

eero-t, Jun 22 '22 12:06

> @eero-t thanks, these are good suggestions. One thing we need to keep in mind is that all our cmd/*_plugin/READMEs try to follow the same structure, so we'd need to come up with improvements that all the plugins could follow.

@mythi I see, all of them list building the image under the "Deploying as DaemonSet" section. IMHO there's nothing really connecting those two specifically; deployments that require building (modified sources) should go into documentation that is separate from the end-user deployment documentation (or at least into a separate top-level section). It's bad that there's such prominent documentation on how to install things in a way that misses e.g. using NFD annotations.

The main divergence between the different plugin READMEs seems to be in how prerequisites are handled. E.g. QAT mentions them in the first sub-section under "Installation", but the GPU README mentions fractional resource requirements only under "Deploying as DaemonSet".

eero-t, Aug 02 '22 11:08

One more thing to add to the README / requirements is a note about the underlying host kernel needing to support the given HW (and that support being enabled), and a note on how to install suitable driver packages into the workload (WL) containers (see #1109).

When a given plugin feature or NFD label is specific to a given HW feature (SR-IOV, tiles...), it would be good to also mention the required HW series in the README (and drop HW mentions from the root-level README, so that they're maintained in a single place).

eero-t, Aug 26 '22 12:08

@eero-t you added the prerequisites some time ago. Do you think this is still a valid issue? Or can we close this?

tkatila, Apr 18 '23 07:04

I'll need to update the docs a bit, as the upstream kernel has progressed further, but this ticket can be closed.

eero-t, Apr 18 '23 08:04