intel-device-plugins-for-kubernetes
Missing GPU plugin documentation items
GPU plugin documentation is missing the following things:
- Description / table of the resources provided by GPU plugin ('i915', 'i915_monitor', 'tiles'?) and how they work
- Discussion on user / group ID issues for device access, or just link to: https://kubernetes.io/blog/2021/11/09/non-root-containers-and-devices/
- Link to example deployment YAMLs that use GPU resources, e.g. to the `intelgpu-job.yaml` referred to in the `README.md` "Testing and Demos" section: https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/demo/intelgpu-job.yaml
- Maybe also a "Quick start" section that lists just the basics, as the current doc is pretty long...
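To illustrate the kind of example the docs could link to, here is a minimal sketch of a Pod that requests one GPU through the `gpu.intel.com/i915` resource. It is not a copy of `intelgpu-job.yaml`; the pod name, image, command and user/group IDs are placeholders chosen for illustration, and whether a non-root user can actually open the device depends on the container runtime settings discussed in the blog post linked above.

```yaml
# Hypothetical example, not taken from the repository.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo                 # placeholder name
spec:
  restartPolicy: Never
  securityContext:
    runAsUser: 1000              # placeholder non-root UID
    runAsGroup: 1000             # placeholder GID
  containers:
  - name: list-devices
    image: busybox               # placeholder image, only lists device nodes
    command: ["ls", "-l", "/dev/dri"]
    resources:
      limits:
        gpu.intel.com/i915: 1    # one (possibly shared) GPU device from the plugin
```

The listing printed by the container shows which device nodes were mapped in and with which ownership, which is a quick way to spot the user / group ID issue.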
I'd add another item to your list: Simplify GPU plugin deployment options.
I would vote to only have two examples:
- NFD + GPU-plugin with shared-dev-num=1 and monitoring
  - Basic use case, should work for most
- NFD + GPU-plugin with shared-dev-num>1, resource management, monitoring and extended resources
  - GAS use case, for those who need it
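As a rough sketch, the two options could differ mainly in the arguments passed to the plugin container. The flag names below are as I recall them from the GPU plugin README and should be verified against the current docs; the container name and the share count of 10 are placeholders. The extended resources in the second option come from NFD configuration rather than a plugin flag, so they are not shown.

```yaml
# Option 1: basic use case (fragment of the plugin DaemonSet pod spec)
containers:
- name: intel-gpu-plugin         # placeholder container name
  args:
  - -shared-dev-num=1            # each GPU goes to a single container
  - -enable-monitoring           # expose the monitoring resource
---
# Option 2: GAS use case
containers:
- name: intel-gpu-plugin         # placeholder container name
  args:
  - -shared-dev-num=10           # placeholder share count, must be > 1
  - -resource-manager            # fine-grained resource management for GAS
  - -enable-monitoring
```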
Then I'd add notes about configuration options, using shared-dev-num without GAS etc. into a different file (advanced-deployment.md or similar). As you say, the current README is pretty long.
IMHO "using shared-dev-num without GAS" should be documented as being for "dedicated cluster/nodes with a single GPU workload, where the share count equals the number of instances of that workload that fit into a single GPU (with required QoS)".
AFAIK other uses for it are non-production ones, so they do not need to be mentioned.