gpu-operator
Question: different GPU models in a single host
Is it possible to install multiple GPU models in a single host?
If so, how are details about these models exposed? I can see the Node's labels, but they seem to assume a single GPU model per server:
```yaml
nvidia.com/cuda.driver.major: "470"
nvidia.com/cuda.driver.minor: "82"
nvidia.com/cuda.driver.rev: "01"
nvidia.com/cuda.runtime.major: "11"
nvidia.com/cuda.runtime.minor: "4"
nvidia.com/gfd.timestamp: "1638347665"
nvidia.com/gpu.compute.major: "7"
nvidia.com/gpu.compute.minor: "5"
nvidia.com/gpu.count: "1"
nvidia.com/gpu.deploy.container-toolkit: "true"
nvidia.com/gpu.deploy.dcgm: "true"
nvidia.com/gpu.deploy.dcgm-exporter: "true"
nvidia.com/gpu.deploy.device-plugin: "true"
nvidia.com/gpu.deploy.driver: "true"
nvidia.com/gpu.deploy.gpu-feature-discovery: "true"
nvidia.com/gpu.deploy.node-status-exporter: "true"
nvidia.com/gpu.deploy.operator-validator: "true"
nvidia.com/gpu.family: turing
nvidia.com/gpu.machine: g4dn.xlarge
nvidia.com/gpu.memory: "15109"
nvidia.com/gpu.present: "true"
nvidia.com/gpu.product: Tesla-T4
nvidia.com/mig.strategy: single
```
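Since all of these labels currently describe a single model, a workload can already pin itself to a specific GPU model with a `nodeSelector` on the `nvidia.com/gpu.product` label published by GPU Feature Discovery. A minimal sketch (pod name, image, and GPU count are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: t4-workload                    # placeholder name
spec:
  nodeSelector:
    nvidia.com/gpu.product: Tesla-T4   # schedule only onto nodes labeled with T4s
  containers:
  - name: cuda-app
    image: nvcr.io/nvidia/cuda:11.4.0-base-ubuntu20.04   # example image
    resources:
      limits:
        nvidia.com/gpu: 1              # request one GPU from the device plugin
```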
@mareklibra No, we don't support mixed GPUs on the same node yet. This is on the roadmap.
@shivamerla How is that expected to be implemented? I'm asking so I can design a consuming application to be ready for that change in the future.
You can refer to the early implementation here: https://gitlab.com/nvidia/kubernetes/device-plugin/-/merge_requests/127
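Until mixed-model nodes are supported, one forward-compatible pattern for a consuming application is to express the model constraint as standard Kubernetes node affinity over a set of acceptable products, rather than hard-coding a single label value; the `nvidia.com/gpu.product` key can stay the same however per-model reporting ultimately lands. A sketch under that assumption (pod name, image, and product values are examples, not a confirmed design):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-affinity-example           # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: nvidia.com/gpu.product
            operator: In               # match any acceptable model
            values: ["Tesla-T4", "Tesla-V100-SXM2-16GB"]   # example models
  containers:
  - name: cuda-app
    image: nvcr.io/nvidia/cuda:11.4.0-base-ubuntu20.04     # example image
    resources:
      limits:
        nvidia.com/gpu: 1
```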