
Question: different GPU models in a single host

Open · mareklibra opened this issue 3 years ago · 3 comments

Is it possible to install multiple GPU models in a single host?

If so, how are the details of each model exposed? I can see the Node labels that could be used, but they seem to assume a single GPU model per server:

    nvidia.com/cuda.driver.major: "470"
    nvidia.com/cuda.driver.minor: "82"
    nvidia.com/cuda.driver.rev: "01"
    nvidia.com/cuda.runtime.major: "11"
    nvidia.com/cuda.runtime.minor: "4"
    nvidia.com/gfd.timestamp: "1638347665"
    nvidia.com/gpu.compute.major: "7"
    nvidia.com/gpu.compute.minor: "5"
    nvidia.com/gpu.count: "1"
    nvidia.com/gpu.deploy.container-toolkit: "true"
    nvidia.com/gpu.deploy.dcgm: "true"
    nvidia.com/gpu.deploy.dcgm-exporter: "true"
    nvidia.com/gpu.deploy.device-plugin: "true"
    nvidia.com/gpu.deploy.driver: "true"
    nvidia.com/gpu.deploy.gpu-feature-discovery: "true"
    nvidia.com/gpu.deploy.node-status-exporter: "true"
    nvidia.com/gpu.deploy.operator-validator: "true"
    nvidia.com/gpu.family: turing
    nvidia.com/gpu.machine: g4dn.xlarge
    nvidia.com/gpu.memory: "15109"
    nvidia.com/gpu.present: "true"
    nvidia.com/gpu.product: Tesla-T4
    nvidia.com/mig.strategy: single
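
For reference, these labels can already be used for node selection on single-model nodes. A minimal sketch of pinning a workload to a node with a specific model via a nodeSelector on the nvidia.com/gpu.product label (the Pod name and container image below are illustrative, not taken from this issue):

    apiVersion: v1
    kind: Pod
    metadata:
      name: t4-workload
    spec:
      nodeSelector:
        nvidia.com/gpu.product: Tesla-T4      # GFD label shown above
      containers:
      - name: cuda-app
        image: nvcr.io/nvidia/cuda:11.4.2-base-ubuntu20.04
        resources:
          limits:
            nvidia.com/gpu: 1                 # request one GPU on the selected node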

mareklibra avatar Apr 14 '22 09:04 mareklibra

@mareklibra No, we don't support mixed GPUs on the same node yet. This is on the roadmap.

shivamerla avatar Apr 15 '22 00:04 shivamerla

@shivamerla How is that expected to be implemented? I'm asking so that I can design a consuming application to be ready for that change in the future.

mareklibra avatar Apr 19 '22 05:04 mareklibra

You can refer to the early implementation here: https://gitlab.com/nvidia/kubernetes/device-plugin/-/merge_requests/127
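
As a rough sketch of what consumption could look like, assuming the eventual implementation advertises each GPU model on a mixed node under its own extended resource name (the resource name below is purely hypothetical and not confirmed by the linked merge request):

    apiVersion: v1
    kind: Pod
    metadata:
      name: mixed-gpu-consumer
    spec:
      containers:
      - name: cuda-app
        image: nvcr.io/nvidia/cuda:11.4.2-base-ubuntu20.04
        resources:
          limits:
            nvidia.com/gpu-tesla-t4: 1   # hypothetical per-model resource name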

shivamerla avatar Apr 19 '22 22:04 shivamerla