
GPU already used, showing up in multiple containers

Open astranero opened this issue 1 year ago • 1 comment

I have an issue with the nvidia-gpu-operator: when I set a limit of "nvidia.com/gpu: 1", my pod gets scheduled with a GPU that is already allocated to another container. Additionally, I previously had trouble with containers seeing one additional GPU even though the limit was set to 1.

What I want: a pod should only be allocated a GPU that is not already in use by another pod. What happens instead: the pod is allocated a GPU that is already in use by another pod.

Environment: GPU model H100, NVIDIA-SMI 550.90.12, Driver Version: 550.90.12, CUDA Version: 12.4
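For reference, a minimal pod of the kind I am scheduling looks roughly like this (name, image, and command are placeholders, not my actual workload):

cat <<'EOF' | microk8s kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test                  # placeholder name
spec:
  runtimeClassName: nvidia
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04   # placeholder image
    command: ["nvidia-smi", "-L"]                        # print the GPUs the container sees
    resources:
      limits:
        nvidia.com/gpu: 1
EOF

Running two such pods concurrently is how I notice the overlap: both end up on a GPU that is already in use.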

Installation steps:

  1. Installing gpu-operator resources
microk8s.helm3 install gpu-operator -n gpu-operator-resources --create-namespace \
  nvidia/gpu-operator --version v24.6.1 \
  --set toolkit.env[0].name=CONTAINERD_CONFIG \
  --set toolkit.env[0].value=/var/snap/microk8s/current/args/containerd-template.toml \
  --set toolkit.env[1].name=CONTAINERD_SOCKET \
  --set toolkit.env[1].value=/var/snap/microk8s/common/run/containerd.sock \
  --set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS \
  --set toolkit.env[2].value=nvidia \
  --set toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT \
  --set-string toolkit.env[3].value=true \
  --set cdi.default=false \
  --set cdi.enabled=true \
  --set toolkit.enabled=true \
  --set driver.enabled=false
  2. Patching CDI manually
kubectl patch clusterpolicies.nvidia.com/cluster-policy --type='json' \
    -p='[{"op": "replace", "path": "/spec/cdi/default", "value":true}]'
  3. Removing the default runtime from the containerd template, so containers are not handed an extra GPU
vi /var/snap/microk8s/current/args/containerd-template.toml
default_runtime_name = "nvidia"   # REMOVED THIS
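After these steps, two quick checks I use to confirm the state (<node-name> is a placeholder):

# Confirm the ClusterPolicy now has CDI as the default mode
microk8s kubectl get clusterpolicies.nvidia.com/cluster-policy \
  -o jsonpath='{.spec.cdi.default}{"\n"}'

# Confirm how many nvidia.com/gpu resources the node advertises
microk8s kubectl describe node <node-name> | grep nvidia.com/gpu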

Would appreciate any help I can get, thank you

astranero avatar Sep 30 '24 10:09 astranero

I ended up exposing all GPUs to unprivileged containers and stopped using the "nvidia.com/gpu" resource, to avoid the extra GPU that the device plugin was allocating seemingly at random. However, it would be great if there were a more granular approach to allocating GPUs. One reason I do not want to use "nvidia.com/gpu" is GPU locality: if I set "nvidia.com/gpu: 2", will it respect locality and allocate GPUs that have NVLink interconnects?

microk8s helm install nvidia/gpu-operator --generate-name -n gpu-operator-resources --version 24.6.1 $HELM_OPTIONS \
  --set toolkit.env[0].name=CONTAINERD_CONFIG \
  --set toolkit.env[0].value=/var/snap/microk8s/current/args/containerd-template.toml \
  --set toolkit.env[1].name=CONTAINERD_SOCKET \
  --set toolkit.env[1].value=/var/snap/microk8s/common/run/containerd.sock \
  --set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS \
  --set toolkit.env[2].value=nvidia \
  --set toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT \
  --set-string toolkit.env[3].value=true \
  --set toolkit.env[4].name=ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED \
  --set-string toolkit.env[4].value=true \
  --set toolkit.env[5].name=ACCEPT_NVIDIA_VISIBLE_DEVICES_AS_VOLUME_MOUNTS \
  --set-string toolkit.env[5].value=false \
  --set devicePlugin.env[0].name=DEVICE_LIST_STRATEGY \
  --set devicePlugin.env[0].value="envvar" \
  --set driver.enabled=false
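With ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED=true set above, my workloads now select GPUs through the envvar instead of a resource limit, roughly like this (name, image, and device list are placeholders):

cat <<'EOF' | microk8s kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-envvar-test           # placeholder name
spec:
  runtimeClassName: nvidia
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04   # placeholder image
    command: ["nvidia-smi", "-L"]
    env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: "all"                # or e.g. "0,1" to pin specific GPUs
    # note: no nvidia.com/gpu resource limit, so the device plugin is bypassed
EOF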

astranero avatar Oct 09 '24 08:10 astranero

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed. To skip these checks, apply the "lifecycle/frozen" label.

github-actions[bot] avatar Nov 04 '25 22:11 github-actions[bot]

when setting limits for "nvidia.com/gpu: 1". I get scheduled with a GPU that is already allocated to another container.

This should not be the case unless A) you have configured timeslicing with the k8s-device-plugin (the plugin oversubscribes each physical GPU so that multiple pods can share the same GPU), or B) you have some pods requesting GPUs via resource requests / limits and some getting access via the NVIDIA_VISIBLE_DEVICES envvar. The use of the NVIDIA_VISIBLE_DEVICES envvar is discouraged and I would recommend exclusively going through the k8s-device-plugin so that multiple pods do not get access to the same GPU.
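For reference, time-slicing is only in effect if you have explicitly given the device plugin a sharing configuration, typically a ConfigMap referenced via devicePlugin.config in the ClusterPolicy. A minimal sketch of what such a config looks like (the ConfigMap name and config key here are just examples):

cat <<'EOF' | kubectl apply -n gpu-operator-resources -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config       # example name
data:
  any: |-                         # example config key
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4             # each physical GPU is advertised 4 times
EOF

If you have never created anything like this, time-slicing is not the cause, and scenario B above is the more likely explanation.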

if I set a "nvidia.com/gpu = 2" will it respect locality and allocate GPUs that have NVLINK interconnects?

We do attempt to allocate GPUs in the most connected way possible, following this algorithm: https://github.com/NVIDIA/go-gpuallocator/blob/76743f817851ee3a2d1e31b9e2bf08b76be9e940/gpuallocator/besteffort_policy.go#L36.

Closing this issue but please re-open or file a new issue if you have additional questions.

cdesiniotis avatar Nov 14 '25 23:11 cdesiniotis