gpu-operator
550.90.07-5.15.0-1061-gke-ubuntu22.04 image tag not found when installing with `driver.usePrecompiled` on GKE
1. Quick Debug Information
- OS/Version (e.g. RHEL 8.6, Ubuntu 22.04): Ubuntu 22.04
- Kernel Version: 5.15.0-1061-gke
- Container Runtime Type/Version (e.g. containerd, CRI-O, Docker): containerd
- K8s Flavor/Version (e.g. K8s, OCP, Rancher, GKE, EKS): GKE
- GPU Operator Version: 24.6.1
2. Issue or feature description
When the operator is installed with `driver.usePrecompiled: true` on GKE, the `nvidia-driver-daemonset-5.15.0-1061-gke-ubuntu22.04` DaemonSet fails to start because the image tag `550.90.07-5.15.0-1061-gke-ubuntu22.04` cannot be found in the `nvcr.io/nvidia/driver` repository.
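For context, a hypothetical excerpt of the DaemonSet spec the operator generates (the container name is illustrative; the image tag is the one from this issue, which does not exist in the registry, so the pull fails):

```yaml
# Illustrative excerpt of the generated driver DaemonSet pod spec
spec:
  containers:
  - name: nvidia-driver-ctr   # container name is an assumption for illustration
    image: nvcr.io/nvidia/driver:550.90.07-5.15.0-1061-gke-ubuntu22.04
```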
3. Steps to reproduce the issue
- Install the v24.6.1 operator on GKE with the following in your values file:

```yaml
driver:
  enabled: true
  usePrecompiled: true
```
- Observe that the operator spawns a DaemonSet named `nvidia-driver-daemonset-5.15.0-1061-gke-ubuntu22.04`.
- Observe that the image defined in the DaemonSet cannot be pulled.
Yes, I understand per the GKE docs here that drivers must be installed separately, and from here that driver installation must be disabled via the operator, but it seems like there should be a validation check in the operator that prevents installation when an image tag doesn't exist.
@chipzoller there are no precompiled driver packages for the gke kernels which is why we do not have any precompiled container images for Ubuntu 22.04 + gke kernel variant. If you want the GPU Operator to deploy and manage the lifecycle of the driver, you will need to use the non-precompiled images.
Hi @cdesiniotis, yes I get that; I'm just stating with this issue that there isn't any mechanism to prevent users from hitting this situation. My recommendation is some template logic which blocks this condition so the chart fails to deploy, rather than deploying happily only for a component to fail to come up due to an unavailable tag.
@chipzoller the kernel version, and thus the precompiled driver image tag, is not known until runtime. The gpu-operator constructs the image tag from the OS name + kernel version running on the GPU node -- it gets the needed information from node labels added by Node Feature Discovery. I don't believe this is something we can easily validate at the point in time when the chart is installed.
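To illustrate, here is a rough sketch of how that runtime tag is assembled, using the values from this issue. The label names come from Node Feature Discovery; treat the exact assembly as an approximation of the operator's logic, not the actual operator code.

```shell
# Approximation of how the precompiled driver image tag is constructed
# at runtime (illustrative only). Values mirror the node in this issue.
DRIVER_VERSION="550.90.07"
KERNEL_VERSION="5.15.0-1061-gke"   # from feature.node.kubernetes.io/kernel-version.full
OS_RELEASE="ubuntu22.04"           # from the system-os_release ID + VERSION_ID labels

TAG="${DRIVER_VERSION}-${KERNEL_VERSION}-${OS_RELEASE}"
echo "nvcr.io/nvidia/driver:${TAG}"
# prints nvcr.io/nvidia/driver:550.90.07-5.15.0-1061-gke-ubuntu22.04
```

Since no precompiled package exists for GKE kernels, this computed tag has no matching image in the repository.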
You should be able to use the Helm lookup() function to retrieve a node's labels and then fail conditionally. This would have some potential negative implications, however, as some tools don't support lookup(), including some cloud vendor marketplace catalogs if I recall. An alternative could be to fail in the operator container and print the relevant message rather than template a resource with an invalid image tag. If none of those seem like viable options, feel free to close this as not planned. Just throwing some ideas out there that may help others.
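A minimal sketch of such a lookup() guard, assuming NFD's kernel-version label is already present on nodes (note that lookup() returns empty results during `helm template` and `--dry-run`, so this would only guard real installs):

```
{{- /* Hypothetical pre-install guard: abort if usePrecompiled is set
       and any node reports a GKE kernel variant. */}}
{{- if .Values.driver.usePrecompiled }}
  {{- range (lookup "v1" "Node" "" "").items }}
    {{- $kernel := index .metadata.labels "feature.node.kubernetes.io/kernel-version.full" | default "" }}
    {{- if contains "gke" $kernel }}
      {{- fail (printf "no precompiled driver image exists for kernel %s" $kernel) }}
    {{- end }}
  {{- end }}
{{- end }}
```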
> An alternative could be to fail in the operator container and print the relevant message rather than template a resource with an invalid image tag.
This seems like the most reasonable option if we wanted to fail earlier.
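As a rough illustration of failing earlier, a pre-flight check along these lines (hypothetical, not current operator behavior; assumes `skopeo` is available in the operator image) could probe the registry for the computed tag before the DaemonSet is created:

```shell
# Hypothetical pre-flight check, not current operator behavior: probe the
# registry for the computed tag and emit an actionable error if it is missing.
IMAGE="nvcr.io/nvidia/driver:550.90.07-5.15.0-1061-gke-ubuntu22.04"

check_image() {
  # skopeo inspect exits non-zero when the tag does not exist
  if skopeo inspect "docker://$1" >/dev/null 2>&1; then
    echo "ok"
  else
    echo "image $1 not found; use non-precompiled images or install drivers per the GKE docs"
  fi
}
```

A failing check could then surface a clear message in the operator logs instead of leaving a DaemonSet stuck in ImagePullBackOff.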
This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed. To skip these checks, apply the "lifecycle/frozen" label.
This issue has been open for over 90 days without recent updates, and the context may now be outdated.
Given that gpu-operator 24.6.1 is now EOL, I would encourage you to try the latest version and see if you still hit this issue.
If this issue is still relevant with the latest version of the NVIDIA GPU Operator, please feel free to reopen it or open a new one with updated details.