Driver init fails in air-gapped clusters due to hard-coded mount of Red Hat subscription repo config

Open changhyuni opened this issue 6 months ago • 4 comments

Summary

When deploying the GPU Operator in an air-gapped (offline) cluster, the nvidia-driver-daemonset init container fails to start.
Root cause: the driver image ships with a public YUM repo enabled by default, which triggers yum errors in offline environments.
Additionally, the pod spec tries to mount /etc/yum.repos.d/redhat.repo as a hostPath volume of type File. The file is absent on the node, so the kubelet rejects the volume with "hostPath type check failed".

Node OS: RHEL 8.10

We rebuilt the driver image and removed /etc/yum.repos.d/redhat.repo. The mount itself is hard-coded in the operator:

https://github.com/NVIDIA/gpu-operator/blob/cc4abab0625974987692aec6604a9359f11c5043/internal/state/driver_volumes.go#L203

Related issue

  • https://github.com/NVIDIA/gpu-operator/issues/980

changhyuni, Jun 24 '25 22:06