gpu-operator
Driver init fails in air-gapped clusters due to hard-coded mount of Red Hat subscription repo config
Summary
When deploying GPU Operator in an air-gapped (offline) cluster, the nvidia-driver-daemonset init container fails to start.
Root cause: the driver image ships with a public YUM repo enabled by default, which causes yum errors in offline environments where that repo is unreachable.
Additionally, the pod spec mounts /etc/yum.repos.d/redhat.repo as a hostPath volume of type File. When that file is absent on the node, the kubelet rejects the volume with "hostPath type check failed".
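To illustrate why the mount is rejected, here is a minimal Python sketch that models how Kubernetes hostPath types treat a missing path. It is a simplified model for illustration only, not the kubelet's actual code; the function name `hostpath_check_passes` is hypothetical, and only three of the hostPath types are covered.

```python
import os


def hostpath_check_passes(path: str, hostpath_type: str = "") -> bool:
    """Simplified model of the hostPath type check (illustrative only)."""
    if hostpath_type == "":
        # Empty type: no check is performed before mounting.
        return True
    if hostpath_type == "File":
        # A regular file must already exist at the given path.
        return os.path.isfile(path)
    if hostpath_type == "FileOrCreate":
        # A missing path is acceptable (an empty file is created);
        # an existing path must be a regular file.
        return not os.path.exists(path) or os.path.isfile(path)
    raise ValueError(f"type not covered by this sketch: {hostpath_type!r}")


if __name__ == "__main__":
    repo = "/etc/yum.repos.d/redhat.repo"
    for t in ("File", "FileOrCreate", ""):
        print(f"type={t!r}: passes={hostpath_check_passes(repo, t)}")
```

On a node without redhat.repo, only the `File` case fails in this model, which matches the behaviour reported here. Whether the operator should use a more tolerant type or skip the mount entirely is a design question this sketch does not settle.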
Node OS: RHEL 8.10
We rebuilt the driver image and **removed /etc/yum.repos.d/redhat.repo**, but the operator still hard-codes the host mount of that file:
https://github.com/NVIDIA/gpu-operator/blob/cc4abab0625974987692aec6604a9359f11c5043/internal/state/driver_volumes.go#L203
Related issue
- https://github.com/NVIDIA/gpu-operator/issues/980