gpu-operator v23.9.2: pulling nvcr.io/nvidia/driver:550.54.14 and nvcr.io/nvidia/cloud-native/nvidia-fs:2.17.5 always fails with "manifest unknown"
Hi, I want to install gpu-operator (v23.9.2) in my Kubernetes cluster, but pulling nvcr.io/nvidia/driver:550.54.14 and nvcr.io/nvidia/cloud-native/nvidia-fs:2.17.5 always fails with the error "manifest unknown". What did I do wrong?
Here is my environment:
- OS/Version: Ubuntu 20.04
- Container Runtime Type/Version: containerd
- K8s Flavor/Version: k8s
- GPU Operator Version: v23.9.2
@sunwuyan the gpu-operator will append an OS suffix to the image tag. For example, the driver image set in the DaemonSet should be nvcr.io/nvidia/driver:550.54.14-ubuntu20.04 if your worker nodes are running Ubuntu 20.04. If the operator is not appending this suffix, please provide logs from the gpu-operator pod.
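To make the tag rule concrete, here is a minimal Python sketch, assuming the suffix is simply the node's os-release `ID` followed by its `VERSION_ID` (e.g. `ubuntu` + `20.04`). The helpers `parse_os_release` and `driver_image` are hypothetical and not part of the gpu-operator; the operator's real logic may derive the OS information differently and handle some distributions specially.

```python
def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines of an os-release file into a dict, stripping quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def driver_image(repository: str, image: str, version: str, os_release_text: str) -> str:
    """Build a driver image reference with an ASSUMED '<ID><VERSION_ID>' tag suffix."""
    fields = parse_os_release(os_release_text)
    suffix = f"{fields['ID']}{fields['VERSION_ID']}"
    return f"{repository}/{image}:{version}-{suffix}"


# Typical Ubuntu 20.04 os-release fields (abbreviated).
ubuntu_2004 = 'NAME="Ubuntu"\nID=ubuntu\nVERSION_ID="20.04"\n'
print(driver_image("nvcr.io/nvidia", "driver", "550.54.14", ubuntu_2004))
# nvcr.io/nvidia/driver:550.54.14-ubuntu20.04
```

The bare tag `550.54.14` from the question has no OS suffix, which is why the registry answers "manifest unknown": only the suffixed tags are published.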
Thanks, I will try this!
@cdesiniotis I am currently trying to deploy the GPU operator on a cluster running CentOS CoreOS, so "scos4.18" is appended to the driver image tag.
Unfortunately, that tag does not exist in the NVIDIA driver repository. Is there a way to use another image, or a workaround for that?
To work around this, you can pre-install the driver on your hosts; the gpu-operator will then skip deploying the driver container. Also, make sure you use the toolkit image with the ubi8 suffix.
This issue has been addressed in the comments above. Closing this for now.
If you still see issues with the latest version of gpu-operator, please feel free to open new issues with relevant logs and details attached.