With gpu-operator v23.9.2, pulling nvcr.io/nvidia/driver:550.54.14 and nvcr.io/nvidia/cloud-native/nvidia-fs:2.17.5 always seems to fail with the error "manifest unknown"

Open sunwuyan opened this issue 1 year ago • 2 comments

Hi, I want to install gpu-operator (v23.9.2) in my Kubernetes cluster, but pulling nvcr.io/nvidia/driver:550.54.14 and nvcr.io/nvidia/cloud-native/nvidia-fs:2.17.5 always seems to fail with the error "manifest unknown". What did I do wrong?

Here is my environment:

  • OS/Version: Ubuntu 20.04
  • Container Runtime Type/Version: containerd
  • K8s Flavor/Version: k8s
  • GPU Operator Version: v23.9.2

sunwuyan avatar Mar 29 '24 09:03 sunwuyan

@sunwuyan the gpu-operator will append an OS suffix to the image tag. For example, the driver image set in the DaemonSet should be nvcr.io/nvidia/driver:550.54.14-ubuntu20.04 if your worker nodes are running Ubuntu 20.04. If the operator is not appending this suffix, please provide logs from the gpu-operator pod.
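
To illustrate the suffix described above, here is a minimal sketch of how the final image reference is put together. It is not the operator's actual code: the function name `driver_image` and its parameters are hypothetical, and the OS ID and version are assumed to be already known.

```python
def driver_image(repository: str, image: str, version: str,
                 os_id: str, os_version: str) -> str:
    """Build a driver image reference with an OS suffix appended to the tag.

    Hypothetical helper for illustration only; the real operator derives
    the OS information from the worker nodes itself.
    """
    return f"{repository}/{image}:{version}-{os_id}{os_version}"


# The tag requested in the issue plus the OS suffix for Ubuntu 20.04 nodes.
print(driver_image("nvcr.io/nvidia", "driver", "550.54.14", "ubuntu", "20.04"))
# prints: nvcr.io/nvidia/driver:550.54.14-ubuntu20.04
```

A pull of the bare tag `550.54.14` therefore never matches what the DaemonSet requests; the tag to check for in the registry is the suffixed one.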

cdesiniotis avatar Apr 04 '24 18:04 cdesiniotis

Thanks, I will try this!

sunwuyan avatar Apr 07 '24 13:04 sunwuyan

@cdesiniotis I am currently trying to deploy the GPU operator on a cluster running CentOS CoreOS. As a result, "scos4.18" is appended to the driver image tag.

Unfortunately, that image does not exist in the NVIDIA driver repository. Is there a way to use another image, or a workaround for that?

Jefidev avatar Mar 20 '25 12:03 Jefidev

Is there a way to use another image, or a workaround for that?

To work around this, you can pre-install the driver on your host; the gpu-operator will then skip the driver container deployment. Also, make sure you use the toolkit image with the ubi8 suffix.
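
As a rough sketch of that workaround, the snippet below assembles a Helm command line that disables the driver container through the chart value `driver.enabled=false` and optionally pins the toolkit image through `toolkit.version`. The helper name `helm_install_args`, the chart repository alias `nvidia`, the `gpu-operator` namespace, and the placeholder toolkit tag are assumptions for illustration; substitute a real toolkit tag ending in `-ubi8` for your release.

```python
import shlex
from typing import List, Optional


def helm_install_args(toolkit_version: Optional[str] = None) -> List[str]:
    """Assemble a `helm install` command for hosts with a pre-installed driver.

    Hypothetical helper: it only builds the argument list and does not run Helm.
    """
    args = [
        "helm", "install", "--wait", "--generate-name",
        "-n", "gpu-operator", "--create-namespace",
        "nvidia/gpu-operator",             # assumes the repo alias is "nvidia"
        "--set", "driver.enabled=false",   # skip the driver container
    ]
    if toolkit_version is not None:
        # Pin the toolkit image tag, e.g. one with the ubi8 suffix.
        args += ["--set", f"toolkit.version={toolkit_version}"]
    return args


# "<toolkit-version>-ubi8" is a placeholder, not a real tag.
print(shlex.join(helm_install_args("<toolkit-version>-ubi8")))
```

Building the arguments as a list keeps each value intact if the command is later passed to `subprocess.run` instead of being printed.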

tariq1890 avatar Mar 21 '25 18:03 tariq1890

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed. To skip these checks, apply the "lifecycle/frozen" label.

github-actions[bot] avatar Nov 05 '25 00:11 github-actions[bot]

This issue has been addressed in the comments above. Closing this for now.

If you still see issues with the latest version of gpu-operator, please feel free to open new issues with relevant logs and details attached.

rahulait avatar Nov 14 '25 04:11 rahulait