
nvml error: driver/library version mismatch: unknown

Open jaipreetnagpal opened this issue 3 months ago • 2 comments

failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unknown

On initial boot, the nodes fail with the error above when running nvidia/k8s-device-plugin. After a reboot, the mismatch error disappears and the nodes register their GPUs. What could be causing this?

cli-version: 1.17.8
lib-version: 1.17.8
nvidia-smi output:
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 570.195
nvidia/k8s-device-plugin:v0.16.2
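
For anyone trying to reproduce this: the NVML message means the user-space NVML library and the loaded NVIDIA kernel module report different driver versions. Below is a minimal sketch for comparing the two on a node right after first boot. It is not part of the device plugin or the container toolkit. The paths (`/sys/module/nvidia/version` and the listed library directories) are assumptions based on common Linux layouts and may differ on your distribution, and the helper names are made up for illustration.

```python
import re
from pathlib import Path
from typing import Iterable, Optional, Set

# Assumption: sysfs exposes the loaded kernel module's version here.
KERNEL_VERSION_FILE = Path("/sys/module/nvidia/version")

# Assumption: typical library directories; adjust for your distribution.
LIB_DIRS = [
    Path("/usr/lib/x86_64-linux-gnu"),
    Path("/usr/lib64"),
    Path("/usr/lib"),
]

# Matches fully versioned files such as libnvidia-ml.so.570.195.03,
# but not the soname symlink libnvidia-ml.so.1 (no dot in the version).
LIB_PATTERN = re.compile(r"^libnvidia-ml\.so\.(\d+(?:\.\d+)+)$")


def kernel_module_version(path: Path = KERNEL_VERSION_FILE) -> Optional[str]:
    """Return the loaded kernel module's version, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def library_versions(dirs: Iterable[Path] = LIB_DIRS) -> Set[str]:
    """Collect versions of every libnvidia-ml.so.<version> file found."""
    found: Set[str] = set()
    for directory in dirs:
        if not directory.is_dir():
            continue
        for entry in directory.iterdir():
            match = LIB_PATTERN.match(entry.name)
            if match:
                found.add(match.group(1))
    return found


def diagnose(kernel: Optional[str], libs: Set[str]) -> str:
    """Summarize whether the kernel module and user-space library agree."""
    if kernel is None:
        return "kernel module not loaded (or version file missing)"
    if not libs:
        return f"kernel module {kernel}, but no libnvidia-ml.so.<version> found"
    if libs == {kernel}:
        return f"match: {kernel}"
    return f"MISMATCH: kernel module {kernel}, user-space libraries {sorted(libs)}"


if __name__ == "__main__":
    print(diagnose(kernel_module_version(), library_versions()))
```

Running this once on the freshly booted node and once after the reboot would show whether the kernel module version, the library version, or both change between the two states.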

jaipreetnagpal • Sep 30 '25 16:09

@dims @aaronp24, can someone look into this issue?

jaipreetnagpal • Oct 03 '25 08:10

Could you describe how the GPU Device Plugin and the NVIDIA GPU Driver are installed?

elezar • Oct 09 '25 12:10