Shiva Krishna Merla

Results 278 comments of Shiva Krishna Merla

@fame346 will check with Canonical on the mismatch of these packages as they should be aligned with the driver version. Meanwhile you can install fabric manager from NVIDIA CUDA repos...

@arpitsharma-vw can you check `dmesg` on the node and report any driver errors, for example with `dmesg | grep -i nvrm`? If you see GSP RM related errors, please try [this](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/custom-driver-params.html) workaround to...
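
A minimal sketch of that check, assuming a POSIX shell on the GPU node. The helper name `filter_nvrm` is hypothetical; it only wraps the `grep` from the comment above so the same filter can also be run against a saved kernel log:

```shell
# Hypothetical helper: keep only NVIDIA kernel driver (NVRM) lines
# from a kernel log read on stdin. The match is case-insensitive.
filter_nvrm() {
  grep -i 'nvrm' || true   # "|| true" keeps exit status 0 when nothing matches
}

# On the node (may require root):
#   dmesg | filter_nvrm
# Against a saved log file:
#   filter_nvrm < dmesg.txt
```

Empty output means no NVRM lines were logged, which is itself worth reporting.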

@quanguachong we do not support this configuration currently. You can make this work by installing the container-toolkit packages manually on the node and disabling the toolkit container in the gpu-operator. This scenario...

Can you run `kubectl get pods -n gpu-operator` to show which pods are running? If you deployed with the driver enabled, it takes 3-5 minutes for the drivers to be installed...

You can disable the toolkit as well by running `kubectl edit clusterpolicy` and setting `toolkit.enabled=false`. It looks like you already have nvidia-container-runtime configured on the host and the containerd config updated manually?
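
The same setting can be sketched non-interactively. This assumes the ClusterPolicy resource is named `cluster-policy` (verify with `kubectl get clusterpolicy` first) and that the field lives at `spec.toolkit.enabled`. Both helper names are hypothetical:

```shell
# Hypothetical helper: print a JSON merge patch that sets
# spec.toolkit.enabled to false.
toolkit_disable_patch() {
  printf '%s\n' '{"spec":{"toolkit":{"enabled":false}}}'
}

# Hypothetical helper: apply the patch to the cluster.
# Assumption: the ClusterPolicy resource is named "cluster-policy".
disable_toolkit() {
  kubectl patch clusterpolicy cluster-policy --type merge -p "$(toolkit_disable_patch)"
}
```

Calling `disable_toolkit` applies the change; `kubectl get clusterpolicy -o yaml` can then be used to confirm the field was updated.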

Can you also paste the logs of the `nvidia-container-toolkit-daemonset-9rvz8` pod? I'm curious as to why it is restarting. Which containerd and OS versions are these?
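
One way to gather that information, as a sketch only. It assumes the pod runs in the `gpu-operator` namespace (as in the earlier `kubectl get pods` command) and that the node has `/etc/os-release`. The helper names are hypothetical:

```shell
# Hypothetical helper: print logs of the current container and of the
# previously terminated one, which is usually the interesting part
# for a pod that keeps restarting.
# Assumption: namespace defaults to "gpu-operator" unless given as $2.
pod_restart_logs() {
  pod="$1"
  ns="${2:-gpu-operator}"
  kubectl logs -n "$ns" "$pod"
  kubectl logs -n "$ns" "$pod" --previous
}

# Hypothetical helper: report containerd and OS versions.
# Run this on the node itself, not from a remote workstation.
node_versions() {
  containerd --version
  grep PRETTY_NAME /etc/os-release
}
```

For example, `pod_restart_logs nvidia-container-toolkit-daemonset-9rvz8` collects both log streams for the pod mentioned above.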

Thanks @denissabramovs, will check these out and try to repro with containerd version 1.6.9.

Thanks @xhejtman for linking the relevant issue.

Thanks for the inputs @anoopsinghnegi, we will look into avoiding `containerd` restarts and `driver` unloads when they are not necessary. Since the driver container bind mounts the container path `/run/nvidia/driver` onto the...