Shiva Krishna Merla comments

278 comments

[feature request] Add a way to set pod annotations for dcgm exporter

We will look into adding this in a future release.

WSL2 Support

@elezar can comment on whether this is supported by our container toolkit.

NVIDIA GPU operator versions after 1.9.1 for RHEL7 are not working

@KodieGlosserIBM Can you override the container-toolkit version in ClusterPolicy with v1.6.0 (which seems to have worked with v1.9.1 installs) and confirm whether this resolves the issue?
```
toolkit:
  repository: nvcr.io/nvidia/k8s
  version: 1.6.0-ubi8...
```
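For an existing install, the same override can also be applied to the ClusterPolicy with a JSON merge patch. The sketch below is illustrative only and is not part of the operator. It assumes the ClusterPolicy is named `cluster-policy` (the name a default Helm install typically creates; check with `kubectl get clusterpolicy`), and `toolkit_override_patch` / `kubectl_patch_args` are hypothetical helper names. It only builds and prints the `kubectl patch` command. The tag used in the usage line is copied from the (possibly truncated) comment, so substitute the exact tag you intend to pin.

```python
import json
import shlex


def toolkit_override_patch(version: str, repository: str = "nvcr.io/nvidia/k8s") -> dict:
    """Build a JSON merge patch that pins spec.toolkit in a ClusterPolicy."""
    return {"spec": {"toolkit": {"repository": repository, "version": version}}}


def kubectl_patch_args(patch: dict, policy_name: str = "cluster-policy") -> list:
    """Return argv for `kubectl patch`; ClusterPolicy is cluster-scoped, so no namespace."""
    # Assumption: the ClusterPolicy is named "cluster-policy".
    return [
        "kubectl", "patch", "clusterpolicy", policy_name,
        "--type", "merge",
        "-p", json.dumps(patch),
    ]


if __name__ == "__main__":
    # Hypothetical tag: replace with the toolkit version you actually want to pin.
    args = kubectl_patch_args(toolkit_override_patch("1.6.0-ubi8"))
    print(shlex.join(args))  # shell-safe rendering; apply with subprocess.run(args, check=True)
```

A merge patch only touches the keys it names, so the rest of the ClusterPolicy spec is left as it was.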

NVIDIA GPU operator versions after 1.9.1 for RHEL7 are not working

@KodieGlosserIBM For RHEL7 nodes, this workaround is required when the ClusterPolicy instance is created. For RHEL8, I believe it is not required.

NVIDIA GPU operator versions after 1.9.1 for RHEL7 are not working

@KodieGlosserIBM Since we don't officially support RHEL7, there is no documentation for this. I will check with our PMs on how to handle this case. By the way...

NOTICE: NVIDIA Driver Pods are failing due to CUDA linux repository GPG key rotation

@gchazot It can also be done through `helm upgrade` with the same chart version by changing the driver imagePullPolicy. Either approach gives the same result.
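A sketch of that `helm upgrade` invocation, built as an argument list. The release name `gpu-operator`, chart `nvidia/gpu-operator`, and namespace `gpu-operator` are assumptions for a typical install, and `helm_pull_policy_upgrade_args` is a hypothetical name. `--version` keeps the chart version unchanged and `--reuse-values` carries over the release's existing values, so only `driver.imagePullPolicy` changes.

```python
import shlex


def helm_pull_policy_upgrade_args(
    chart_version: str,
    release: str = "gpu-operator",
    chart: str = "nvidia/gpu-operator",
    namespace: str = "gpu-operator",
    pull_policy: str = "Always",
) -> list:
    """Return argv that re-applies the same chart version with a new driver pull policy."""
    return [
        "helm", "upgrade", release, chart,
        "--namespace", namespace,
        "--version", chart_version,  # same chart version as currently installed
        "--reuse-values",            # keep the values from the existing release
        "--set", f"driver.imagePullPolicy={pull_policy}",
    ]


if __name__ == "__main__":
    # "v1.9.1" is only an example; use the chart version that is already installed.
    print(shlex.join(helm_pull_policy_upgrade_args("v1.9.1")))
```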

NOTICE: NVIDIA Driver Pods are failing due to CUDA linux repository GPG key rotation

@fanminshi @jinwonkim93 It's odd that `imagePullPolicy=Always` didn't update the image. Can you double-check, by describing the driver pod, that the new image was pulled?
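Besides the `Pulling`/`Pulled` events shown by `kubectl describe pod`, another way to confirm that a new image actually landed is to compare the resolved image digest (`status.containerStatuses[].imageID`) before and after the restart. The sketch below assumes you saved the output of `kubectl get pod <driver-pod> -n <namespace> -o json` each time. The function names are hypothetical.

```python
import json
import sys


def container_image_ids(pod: dict) -> dict:
    """Map container name -> imageID from a Pod object (as from `kubectl get pod -o json`)."""
    statuses = pod.get("status", {}).get("containerStatuses", [])
    return {s["name"]: s.get("imageID", "") for s in statuses}


def changed_containers(before: dict, after: dict) -> list:
    """Names of containers whose resolved image digest differs between two Pod snapshots."""
    old, new = container_image_ids(before), container_image_ids(after)
    # A container missing from the old snapshot also counts as changed.
    return sorted(name for name in new if old.get(name) != new[name])


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_pods.py before.json after.json
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        print(changed_containers(json.load(f1), json.load(f2)))
```

An empty list means the recreated pod is still running the same digest, i.e. no new image was pulled.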

Cannot find nvidia-smi in $PATH in toolkit-validation

@mastier The toolkit validation doesn't use `chroot`; it invokes `nvidia-smi` directly, because we expect the toolkit to inject these files automatically. Hence, mounting `/run/nvidia/driver` is not required for this container. Code...
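To illustrate the difference: a check that relies on toolkit injection simply resolves `nvidia-smi` from the container's own `PATH` and runs it, with no `chroot` into `/run/nvidia/driver`. The sketch below is a hypothetical Python rendering of that idea, not the validator's actual code. If the toolkit did not inject the binary, the lookup fails with the same kind of "not found in $PATH" error reported in this issue.

```python
import shutil
import subprocess
from typing import Optional, Tuple


def validate_nvidia_smi(search_path: Optional[str] = None) -> Tuple[bool, str]:
    """Look up nvidia-smi on PATH (as injected by the toolkit) and run it directly."""
    # shutil.which falls back to os.environ["PATH"] when search_path is None.
    binary = shutil.which("nvidia-smi", path=search_path)
    if binary is None:
        return False, "nvidia-smi not found in $PATH; was it injected by the container toolkit?"
    # No chroot: the binary is expected to exist in the container's own filesystem.
    result = subprocess.run([binary], capture_output=True, text=True)
    if result.returncode != 0:
        return False, f"nvidia-smi exited with {result.returncode}: {result.stderr.strip()}"
    return True, result.stdout


if __name__ == "__main__":
    ok, message = validate_nvidia_smi()
    print("ok" if ok else message)
```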

Cannot find nvidia-smi in $PATH in toolkit-validation

If you add the `debug` fields I mentioned earlier to the toolkit config file, we should see those logs under `/var/log`. Also, can you attach `/etc/containerd/config.toml`?
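The default `config.toml` shipped with nvidia-container-toolkit contains commented-out `debug` entries under `[nvidia-container-cli]` and `[nvidia-container-runtime]` that point at files in `/var/log`. Assuming those are the `debug` fields meant here and that your file still has them commented out, a small helper can uncomment them. The file's location differs between host installs and operator-managed installs, so it is taken as an argument; `enable_debug_logging` is a hypothetical name.

```python
import re
import sys

# Matches a commented-out `debug = ...` line and keeps its indentation.
_COMMENTED_DEBUG = re.compile(r"^([ \t]*)#[ \t]*(debug[ \t]*=)", re.MULTILINE)


def enable_debug_logging(config_text: str) -> str:
    """Uncomment every `debug = "..."` entry in a toolkit config.toml."""
    return _COMMENTED_DEBUG.sub(r"\1\2", config_text)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python enable_debug.py /path/to/config.toml  (path is install-specific)
    path = sys.argv[1]
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(enable_debug_logging(text))
```

Other commented keys (for example log levels) are left untouched, because the pattern only matches lines whose key is exactly `debug`.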

Cannot find nvidia-smi in $PATH in toolkit-validation

@aym-frikha After applying the change, can you delete the gpu-operator-validator pod so that, when it runs again, the toolkit generates logs? Also, please don't delete the toolkit...
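Deleting the validator pod lets its DaemonSet recreate it, which re-runs the validation and should produce fresh toolkit logs. The sketch below only builds the command. It assumes the operator runs in the `gpu-operator` namespace and that the validator pods carry the label `app=nvidia-operator-validator` (verify with `kubectl get pods -n gpu-operator --show-labels`). The selector targets only the validator, not the toolkit pod.

```python
import shlex


def delete_validator_pod_args(
    namespace: str = "gpu-operator",
    selector: str = "app=nvidia-operator-validator",
) -> list:
    """Return argv that deletes only the validator pod(s); the DaemonSet recreates them."""
    return ["kubectl", "delete", "pod", "--namespace", namespace, "--selector", selector]


if __name__ == "__main__":
    print(shlex.join(delete_validator_pod_args()))
```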