Shiva Krishna Merla

Results 278 comments of Shiva Krishna Merla

@kpouget We just need to restart the container so that the toolkit tries to inject the files again, so deleting the pod should do it.

@asviel If you add the `debug` fields as mentioned in the comment https://github.com/NVIDIA/gpu-operator/issues/226#issuecomment-884535327, you should see logs under /var/log on the host itself. Can you please share the install command or...

@happy2048 Can you try the following and check memory usage after each step, to help narrow this down further? 1. Edit clusterpolicy with ``kubectl edit clusterpolicy`` and change ``dcgm.version`` to ``2.3.4-1-ubuntu20.04``....

@happy2048 We are trying to reproduce this internally. I have tried the latest 510 and 470 drivers with the above-mentioned DCGM version, but couldn't reproduce it. Will try to test...

I ran a Jupyter notebook with my tests, and now with the same workload you are running. I have changed the collection interval too. It went up a bit, but...

@happy2048 Can you try with the UBI image on CentOS and verify if this happens? `nvcr.io/nvidia/k8s/dcgm-exporter:2.3.5-2.6.5-ubi8`. So far no luck with Ubuntu systems, so going to try with CentOS to match...

@glowkey @dualvtable Any additional information we can gather to reproduce this internally?

@dogra-gopal Thanks for reporting this. Didn't realize the NFD chart didn't use the `crds/` folder. Will check with the NFD team on this.

@dogra-gopal Regarding the statement below, can you elaborate? ``` This causes issue with helm-chart uninstallation, where helm tries to delete CRDs. It will lead to data loss during reinstall. ```...

@william0212 Can you share the output of `nvidia-smi` run from the driver pod or any of the plugin/GFD pods? Is the GPU an A100 80GB? Also can you share server model...