Shiva Krishna Merla

Results: 278 comments by Shiva Krishna Merla

@choyuansu We would need to fix the templates so they can be used with Kustomize. This has been a low priority as we don't use them internally and test with Helm or OLM...

@tatodorov can you set the env `RUNTIME_RESTART_MODE` to `none` under `toolkit.env` in the `ClusterPolicy` and verify whether this issue persists? The toolkit will reload containerd on applying the `nvidia`-specific runtime config,...

> The cluster auto-scheduler will therefore see that the node is ready but the workload pod is still unschedulable, thus triggering an additional scale-up. This process will repeat until the...

Changes look good. @csauoss, can you rebase, please?

@slik13 this is a current limitation, and we have a feature on the roadmap to avoid it. Currently, we use a bind mount to mount the necessary installation files (/usr/bin, /lib/modules, /lib)...

@mayooot one of the primary reasons was that the `ClusterPolicy` CRD is already overloaded with many operands and is close to the 256KB limit. We cannot upgrade the operator anymore after this limit. We...

@tariq1890 please audit memory requests/limits again to verify they work with a scale cluster. If any requests/limits were reduced, those are the ones we need to audit first.

@Mohamed-ben-khemis Can you run `kubectl get pods -n gpu-operator` to confirm that the driver is run from the operator? We don't install OpenGL libraries today from the driver-container. @elezar do...

@nikp1172 can you submit the PR here instead: https://gitlab.com/nvidia/kubernetes/gpu-operator