Shiva Krishna Merla comments

Results: 278 comments of Shiva Krishna Merla

NVIDIA GPU Operator 24.23.0 Failed on OCP 4.14.23 Cluster

Thanks @habjouqa, will take a look.

How do I install using Kustomize?

@choyuansu We would need to fix the templates before they can be used with Kustomize. This has been a low priority as we don't use them internally and test with Helm or OLM...

containerd restarts at least once an hour

@tatodorov can you set env `RUNTIME_RESTART_MODE` to `none` under `toolkit.env` in the `ClusterPolicy` and verify whether this issue persists? The toolkit will reload containerd on applying `nvidia`-specific runtime config,...
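
A minimal sketch of the suggestion above, assuming `toolkit.env` sits under `spec` in the `ClusterPolicy` and is a list of `name`/`value` entries. The helper name `build_toolkit_env_patch` is hypothetical; it only builds a JSON merge patch body and does not contact a cluster.

```python
import json


def build_toolkit_env_patch(name, value):
    """Build a JSON merge patch body that sets one env var under spec.toolkit.env.

    Assumption: the ClusterPolicy exposes toolkit env vars as a list of
    {"name": ..., "value": ...} entries under spec.toolkit.env.
    """
    return {"spec": {"toolkit": {"env": [{"name": name, "value": value}]}}}


# Env var name and value taken from the comment above.
patch = build_toolkit_env_patch("RUNTIME_RESTART_MODE", "none")
print(json.dumps(patch, indent=2))
```

The printed JSON could be passed to `kubectl patch` with `--type merge` against the `ClusterPolicy` resource in your cluster. Note that a merge patch replaces the whole `env` list, so any existing entries would need to be included as well.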

Issue with autoscaler scheduling

> The cluster auto-scheduler will therefore see that the node is ready but the workload pod is still unschedulable, thus triggering an additional scale-up. This process will repeat until the...

changes to allow custom labels for ServiceMonitor

Changes look good. @csauoss, can you rebase please?

Driver daemonset uninstalls the driver on node reboot even if no new version is available

@slik13 this is a current limitation, and we have a feature on the roadmap to avoid it. Currently, we use bind mounts to mount the necessary installation files (/usr/bin, /lib/modules, /lib)...

Add Env valueFrom function

@mayooot one of the primary reasons was that the `ClusterPolicy` CRD is already overloaded with many operands and is close to the 256KB limit. We cannot upgrade the operator anymore once it exceeds this limit. We...

Bump NFD to v0.16.1

@tariq1890 please audit memory requests/limits again to verify they work with scale clusters. If any requests/limits are reduced, those are the ones we need to audit first.

VirtualGL with NVIDIA GPU Operator in EKS (Invalid EGL device)

@Mohamed-ben-khemis Can you run `kubectl get pods -n gpu-operator` to confirm that the driver is run from the operator? We don't install OpenGL libraries today from the driver-container. @elezar do...

Allow devicePlugin.config.default to be none.

@nikp1172 can you submit the PR here instead: https://gitlab.com/nvidia/kubernetes/gpu-operator