Shiva Krishna Merla comments

Results: 278 comments of Shiva Krishna Merla

nvidia-cuda-validator pods crashlooping in OKD4.7

> helm install --wait --generate-name
> ./gpu-operator \
> --set nfd.enabled=false \ (because I have deployed above)
> --set operator.defaultRuntime=crio
> --set driver.enabled=false (because I have install on the local...

nvidia-cuda-validator pods crashlooping in OKD4.7

@william0212 The cuda-validator pod doesn't download CUDA images; we have a `vectorAdd` sample within the `gpu-operator-validator` image, which gets invoked at runtime. Wondering if the CUDA 11.4.1 package installed directly on the host is causing...

Add NVIDIA/gpu-operator chart repo to https://artifacthub.io/

@dioguerra Please pull the operator charts from here: https://catalog.ngc.nvidia.com/orgs/nvidia/helm-charts/gpu-operator

Does the Tesla T4 support collecting per-process memory monitoring metrics?

@kelonsen Did you look into the dcgm-exporter metrics collected for memory utilization? https://github.com/NVIDIA/gpu-monitoring-tools/blob/master/etc/dcgm-exporter/dcp-metrics-included.csv Also, metrics are mapped to pod-level resources to track the usage per pod (pod-name, namespace, device-id). This [blog](https://developer.nvidia.com/blog/monitoring-gpus-in-kubernetes-with-dcgm/)...
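To illustrate the per-pod mapping mentioned above, here is a minimal sketch that sums a memory metric per (namespace, pod) from Prometheus-style exposition text. The metric name `DCGM_FI_DEV_FB_USED`, the label names `gpu`, `namespace` and `pod`, and all sample values are assumptions made for illustration; check the CSV linked above and your exporter's actual output before relying on them. The helper `memory_used_per_pod` is hypothetical, not part of dcgm-exporter.

```python
import re
from collections import defaultdict

# Invented sample in Prometheus text format; metric and label names are assumptions.
SAMPLE = """\
# HELP DCGM_FI_DEV_FB_USED Framebuffer memory used (in MiB).
# TYPE DCGM_FI_DEV_FB_USED gauge
DCGM_FI_DEV_FB_USED{gpu="0",namespace="default",pod="train-a"} 1024
DCGM_FI_DEV_FB_USED{gpu="1",namespace="default",pod="train-a"} 512
DCGM_FI_DEV_FB_USED{gpu="2",namespace="ml",pod="infer-b"} 2048
"""

# Matches lines of the form: name{label="value",...} value
LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def memory_used_per_pod(text, metric="DCGM_FI_DEV_FB_USED"):
    """Sum the given metric across GPUs for each (namespace, pod) pair."""
    totals = defaultdict(float)
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        match = LINE_RE.match(line)
        if not match or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        key = (labels.get("namespace", ""), labels.get("pod", ""))
        totals[key] += float(match.group("value"))
    return dict(totals)


if __name__ == "__main__":
    for (namespace, pod), used in sorted(memory_used_per_pod(SAMPLE).items()):
        print(f"{namespace}/{pod}: {used:.0f}")
```

In a real cluster the same aggregation would more likely be done as a query in the monitoring backend; the sketch only shows how the pod labels make per-pod accounting possible.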

Feature Request: Support multiple GPU driver versions in one k8s cluster

Thanks for the feature request. This will indeed be a great feature. Currently the only way this can be done is with pre-installed drivers on the host with the GPU operator. We...

Feature Request: Support multiple GPU driver versions in one k8s cluster

@khatrig, we currently package a single driver version into each image, hence the requirement to have separate daemonsets.

failed to pull and unpack image "nvcr.io/nvidia/driver:470.82.01-sles15.3"

@jear Will update you on this. We don't have official driver images for SLES15 yet and, as @dualvtable mentioned, it's in the works.

failed to pull and unpack image "nvcr.io/nvidia/driver:470.82.01-sles15.3"

@jear We are working with SUSE on releasing an official image for SLES (planned for GPU operator 1.11). Meanwhile, from the link you shared above, it looks like we need changes to...

failed to pull and unpack image "nvcr.io/nvidia/driver:470.82.01-sles15.3"

@jear With the upcoming release, we are planning to support RKE2 with Ubuntu and RHEL8. The toolkit config required for Ubuntu would be as below. SLES support is still being reviewed...

Support Mixed OS Version

@AStrangwood Yes, this is a known limitation with the GPU operator today, and we are looking to support mixed mode soon.