gpu-operator

NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes

Results 392 gpu-operator issues

## Environment ● Kubernetes: 1.20.11 ● OS: Centos7(3.10.0-1160.15.2.el7.x86_64) ● Docker: 19.03.15 ● NVIDIA Driver Version: 510.47.03 ● DCGM Exporter Docker Image: nvcr.io/nvidia/k8s/dcgm-exporter:2.3.5-2.6.5-ubuntu20.04 ## Issue description There is no process is...

### 1. Issue or feature description This is mostly a question; why is https://github.com/NVIDIA/gpu-operator/blob/7441195aba0145dbbe8f8e4d43716e6c8e6186c2/assets/state-driver/0500_daemonset.yaml#L51-L53 set to `mountPropagation: Bidirectional`? If I remove this from the manifest, the pod sits in `init`...

I'm running into a strange error when trying to set up a 7g.80gb MIG config on my nodes. The nodes have 8x NVIDIA-A100-SXM4-80GB. I've installed the GPU operator from the...

### 1. Issue or feature description I have a handful of GPU nodes that have to be on Ubuntu 18.04. I also have a bunch more GPU nodes that...

Our monitoring system (Datadog) requires us to set pod annotations on the exporter pods. It would be great if you could add a way to set `spec.template.metadata.annotations` of the daemonset. Thanks

I have been able to deploy a development release of Red Hat OpenShift 4.9 running on RHCOS in a single-node scenario on my NVIDIA Jetson AGX Xavier: $...

- Issue when configuring local repo configMap for air-gapped env using 1.9.0 operator chart. Default CentOS and cuda repos are still being used/configured in /etc/yum.repos.d/ in nvidia-driver-daemonset pod ``` ......

Can the `serial number` of a GPU be exported, i.e. in a metric? In addition, please advise on the recommended way to read details about a GPU. So far we can query...

Is it possible to install multiple GPU models in a single host? If so, how are details about these models exported? I can see that Node labels could be used, but...

### 1. Issue description Red Hat OpenShift (ROKS on IBM Cloud) gets the alert `GPUOperatorOpenshiftDriverToolkitEnabledNfdTooOld` with the description `The DriverToolkit is enabled in the GPU Operator ClusterPolicy, but the NFD version...