
NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes

Results 392 gpu-operator issues

# System Running on bare-metal - Ubuntu 20.04.4 - Kubernetes v1.24.3 - Containerd 1.6.7 - GPU-Operator v1.11.1 # Setup GPU-Operator is installed with: ``` helm install --wait --debug --generate-name --create-namespace...

bug

_The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense._...

I am running a cluster with a number of NVIDIA GPUs. I'm also monitoring the GPUs using dcgm-exporter. However, dcgm-exporter sometimes fails to report metrics, with the logs below. ```...

### 1. Issue or feature description I would love to have the ability to specify my own set of labels on pods. My organization uses them for pod ownership (resource cost...

enhancement

_The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense._...

### 1. Quick Debug Information * OS/Version: Ubuntu 22.04 * Kernel Version: ``` Linux version 5.19.0-1030-gcp (buildd@bos03-amd64-050) (x86_64-linux-gnu-gcc-12 (Ubuntu 12.1.0-2ubuntu1~22.04) 12.1.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #32~22.04.1-Ubuntu SMP...

In the driver image container, the latest driver version is 525.85.12-rhel8.6. After some research, I found that only 'nvcr.io/nvidia/driver:515.86.01-rhel8.6' supports both of these cards, the NVIDIA RTX A4000 and A2000....

### 1. Quick Debug Information * OS/Version: Ubuntu20.04: * Container Runtime Type/Version(e.g. Containerd, CRI-O, Docker): containerd * K8s Flavor/Version(e.g. K8s, OCP, Rancher, GKE, EKS): EKS 1.23 ### 2. Issue or...

Hi, I want to install gpu-operator (version v23.9.2) in my Kubernetes cluster, but when pulling nvcr.io/nvidia/driver:550.54.14 and nvcr.io/nvidia/cloud-native/nvidia-fs:2.17.5, the error "manifest unknown" is always reported. What did I...

### 1. Quick Debug Information * OS/Version - Ubuntu22.04 * Kernel Version: 5.15.0-1045-gke * Container Runtime Type/Version(e.g. Containerd, CRI-O, Docker): Containerd * K8s Flavor/Version(e.g. K8s, OCP, Rancher, GKE, EKS): GKE...