gpu-operator

NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes

Results 392 gpu-operator issues

Allocatable gpu values not correct after configuring time slicing

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
data:
  any: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        renameByDefault: false
        failRequestsGreaterThanOne:...
```

### 1. Quick Debug Information

* OS/Version(e.g. RHEL8.6, Ubuntu22.04): Ubuntu20.04
* Kernel Version: 5.15.0-69
* Container Runtime Type/Version(e.g. Containerd, CRI-O, Docker): crio
* K8s Flavor/Version(e.g. K8s, OCP, Rancher, GKE, EKS): K8s
* GPU Operator...

DCGM Exporter is reporting an incorrect GPU profile. We configured an A100 80GB with the MIG profile `all-7g.80gb`, which created one MIG instance that takes up the whole card. The...

needs-triage

I have a simple k8s cluster running v1.23, and I am attempting to install the GPU Operator on it, specifying the containerd args: ``` eksctl create cluster \ --name...

### 2. Issue or feature description

I am currently working with a Kubernetes cluster where some nodes are equipped with multiple types of NVIDIA GPUs. For example, Node A has...

Hello, please let me know if I have not provided enough information. We want to locate appropriate driver images for Amazon Linux 2 operating systems and use GPU-Operator since we'd...

_The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense._...

Would it be possible to add dcgmExporter.annotations to the Helm chart? We are using Datadog to monitor our clusters, and it seems that the autodiscovery agent (v7.51.0) has a problem with...

1. Quick Debug Information

* OS/Version: Red Hat Enterprise Linux 8.9
* Kernel Version: 4
* Red Hat OpenShift version: 4
* NVIDIA GPU Operator Version: 23.9.2
* Application Details: *...

### 1. Quick Debug Information

* OS/Version(e.g. RHEL8.6, Ubuntu22.04): RHEL 8.8
* Kernel Version: 4.18.0-477.27.1.el8_8.x86_64
* Container Runtime Type/Version(e.g. Containerd, CRI-O, Docker): CRI-O
* K8s Flavor/Version(e.g. K8s, OCP, Rancher, GKE,...