gpu-operator

NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes

Results 392 gpu-operator issues

_The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense._...

### 1. Issue or feature description On a DGX A100-80GB, trying to install the operator with mixed strategy MIG, feature discovery/node labeling work fine with MIG disabled, but as soon...

Hi, I wonder if it's possible to use the gpu-operator in a single-node Microk8s cluster hosted on a wsl2 Ubuntu distribution. Thanks.

It seems that /usr/local/nvidia/toolkit/nvidia-container-runtime can fail if it runs from a directory that no longer exists. I can see the following in the kubelet.log ``` E0201...

k8s - 1.18.10, self-hosted, **w/o Internet access** workers - Ubuntu 18.04.4 #-------------------------------------- `nvidia-driver-daemonset` pod fails during packages update (because of private cluster): ``` Checking NVIDIA driver packages... Updating the package...

I have a 6 node Kubernetes cluster with a GPU operator 1.9 installed. I have 2 GPU servers ec2 type - p2, p3 on AWS. I have installed centos 7...

Hello, I'm facing some issues trying to make a GPU available in a kubernetes cluster. Based on my investigations, deployment process stops at nvidia-driver-daemonsets being blocked because the driver-validator which...

### 1. Quick Debug Checklist - [ ] Are you running on an Ubuntu 18.04 node? [No -- CentOS Linux release 7.6.1810 (Core)] - [X] Are you running Kubernetes v1.13+?...

The following installation will fail with "Cannot find nvidia-smi in $PATH" ``` helm install -n gpu-operator gpu-operator nvidia/gpu-operator --version=v1.7.1 --set driver.version=460.32.03 --set toolkit.version=1.5.0-ubuntu18.04 --set operator.defaultRuntime=containerd --set toolkit.env[0].name=CONTAINERD_CONFIG --set toolkit.env[0].value=/etc/containerd/config.toml --set...

I have the GPU operator running successfully in a k8s cluster on CentOS 7. Since CentOS 7 will reach EOL in about 2 years and CentOS 8 is already EOL, what...