gpu-operator

NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes

Results 392 gpu-operator issues

### 1. Quick Debug Information * OS/Version(e.g. RHEL8.6, Ubuntu22.04): Ubuntu 22.04 * Kernel Version: 5.4.0-177-generic * Container Runtime Type/Version(e.g. Containerd, CRI-O, Docker): containerd * K8s Flavor/Version(e.g. K8s, OCP, Rancher, GKE,...

### 1. Quick Debug Information * OS/Version(e.g. RHEL8.6, Ubuntu22.04): Ubuntu 20.04 * Kernel Version: Kubernetes 1.24.14 * Container Runtime Type/Version(e.g. Containerd, CRI-O, Docker): containerd * K8s Flavor/Version(e.g. K8s, OCP, Rancher,...

Currently, the path to the kubelet socket for /pod-resources is hardcoded for dcgm-exporter to `/var/lib/kubelet/pod-resources` [here](https://github.com/NVIDIA/gpu-operator/blob/adceb5ac46c8125ccde13570541db5f1c9c8a302/controllers/object_controls.go#L1536). We have a use case where the kubelet root directory is `/abc` and the pod-resources socket...
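
To illustrate the request above, here is a minimal sketch of deriving the pod-resources directory from a configurable kubelet root directory instead of hardcoding it. This is not the operator's code: the function name `pod_resources_dir` and the constant `DEFAULT_KUBELET_ROOT` are hypothetical, and the sketch assumes the pod-resources directory sits directly under the kubelet root, which the truncated issue text does not confirm.

```python
import posixpath

# Default kubelet root directory, matching the hardcoded path quoted in the issue.
DEFAULT_KUBELET_ROOT = "/var/lib/kubelet"


def pod_resources_dir(kubelet_root: str = DEFAULT_KUBELET_ROOT) -> str:
    """Return the pod-resources directory for a given kubelet root.

    Hypothetical helper: assumes the layout <kubelet_root>/pod-resources.
    An empty root falls back to the default.
    """
    root = kubelet_root or DEFAULT_KUBELET_ROOT
    # posixpath keeps forward slashes regardless of the host OS.
    return posixpath.join(root, "pod-resources")
```

With this shape, a cluster whose kubelet root is `/abc` would pass that value through, while clusters that do not set anything keep the current default.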

feature

This PR adds support for the L40S vGPU profile, which the latest version did not support: https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#vgpu-types-nvidia-l40s

_The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense._...

### 1. Issue or feature description I have created a multi-node k0s Kubernetes cluster following this blog post: https://www.padok.fr/en/blog/k0s-kubernetes-gpu. I'm getting the same error `Failed to create pod sandbox: rpc error:...