gpu-operator
NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes
### 1. Quick Debug Information * OS/Version(e.g. RHEL8.6, Ubuntu22.04): Ubuntu 22.04 * Kernel Version: 5.4.0-177-generic * Container Runtime Type/Version(e.g. Containerd, CRI-O, Docker): containerd * K8s Flavor/Version(e.g. K8s, OCP, Rancher, GKE,...
### 1. Quick Debug Information * OS/Version(e.g. RHEL8.6, Ubuntu22.04): Ubuntu 20.04 * Kernel Version: Kubernetes 1.24.14 * Container Runtime Type/Version(e.g. Containerd, CRI-O, Docker): containerd * K8s Flavor/Version(e.g. K8s, OCP, Rancher,...
Currently, the path to the kubelet pod-resources socket is hardcoded for dcgm-exporter to `/var/lib/kubelet/pod-resources` [here](https://github.com/NVIDIA/gpu-operator/blob/adceb5ac46c8125ccde13570541db5f1c9c8a302/controllers/object_controls.go#L1536). We have a use case where the kubelet root dir is `/abc` and the pod-resources socket...
This PR adds support for the L40S vGPU profile, which the latest version did not support. See https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#vgpu-types-nvidia-l40s
_The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense._...
### 1. Issue or feature description I have created a multi-node k0s Kubernetes cluster following this blog post (https://www.padok.fr/en/blog/k0s-kubernetes-gpu). I'm getting the same error `Failed to create pod sandbox: rpc error:...