
NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes

392 gpu-operator issues

Goal: run a PyTorch script in a Docker container inside a Kubernetes cluster, using the NVIDIA GPU on a local home computer.

1. Quick Debug Information
   * OS/Version: ...
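For a goal like the one above, a common shape is a Pod that requests a GPU through the `nvidia.com/gpu` extended resource, which the NVIDIA device plugin deployed by the GPU Operator advertises on GPU nodes. The sketch below is a minimal illustration under that assumption: it builds such a Pod manifest in Python and prints it as JSON, which `kubectl apply -f` accepts. The function name, pod name, and image reference are hypothetical placeholders; the image must contain a CUDA-enabled PyTorch build.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Pod manifest that requests NVIDIA GPUs.

    Assumes the NVIDIA device plugin (deployed by the GPU Operator)
    advertises GPUs under the extended resource name "nvidia.com/gpu".
    """
    if gpus < 1:
        raise ValueError("gpus must be >= 1")
    # One-line PyTorch check: prints whether CUDA is usable and how many GPUs it sees.
    check = "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "pytorch",
                    "image": image,
                    "command": ["python", "-c", check],
                    # Extended resources are requested via limits; the request defaults to the limit.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical image reference: substitute any image with CUDA-enabled PyTorch.
    manifest = gpu_pod_manifest("torch-gpu-check", "example.registry/pytorch-cuda:latest")
    print(json.dumps(manifest, indent=2))
```

Saved as a script (for example a hypothetical `make_pod.py`), `python make_pod.py > pod.json && kubectl apply -f pod.json` followed by `kubectl logs torch-gpu-check` should show whether PyTorch inside the container can see the GPU.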

1. Quick Debug Information
   * OS/Version: RHEL8.6
   * Kernel Version: 4.18.0-372.9.1.el8.x86_64
   * Container Runtime Type/Version: cri-o://1.26.4
   * K8s Flavor/Version: ...

I used nvidia-dcgm-exporter to expose NVIDIA GPU metrics on port 9400. I can find some metrics, such as DCGM_FI_DEV_GPU_TEMP, but I cannot find metrics for...
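When a metric seems to be missing, it helps to list everything the exporter actually publishes before looking at Prometheus queries. The sketch below assumes dcgm-exporter serves the Prometheus text format at the path `/metrics` on port 9400 and that this port has been made reachable locally (for example with `kubectl port-forward`); the URL and the function names are assumptions to adjust. It fetches the page with the Python standard library and prints the distinct `DCGM_FI_*` metric names.

```python
import urllib.request


def metric_names(exposition: str) -> set[str]:
    """Return the metric names found in Prometheus text-format output.

    Skips comment lines (# HELP / # TYPE) and blank lines; a metric name is
    everything before the first '{' (start of labels) or space (start of value).
    """
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        end = len(line)
        for sep in ("{", " "):
            idx = line.find(sep)
            if idx != -1:
                end = min(end, idx)
        names.add(line[:end])
    return names


def fetch_metrics(url: str = "http://localhost:9400/metrics", timeout: float = 5.0) -> str:
    # Assumed URL: dcgm-exporter port-forwarded to localhost:9400.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    names = metric_names(fetch_metrics())
    for name in sorted(n for n in names if n.startswith("DCGM_FI_")):
        print(name)
```

If a field is absent from this list, the exporter is not emitting it at all, so the next place to look is the set of fields the exporter is configured to collect (or whether the GPU supports that field) rather than the Prometheus side.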

1. Quick Debug Information
   * OS/Version: Ubuntu22.04
   * Kernel Version: 5.15.0-78-generic
   * Container Runtime Type/Version: Containerd 1.71.0
   * K8s Flavor/Version: K8s 1.26.0
   * GPU Operator Version: 23.5
2. Issue or feature...

What is the proper way to limit which GPUs are visible to a Kubernetes node? The use case is a shared environment where we would like to run both Kubernetes and Docker,...
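One way to reason about a shared host like this is to give Docker and Kubernetes disjoint sets of GPUs. The sketch below only does that bookkeeping: given every GPU ID on the host (indices, or more robustly the UUIDs reported by `nvidia-smi -L`) and the subset reserved for Docker, it returns comma-separated lists in the form `NVIDIA_VISIBLE_DEVICES` accepts, plus a value for Docker's `--gpus` flag. The function names are hypothetical, and how to make the Kubernetes node honor its share is exactly what the issue asks, so that side is not shown.

```python
def partition_gpus(all_ids: list[str], docker_ids: list[str]) -> tuple[str, str]:
    """Split host GPUs into a Docker share and the remaining (Kubernetes) share.

    Returns two comma-separated strings, e.g. for NVIDIA_VISIBLE_DEVICES.
    """
    reserved = set(docker_ids)
    if len(reserved) != len(docker_ids):
        raise ValueError("duplicate GPU ids in docker_ids")
    unknown = reserved - set(all_ids)
    if unknown:
        raise ValueError(f"unknown GPU ids: {sorted(unknown)}")
    # Keep the host's original ordering for the remaining GPUs.
    remaining = [gpu for gpu in all_ids if gpu not in reserved]
    return ",".join(docker_ids), ",".join(remaining)


def docker_gpus_arg(ids: str) -> str:
    """Value for `docker run --gpus`; the inner double quotes keep a
    comma-separated device list together as a single option."""
    return f'"device={ids}"'


if __name__ == "__main__":
    # Hypothetical 4-GPU host identified by index; UUIDs work the same way.
    docker_share, k8s_share = partition_gpus(["0", "1", "2", "3"], ["0", "1"])
    print("Docker:", docker_share, "->", "--gpus", docker_gpus_arg(docker_share))
    print("Kubernetes:", k8s_share)
```

UUIDs are the safer choice for the IDs, since GPU indices are not guaranteed to stay stable across driver reloads or hardware changes.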

I have one NVLink device connecting two NVIDIA A40 graphics cards on an Ubuntu 20.04 system. I downloaded and installed the nvidia-driver-local-repo-ubuntu2004-515.105.01_1.0-1_amd64.deb driver from the official website, and then...

Will gpu-operator support Rocky Linux in the future?

1. Quick Debug Information
   * OS/Version: RHCOS4.13
   * Kernel Version: 4.18.0-372.59.1.el8_6.x86_64
   * Container Runtime Type/Version: CRI-O
   * K8s Flavor/Version: ...

_The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense._...

1. Quick Debug Information
   * OS/Version: DGX OS 5.5
   * Container Runtime Type/Version: Containerd
   * K8s Flavor/Version: RKE2 v1.24.13+rke2r1
   * GPU Operator Version: gpu-operator-v23.3.2...