gpu-operator
NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes
Hello, we installed GPU Operator v24.9.0, provided through the Red Hat Marketplace OLM catalog, in our disconnected OpenShift 4.15.28 cluster. nvidia-driver-ctr is reporting complete installation and a quick vector-add application terminated...
**I changed the config from all-3g to all-7g** `$ kubectl label node rtx1 nvidia.com/mig.config=all-7g.40gb --overwrite` `node/rtx1 labeled` **When I check with this command, it shows that the MIG config was changed successfully** $...
Hi Team, our company runs an OpenShift 4.16 cluster. We are using the certified operator catalog (NVIDIA GPU Operator 25.3.1). When we try to configure a custom policy (gpu-cluster-policy) for dcgmExporter for custom...
1. Quick Debug Checklist Are you running on an Ubuntu 18.04 node? RHEL 9.6 Are you running Kubernetes v1.13+? v.1.31.9 (k8s) Are you running Docker (>= 18.06) or CRIO (>=...
### 1. Quick Debug Information * OS/Version: master node and other worker nodes Rocky 8.8, GPU worker node RHEL 8.8 * Kernel Version: 4.18.0-477.15.1.el8_8.x86_64 * Container Runtime Type/Version(e.g. Containerd,...
Hello Team, Recently, while upgrading the OpenShift cluster from 4.17.18 to 4.17.20 (Kubernetes version v1.30.10), the NVIDIA GPU Operator was upgraded to version v25.3. After the upgrade,...
**GPU Operator version:** v24.6.1 **driver.version:** 535.154.05 **device plugin version:** v0.16.2-ubi8 **Kubernetes distribution:** EKS **Kubernetes version:** v1.27.0 Hi, We attempted to install the NVIDIA driver directly on our node's base image...
Hi, I have the following situation where the toolkit breaks the MicroK8s containerd config file. Ubuntu 22.04 LTS NVIDIA driver version: 550 MicroK8s version: 1.31 Docker version: 24.0.4 gpu-operator helm...
My GPU is an NVIDIA Corporation TU104GL [Quadro RTX 4000]; the GPU has 3 auxiliary devices. When I set up the GPU for KubeVirt VM passthrough, the...
On my OS (NixOS) /usr/bin is a symlink to /run/current-system/sw/bin ``` [root@nvidia-operator-validator-9nk4h /]# ls -lah /host/usr/bin lrwxrwxrwx 1 root root 26 Feb 16 13:09 /host/usr/bin -> /run/current-system/sw/bin ``` This means...