gpu-operator
Problems running the GPU operator on k3s
Our product is IBM Edge Application Manager (IEAM). It manages containerized workloads on small devices, such as the NVIDIA Jetson Nano and TX2, and on Kubernetes clusters; among others, we support OCP and k3s clusters for IEAM. We would like to support NVIDIA GPUs on all of our supported platforms, but we are currently running into problems with the NVIDIA GPU operator on k3s.
- Hardware: 96-core Xeon, 200 GB RAM
- OS/Distro: Ubuntu 18.04.4, Linux kernel 5.3
- Docker: 19.03.12
- k3s: fresh install of the latest release
We followed the installation instructions (https://github.com/NVIDIA/gpu-operator#install-helm), but 3 of the pods do not come up. Note that, as per the prerequisites, we invoked the helm install as follows, since I saw that NFD pods already existed:
```
sudo helm install --devel --set nfd.enabled=false nvidia/gpu-operator --wait --generate-name
```
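For reference, here is a sketch of how we confirmed that NFD pods already existed (the grep pattern assumes the default node-feature-discovery pod naming):

```
# List all pods and filter for node-feature-discovery (NFD); if matches
# appear, the operator's bundled NFD should be disabled as above.
kubectl get pods --all-namespaces | grep -i node-feature-discovery
```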
The logs of the two pods stuck in the CrashLoopBackOff state show errors; see the attached log file.
Attached are the pod descriptions for the three pods.
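For reference, a sketch of the commands used to collect the diagnostics attached below (the namespace is an assumption based on the 1.x operator's default gpu-operator-resources namespace; substitute the pod names from your own cluster):

```
# Snapshot of all pods across namespaces.
kubectl get pods --all-namespaces > pods.txt

# Description and logs for one of the failing pods.
kubectl -n gpu-operator-resources describe pod nvidia-dcgm-exporter-48rqg
kubectl -n gpu-operator-resources logs nvidia-dcgm-exporter-48rqg > log-nvidia-dcgm-exporter-48rqg.txt
```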
This seems to be a similar issue to https://access.redhat.com/solutions/5089121.
Here is our platform configuration info; please see the two attached files:
- pods.txt
- log-nvidia-dcgm-exporter-48rqg.txt
We received some help on this from Anurag Guda and Anudeep Nallamothu, but we remain blocked, and they suggested that I raise a GitHub issue for this.
We're excited to see you trying the GPU operator. Unfortunately, we do not support k3s yet, but we do have it in our future plans.
Looking at the error messages, we can see problems with the driver setup or the container runtime. We will provide updates as soon as we debug these problems and add k3s support.
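In the meantime, a couple of checks that may help narrow things down (a sketch; the namespace and label selector are assumptions based on the operator's default naming):

```
# k3s uses its own embedded containerd rather than Docker by default, which
# is one place runtime problems can originate; the CONTAINER-RUNTIME column
# shows what the kubelet is actually using.
kubectl get nodes -o wide

# The driver daemonset logs usually show why the driver setup failed.
kubectl -n gpu-operator-resources logs -l app=nvidia-driver-daemonset --all-containers
```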
@nvjmayo any update on this? It has been quite a while :)
+1
@ElisaMeng @corbanvilla Can you try with v1.7.0 and verify whether you are still seeing this issue?
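A minimal sketch of pinning the install to that release (the v-prefixed version string and the nfd.enabled=false flag are assumptions mirroring NVIDIA's chart versioning and the original report; adjust as needed):

```
# Refresh the local chart index, then install chart version v1.7.0.
helm repo update
sudo helm install --wait --generate-name \
  --set nfd.enabled=false \
  --version v1.7.0 \
  nvidia/gpu-operator
```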