console-plugin-nvidia-gpu / GPU Operator Dashboard not showing

Open Alwinator opened this issue 3 years ago • 0 comments

1. Quick Debug Checklist

  • Are you running on an Ubuntu 18.04 node? No, I am running Red Hat Enterprise Linux CoreOS 410.84.202209231843-0
  • Are you running Kubernetes v1.13+? Yes, I am running OpenShift 4.10.35 with Kubernetes 1.23
  • Are you running Docker (>= 18.06) or CRIO (>= 1.13+)? Yes, CRI-O 1.23.3-17.rhaos4.10.git016b1ca.el8
  • GPU Operator version: 22.9.0
  • Helm version: v3.6.3

1. Issue or feature description

I completed the GPU Operator Dashboard Setup and everything worked without problems. I even received a message that my web interface had changed and that I should reload. However, no GPU metrics appear under Home > Overview.

Everything is running and nothing suspicious appears in the logs. The GPU Operator itself works without a problem; the issue only concerns the NVIDIA GPU Operator usage information dashboard.

2. Steps to reproduce the issue

  1. Set up OpenShift with the versions listed above
  2. Install the NVIDIA GPU Operator with default settings
  3. Follow the official GPU Operator Dashboard Setup guide

3. Information to attach

oc -n nvidia-gpu-operator get all -l app.kubernetes.io/name=console-plugin-nvidia-gpu

NAME                                             READY   STATUS    RESTARTS   AGE
pod/console-plugin-nvidia-gpu-5f66897879-lzx9q   1/1     Running   0          23h

NAME                                TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/console-plugin-nvidia-gpu   ClusterIP   10.125.190.2   <none>        9443/TCP   23h

NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/console-plugin-nvidia-gpu   1/1     1            1           23h

NAME                                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/console-plugin-nvidia-gpu-5f66897879   1         1         1       23h

oc get consoles.operator.openshift.io cluster --output=jsonpath="{.spec.plugins}"

["mce","acm","console-plugin-nvidia-gpu"]
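
The check above can be scripted. The following is a minimal sketch, assuming the plugin list is the JSON array printed by the `oc get consoles.operator.openshift.io` command; `plugin_enabled` is a hypothetical helper name, not part of `oc` or the GPU Operator. It only confirms that the plugin is registered with the console, not that metrics are being served.

```shell
# Hypothetical helper (not part of oc or the GPU Operator):
# succeeds if the JSON array passed as $1 contains the NVIDIA GPU console plugin.
plugin_enabled() {
  printf '%s' "$1" | grep -q '"console-plugin-nvidia-gpu"'
}

# On a live cluster, pass the output of the command shown above, for example:
#   plugin_enabled "$(oc get consoles.operator.openshift.io cluster --output=jsonpath='{.spec.plugins}')"

# Here the value reported in this issue is used directly.
if plugin_enabled '["mce","acm","console-plugin-nvidia-gpu"]'; then
  echo "console plugin is enabled"
else
  echo "console plugin is NOT enabled"
fi
```

With the value reported here the helper succeeds, so the plugin is enabled in the console configuration and the missing metrics would have to come from somewhere else.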

Alwinator, Oct 18 '22 06:10