gpu-operator
console-plugin-nvidia-gpu / GPU Operator Dashboard not showing
1. Quick Debug Checklist
- Are you running on an Ubuntu 18.04 node? No, I am running Red Hat Enterprise Linux CoreOS 410.84.202209231843-0
- Are you running Kubernetes v1.13+? Yes, I am running OpenShift 4.10.35 with Kubernetes 1.23
- Are you running Docker (>= 18.06) or CRIO (>= 1.13+)? Yes, CRI-O 1.23.3-17.rhaos4.10.git016b1ca.el8
- GPU Operator version: 22.9.0
- Helm version: v3.6.3
1. Issue or feature description
I completed the GPU Operator Dashboard setup and everything worked without problems. I even received a message that my web interface had changed and that I should reload it. However, no GPU metrics appear under Home > Overview.
All pods are running, and there is nothing suspicious in the logs. The GPU Operator itself works without a problem; the only issue is that the NVIDIA GPU Operator usage information dashboard does not show up.
2. Steps to reproduce the issue
- Set up OpenShift with the versions listed above
- Install the NVIDIA GPU Operator with default settings
- Follow the official GPU Operator Dashboard Setup guide
3. Information to attach
oc -n nvidia-gpu-operator get all -l app.kubernetes.io/name=console-plugin-nvidia-gpu
NAME READY STATUS RESTARTS AGE
pod/console-plugin-nvidia-gpu-5f66897879-lzx9q 1/1 Running 0 23h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/console-plugin-nvidia-gpu ClusterIP 10.125.190.2 <none> 9443/TCP 23h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/console-plugin-nvidia-gpu 1/1 1 1 23h
NAME DESIRED CURRENT READY AGE
replicaset.apps/console-plugin-nvidia-gpu-5f66897879 1 1 1 23h
oc get consoles.operator.openshift.io cluster --output=jsonpath="{.spec.plugins}"
["mce","acm","console-plugin-nvidia-gpu"]
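For anyone who wants to script the last check, here is a minimal sketch that tests whether a given plugin name appears in the `.spec.plugins` array printed above. The helper name `plugin_enabled` is hypothetical (it is not part of `oc` or the GPU Operator), and it assumes the output is the compact JSON array shown in this report.

```shell
# Hypothetical helper, not part of oc or the GPU Operator.
# $1: JSON array as printed by the jsonpath query, e.g. ["mce","acm"]
# $2: plugin name to look for
# Returns 0 if the quoted name appears in the array, non-zero otherwise.
plugin_enabled() {
  # Match the name including its surrounding double quotes so that
  # a longer plugin name containing $2 as a prefix does not count.
  printf '%s' "$1" | grep -qF -- "\"$2\""
}
```

It could be called as `plugin_enabled "$(oc get consoles.operator.openshift.io cluster --output=jsonpath="{.spec.plugins}")" console-plugin-nvidia-gpu && echo enabled`. On this cluster that check passes, so the plugin is enabled in the console operator configuration and the missing dashboard has to be caused by something else.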