
Why are GPU resources not shown on the dashboard?

Status: Open. wjdfx opened this issue 8 years ago (14 comments).

I want to know why GPU resource information isn't shown on the dashboard. When I run kubectl describe nodes from the CLI, I get detailed GPU information, but I don't see any GPU information on the dashboard. Is this the plan?

Environment
Dashboard version:
Kubernetes version:
Operating system:
Node.js version:
Go version:
Steps to reproduce
Observed result
Expected result
Comments

wjdfx, May 06 '17

Is this the plan?

Yes it is.

maciaszczykm, May 09 '17

@wjdfx I assume the GPU resource shows up as part of the node details. I have never tried this setup, so I don't know what it looks like.
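For reference, kubectl describe nodes reads the Node object, whose status.capacity and status.allocatable maps list extended resources such as GPUs next to cpu and memory. Below is a minimal sketch, assuming the JSON from kubectl get nodes -o json as input, of pulling out those non-core resources. The sample node is made up, and nvidia.com/gpu is the name NVIDIA's device plugin advertises (the 2017-era alpha key discussed in this thread was alpha.kubernetes.io/nvidia-gpu). Treating "everything except cpu, memory, pods, ephemeral-storage and hugepages" as a device resource is a heuristic of this sketch, not a Dashboard rule.

```python
import json

# Resources every node reports; anything else in capacity is treated here
# as an extended/device resource (a heuristic assumed for this sketch).
CORE_RESOURCES = {"cpu", "memory", "pods", "ephemeral-storage"}


def extended_resources(node: dict) -> dict:
    """Return {resource_name: (capacity, allocatable)} for non-core resources.

    `node` is one item from `kubectl get nodes -o json`, i.e. a Node object
    whose status.capacity and status.allocatable map names to quantity strings.
    """
    status = node.get("status", {})
    capacity = status.get("capacity", {})
    allocatable = status.get("allocatable", {})
    result = {}
    for name, qty in capacity.items():
        if name in CORE_RESOURCES or name.startswith("hugepages-"):
            continue
        result[name] = (qty, allocatable.get(name, "0"))
    return result


# Hypothetical, trimmed-down Node object for illustration only.
sample_node = json.loads("""
{
  "metadata": {"name": "gpu-node-1"},
  "status": {
    "capacity":    {"cpu": "8", "memory": "32Gi", "pods": "110",
                    "nvidia.com/gpu": "4"},
    "allocatable": {"cpu": "7800m", "memory": "31Gi", "pods": "110",
                    "nvidia.com/gpu": "4"}
  }
}
""")

for name, (cap, alloc) in extended_resources(sample_node).items():
    print(f"{sample_node['metadata']['name']}: {name} capacity={cap} allocatable={alloc}")
```

With the sample node this prints one line for nvidia.com/gpu with capacity and allocatable both 4. A dashboard view would need the same data, just rendered next to CPU and memory.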

@maciaszczykm

Do you mean yes, it is planned to add this information at some point in the future? Or yes, the plan is not to show this information? If the latter, why?

cheld, Jun 22 '17

Some relevant docs: https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/

The key (under resources: limits) is alpha.kubernetes.io/nvidia-gpu. The GPU model can also be exposed as the node label alpha.kubernetes.io/nvidia-gpu-name, set with the kubelet option --node-labels='alpha.kubernetes.io/nvidia-gpu-name=xxx'.
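To make the key above concrete, here is a minimal sketch that builds a Pod manifest requesting one GPU through resources.limits with the alpha key quoted in this comment. The pod name and image are hypothetical, and clusters that use a device plugin today would typically use a different resource name (for NVIDIA, nvidia.com/gpu).

```python
import json

# Alpha GPU resource key quoted in this thread (2017-era Kubernetes).
GPU_RESOURCE = "alpha.kubernetes.io/nvidia-gpu"


def gpu_pod_manifest(name: str, image: str, gpus: int) -> dict:
    """Build a minimal Pod manifest that asks for `gpus` GPUs via limits."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # GPUs are requested under resources.limits.
                    "resources": {"limits": {GPU_RESOURCE: str(gpus)}},
                }
            ]
        },
    }


# Hypothetical pod name and image, for illustration only.
manifest = gpu_pod_manifest("gpu-test", "example/cuda-app:latest", 1)
print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON manifests, the printed output could be piped to kubectl apply -f - on a cluster that still supports the alpha key.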

This is an alpha feature at the moment. Might it make sense to wait until it reaches at least beta?

lenartj, Jun 25 '17

This is an alpha feature at the moment. Might it make sense to wait until it reaches at least beta?

Yes, we should wait for at least beta.

maciaszczykm, Jun 26 '17

Any follow-up in showing GPU stats in dashboard?

fanyangCS, Sep 21 '17

We have been focused on more important topics lately, such as security and the login mechanism. This feature is rather low on our priority list for now. No ETA.

floreks, Sep 21 '17

Any update?

bhack, Dec 12 '17

Any update on showing GPU stats on Kubernetes Dashboard?

xinxingliu90, Nov 29 '18

Is there any update on supporting GPU info in the dashboard?

sunxianchao, Sep 27 '19

Any update?

therealnlee, Jul 02 '20

This has low priority for us at the moment. If you are willing to contribute then let us know.

maciaszczykm, Jul 02 '20

It would be great to at least show that a pod has limits/requests set on any device compatible with the device plugin framework (so not only GPUs), and how much of that resource is requested. For example, if a node has 4 TPUs and 3 pods each consume 1 TPU, that should be visible somewhere, ideally right next to CPU/memory. If nothing else, it would really help with debugging scheduling issues. At this point, devices exposed through the device plugin framework are treated like third-class citizens. CPU/memory is not enough.
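The 4-TPU example above boils down to summing one extended resource over the pods bound to a node and comparing it with the node's allocatable amount. Here is a minimal sketch of that accounting, with hypothetical Pod/Container classes standing in for the real API objects. The name google.com/tpu is an assumption used for illustration. The sketch relies on the Kubernetes rule that an omitted request for an extended resource defaults to its limit. It ignores init containers and finished pods, which a real implementation would have to handle.

```python
from dataclasses import dataclass, field

# Hypothetical extended-resource name; any device-plugin resource
# (e.g. "nvidia.com/gpu") is counted the same way.
TPU = "google.com/tpu"


@dataclass
class Container:
    requests: dict = field(default_factory=dict)
    limits: dict = field(default_factory=dict)


@dataclass
class Pod:
    name: str
    node_name: str
    containers: list


def requested(pod: Pod, resource: str) -> int:
    """Sum a resource over a pod's containers.

    For extended resources an omitted request defaults to the limit,
    so fall back to the limit when no request is given.
    """
    total = 0
    for c in pod.containers:
        total += int(c.requests.get(resource, c.limits.get(resource, "0")))
    return total


def node_usage(node_name: str, allocatable: int, pods: list, resource: str) -> str:
    """Format 'used/allocatable' for one resource on one node."""
    used = sum(requested(p, resource) for p in pods if p.node_name == node_name)
    return f"{resource}: {used}/{allocatable} requested"


# The scenario from the comment: 4 TPUs on the node, 3 pods using 1 each.
pods = [Pod(f"trainer-{i}", "node-a", [Container(limits={TPU: "1"})]) for i in range(3)]
print(node_usage("node-a", 4, pods, TPU))
```

This prints google.com/tpu: 3/4 requested, the kind of figure that could sit next to the CPU/memory columns on a node page.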

boniek83, Nov 24 '21

I am not a contributor, but I am looking into this issue. I found this on NVIDIA's website: https://docs.nvidia.com/datacenter/cloud-native/gpu-telemetry/dcgm-exporter.html#gpu-telemetry

Its metrics are visualized by this Grafana dashboard: https://grafana.com/grafana/dashboards/12239-nvidia-dcgm-exporter-dashboard/

If we can replicate this effort, we could then set up the metrics-scraper to consume metrics following the same pattern NVIDIA uses to build that Grafana dashboard. We would want to provide information at the cluster level, along with node-level and namespace-level metrics.
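To illustrate the cluster/node/namespace rollup proposed above, here is a minimal sketch that parses a Prometheus text-format scrape and averages one gauge per label value. The metric name DCGM_FI_DEV_GPU_UTIL and the labels Hostname and namespace are assumptions about what a dcgm-exporter scrape with Kubernetes pod mapping looks like; check them against a real scrape. The sample text is made up. The hand-rolled regex parser skips unlabeled samples, timestamps and escaped quotes. A real scraper would more likely use a full parser such as prometheus_client.parser.text_string_to_metric_families.

```python
import re
from collections import defaultdict

# Simple Prometheus text-format sample: name{labels} value
# (no timestamps, no escaped quotes in label values).
SAMPLE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text: str):
    """Yield (metric_name, labels_dict, value) for each labeled sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if m:
            labels = dict(LABEL_RE.findall(m.group("labels")))
            yield m.group("name"), labels, float(m.group("value"))


def mean_by(text: str, metric: str, label: str) -> dict:
    """Average `metric` grouped by the value of `label` (e.g. node, namespace)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for name, labels, value in parse_samples(text):
        if name == metric and label in labels:
            sums[labels[label]] += value
            counts[labels[label]] += 1
    return {k: sums[k] / counts[k] for k in sums}


def cluster_mean(text: str, metric: str) -> float:
    """Average `metric` over every sample in the scrape."""
    values = [v for name, _, v in parse_samples(text) if name == metric]
    return sum(values) / len(values) if values else 0.0


# Hypothetical scrape, for illustration only.
SCRAPE = """
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-a",namespace="ml",pod="train-0"} 80
DCGM_FI_DEV_GPU_UTIL{gpu="1",Hostname="node-a",namespace="ml",pod="train-1"} 40
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-b",namespace="research",pod="nb-0"} 10
"""

print(mean_by(SCRAPE, "DCGM_FI_DEV_GPU_UTIL", "Hostname"))
print(mean_by(SCRAPE, "DCGM_FI_DEV_GPU_UTIL", "namespace"))
print(cluster_mean(SCRAPE, "DCGM_FI_DEV_GPU_UTIL"))
```

Grouping by label keeps the node and namespace views as the same operation with a different key, which would map onto the Dashboard's existing node and namespace pages.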

@maciaszczykm is there anyone from the contributors working on this that we could help?

Talador12, Jun 06 '23