Enhancement Request: Additional GPU Information in Prometheus Metrics
Is your feature request related to a problem? Please describe.
No.
Currently, Triton server exposes GPU utilization metrics in Prometheus format, like so:
# HELP nv_gpu_utilization GPU utilization rate [0.0 - 1.0)
# TYPE nv_gpu_utilization gauge
nv_gpu_utilization{gpu_uuid="GPU-3fed825f-252b-32ea-e3d7-266c45b62ce7"} 0
I would like to request the inclusion of additional information, specifically the GPU number and GPU name, similar to what nvidia-smi -L reports. This would make it much easier to build dynamic Grafana dashboards without having to look up GPU identification details on the physical host.
Example output of nvidia-smi -L:
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-c8a1aa60-c24c-5ce2-fc43-068d14542d00)
GPU 1: NVIDIA GeForce RTX 3090 (UUID: GPU-04727ce0-d35e-c535-9a43-b989af8d016f)
Including the GPU number and GPU name in the Prometheus metrics would improve the user experience and ease the dynamic creation of monitoring dashboards.
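Until something like this is available in the server itself, the mapping can be approximated outside Triton by joining the two outputs on the GPU UUID. The following is only a minimal sketch: the function names and the `gpu_index` and `gpu_name` label names are hypothetical, not labels Triton emits, and it assumes the `nvidia-smi -L` line format shown above.

```python
import re

# Matches lines such as:
#   GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-c8a1aa60-...)
SMI_LINE = re.compile(r"^GPU (\d+): (.+) \(UUID: (GPU-[0-9a-fA-F-]+)\)$")


def parse_nvidia_smi_list(text):
    """Map each GPU UUID to its (index, name) from `nvidia-smi -L` output."""
    gpus = {}
    for line in text.splitlines():
        match = SMI_LINE.match(line.strip())
        if match:
            index, name, uuid = match.groups()
            gpus[uuid] = (int(index), name)
    return gpus


def add_gpu_labels(metric_line, gpus):
    """Append hypothetical gpu_index/gpu_name labels to one metric sample line."""
    match = re.search(r'gpu_uuid="([^"]+)"', metric_line)
    if not match or match.group(1) not in gpus:
        # Comment lines and samples for unknown GPUs pass through unchanged.
        return metric_line
    index, name = gpus[match.group(1)]
    extra = f',gpu_index="{index}",gpu_name="{name}"'
    # Insert the new labels right after the closing quote of gpu_uuid.
    return metric_line[:match.end()] + extra + metric_line[match.end():]
```

In practice the same join could be done in a small relabeling proxy or exporter sitting in front of the Triton metrics endpoint; having the labels emitted by the server directly would remove the need for that extra piece.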
Thank you for considering this enhancement request.
Best regards, Levi Pereira
I'm going to take a crack at this.
@rmccorm4, what are your thoughts on this feature request? Let me know if you would like me to open a ticket.
@ClifHouck, did you have success with this enhancement? Thanks for working on this!
@dyastremsky Yes, but I ran into this bug: https://github.com/triton-inference-server/server/issues/6815
I've opened a PR to address it: https://github.com/triton-inference-server/core/pull/321
I was waiting for that to be resolved before opening another PR to address this issue.
Thanks for letting me know, Clif. I'll take a look.