Metrics exporter setup: how to go about it?

Open • suchisur opened this issue 1 year ago • 1 comment

I came across the metrics exporter, but I am not able to set it up. The errors are:

{"level":"info","ts":1679291005.7844253,"msg":"reading metrics file","metricsFile":""}
{"level":"error","ts":1679291005.7844558,"msg":"failed to read metrics file","error":"open : no such file or directory","stacktrace":"main.main\n\t/workspace/cmd/metricsexporter/metricsexporter.go:62\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
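The error is consistent with the empty `metricsFile` value in the first log line. In Go, `os.Open` called with an empty path fails with exactly `open : no such file or directory`: the path that would normally appear between `open` and `:` is blank. The minimal sketch below reproduces that message on Linux or macOS. It also adds a hypothetical pre-flight check, `checkMetricsFile`, which is not part of nos, showing how an empty or missing path can be reported more clearly.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// openErrorText returns the error message os.Open produces for path,
// or "" if the file could be opened.
func openErrorText(path string) string {
	f, err := os.Open(path)
	if err != nil {
		return err.Error()
	}
	f.Close()
	return ""
}

// checkMetricsFile is a hypothetical pre-flight check (not part of nos):
// it rejects an empty path before touching the filesystem and reports
// a missing file with the offending path quoted.
func checkMetricsFile(path string) error {
	if path == "" {
		return errors.New("metrics file path is empty")
	}
	if _, err := os.Stat(path); errors.Is(err, fs.ErrNotExist) {
		return fmt.Errorf("metrics file %q does not exist", path)
	} else if err != nil {
		return err
	}
	return nil
}

func main() {
	// Same text as the exporter log: the path between "open" and ":" is empty.
	fmt.Println(openErrorText(""))
	// The pre-flight check names the actual problem instead.
	fmt.Println(checkMetricsFile(""))
}
```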

Can someone please point me to how to set this up? We need per-pod GPU utilization metrics.

suchisur • Mar 20 '23 05:03

Hi @suchisur, thanks for your interest in nos! The metrics exporter in nos does not provide GPU utilization metrics and is only used to optionally share basic telemetry data during nos installation as described in this documentation page.

For collecting GPU utilization metrics, I'd suggest using Prometheus with the NVIDIA DCGM Exporter. If you are already using the NVIDIA GPU Operator, you can easily set up the DCGM Exporter as described here. Hope this helps!
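The following minimal sketch shows the Prometheus side of that setup. It assumes dcgm-exporter is already being scraped and exposes the `DCGM_FI_DEV_GPU_UTIL` gauge with `namespace` and `pod` labels. Treat those label names as an assumption, because they vary with exporter version and scrape configuration (for example `exported_namespace`/`exported_pod`).

The program builds an instant query against the Prometheus HTTP API (`/api/v1/query`) and parses the JSON response into per-pod utilization. The Prometheus address is hypothetical, and the sample response in `main` is illustrative, not real data.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"sort"
)

// perPodGPUQuery averages GPU utilization per pod. Assumption: the series
// carry "namespace" and "pod" labels; depending on the dcgm-exporter version
// and scrape config they may instead be "exported_namespace"/"exported_pod".
const perPodGPUQuery = `avg by (namespace, pod) (DCGM_FI_DEV_GPU_UTIL)`

// queryURL builds an instant-query URL for the Prometheus HTTP API.
func queryURL(base, promql string) string {
	return base + "/api/v1/query?" + url.Values{"query": {promql}}.Encode()
}

// promResponse models only the fields of an instant-query response used here.
type promResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]any            `json:"value"` // [unix timestamp, value as string]
		} `json:"result"`
	} `json:"data"`
}

// parseUtilization maps "namespace/pod" to the utilization sample value.
func parseUtilization(body []byte) (map[string]string, error) {
	var r promResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	if r.Status != "success" {
		return nil, fmt.Errorf("prometheus returned status %q", r.Status)
	}
	out := make(map[string]string, len(r.Data.Result))
	for _, s := range r.Data.Result {
		v, ok := s.Value[1].(string)
		if !ok {
			return nil, fmt.Errorf("unexpected sample value %v", s.Value[1])
		}
		out[s.Metric["namespace"]+"/"+s.Metric["pod"]] = v
	}
	return out, nil
}

func main() {
	// Hypothetical in-cluster Prometheus address; fetch this URL with net/http or curl.
	fmt.Println(queryURL("http://prometheus.monitoring.svc:9090", perPodGPUQuery))

	// Illustrative response body, not real measurements.
	sample := []byte(`{"status":"success","data":{"resultType":"vector","result":[
		{"metric":{"namespace":"ml","pod":"trainer-0"},"value":[1679291005.78,"87.5"]}]}}`)
	util, err := parseUtilization(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	pods := make([]string, 0, len(util))
	for p := range util {
		pods = append(pods, p)
	}
	sort.Strings(pods) // deterministic output order
	for _, p := range pods {
		fmt.Printf("%s\t%s%%\n", p, util[p])
	}
}
```

The `avg by` aggregation collapses several GPUs attached to one pod into a single figure. Dropping the aggregation keeps the per-GPU series instead.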

Telemaco019 • Mar 20 '23 07:03