nos
Metrics-exporter setup: how to go about it?
I came across the metrics exporter but have not been able to set it up. The errors are:
{"level":"info","ts":1679291005.7844253,"msg":"reading metrics file","metricsFile":""}
{"level":"error","ts":1679291005.7844558,"msg":"failed to read metrics file","error":"open : no such file or directory","stacktrace":"main.main\n\t/workspace/cmd/metricsexporter/metricsexporter.go:62\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
Can someone please point me to how to set this up? We need per-pod GPU utilization metrics.
Hi @suchisur, thanks for your interest in nos! The metrics exporter in nos does not provide GPU utilization metrics. It is only used to optionally share basic telemetry data during nos installation, as described in this documentation page.
For collecting GPU utilization metrics, I'd suggest using Prometheus with the NVIDIA DCGM Exporter. If you are already using the NVIDIA GPU Operator, you can easily set up the DCGM exporter as described here. Hope this helps!
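
Once the DCGM exporter is being scraped, per-pod utilization can be read back from Prometheus. Below is a minimal sketch of one way to do that, not an official recipe. It assumes Prometheus is reachable at `http://localhost:9090` (for example through a port-forward), that the exporter publishes the `DCGM_FI_DEV_GPU_UTIL` gauge, and that its Kubernetes pod mapping is enabled so samples carry `namespace` and `pod` labels. Depending on your scrape configuration those labels may show up as `exported_namespace` and `exported_pod` instead, in which case the query needs adjusting. The function names are illustrative only.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable here, e.g. via `kubectl port-forward`.
PROMETHEUS_URL = "http://localhost:9090"


def build_query(namespace=None):
    """Build a PromQL query for average GPU utilization per pod.

    Assumes DCGM exporter samples carry `namespace` and `pod` labels.
    """
    selector = 'pod!=""'  # skip GPUs that are not attached to any pod
    if namespace:
        selector += f', namespace="{namespace}"'
    return f"avg by (namespace, pod) (DCGM_FI_DEV_GPU_UTIL{{{selector}}})"


def parse_instant_vector(payload):
    """Turn a Prometheus instant-query response into {(namespace, pod): percent}."""
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    utilization = {}
    for sample in payload["data"]["result"]:
        labels = sample["metric"]
        _timestamp, value = sample["value"]  # value is returned as a string
        key = (labels.get("namespace", ""), labels.get("pod", ""))
        utilization[key] = float(value)
    return utilization


def fetch_gpu_utilization(base_url=PROMETHEUS_URL, namespace=None):
    """Run the query against the Prometheus HTTP API (/api/v1/query)."""
    params = urllib.parse.urlencode({"query": build_query(namespace)})
    url = f"{base_url}/api/v1/query?{params}"
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_instant_vector(json.load(response))
```

Calling `fetch_gpu_utilization(namespace="my-namespace")` would then return a dictionary keyed by `(namespace, pod)`, with the namespace name here being a placeholder. Averaging with `avg by (namespace, pod)` collapses multiple GPUs in one pod into a single number; swap in `max` or drop the aggregation if you need per-GPU values.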