
Exporter does not provide any of the DCGM_FI_DEV_*_UTIL metrics

Open kt-pham opened this issue 1 year ago • 4 comments

Is there something that I need to enable or turn on in order to be able to get the following metrics?

DCGM_FI_DEV_GPU_UTIL,      gauge, GPU utilization (in %).
DCGM_FI_DEV_MEM_COPY_UTIL, gauge, Memory utilization (in %).
DCGM_FI_DEV_ENC_UTIL,      gauge, Encoder utilization (in %).
DCGM_FI_DEV_DEC_UTIL,      gauge, Decoder utilization (in %).

I am unable to see these specific metrics from my exporter. @glowkey could you help answer this question?

kt-pham avatar Dec 04 '24 23:12 kt-pham

I'm having the same issue. Just some more details below:

  • I see those metrics in the exporter-metrics-config-map ConfigMap, but they still don't appear when I query the /metrics path.
  • Helm Chart version: 3.6.1
  • Container image: nvcr.io/nvidia/k8s/dcgm-exporter:3.3.9-3.6.1-ubuntu22.04

danielserrao avatar Dec 12 '24 14:12 danielserrao

These metrics should be supported by all GPU models but are not supported for MIG configurations. Do you see the metrics with 'dcgmi dmon'?

glowkey avatar Dec 12 '24 18:12 glowkey

Is there any way for us to see usage metrics when MIG is enabled?

kt-pham avatar Dec 12 '24 18:12 kt-pham

Use DCGM_FI_PROF_GR_ENGINE_ACTIVE and DCGM_FI_PROF_DRAM_ACTIVE, which are reported for MIG devices. I'd encourage you to look through the DCGM_FI_PROF* family of metrics, otherwise known as DCP metrics.
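
To confirm which of these metrics the exporter is actually serving, a small script along the following lines may help. It is only a sketch: the function names are made up for illustration, and the sample text (including the `gpu` label and the values) is invented rather than taken from a real exporter. It parses Prometheus text-format output and lists the expected metric names that are absent.

```python
# Sketch: report which expected DCGM metrics are missing from /metrics output.
# Function names and the sample payload below are illustrative assumptions.

EXPECTED = [
    "DCGM_FI_DEV_GPU_UTIL",
    "DCGM_FI_DEV_MEM_COPY_UTIL",
    "DCGM_FI_DEV_ENC_UTIL",
    "DCGM_FI_DEV_DEC_UTIL",
    "DCGM_FI_PROF_GR_ENGINE_ACTIVE",
    "DCGM_FI_PROF_DRAM_ACTIVE",
]


def exposed_metric_names(metrics_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line is either `name{labels} value` or `name value`.
        name = line.split("{", 1)[0].split(None, 1)[0]
        names.add(name)
    return names


def missing_metrics(metrics_text, expected):
    """Return the expected metric names that do not appear in the output."""
    found = exposed_metric_names(metrics_text)
    return [name for name in expected if name not in found]


# Invented sample resembling what a MIG-enabled node might serve.
SAMPLE = """\
# HELP DCGM_FI_PROF_GR_ENGINE_ACTIVE Ratio of time the graphics engine is active.
# TYPE DCGM_FI_PROF_GR_ENGINE_ACTIVE gauge
DCGM_FI_PROF_GR_ENGINE_ACTIVE{gpu="0"} 0.42
DCGM_FI_PROF_DRAM_ACTIVE{gpu="0"} 0.10
DCGM_FI_DEV_GPU_TEMP{gpu="0"} 35
"""

if __name__ == "__main__":
    for name in missing_metrics(SAMPLE, EXPECTED):
        print("missing:", name)
```

To run this against a live exporter, replace `SAMPLE` with the body fetched from the exporter's /metrics path, for example via `urllib.request.urlopen`. The host and port depend on your deployment.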

glowkey avatar Dec 12 '24 18:12 glowkey