glowkey

57 comments by glowkey

This is caused by a change in driver 510 that lumps the reserved memory into the used category. We are updating DCGM to handle this case and split the used...

The target is a DCGM 2.4-based dcgm-exporter by the end of May 2022.

Which metrics are being watched? What is DCGM_EXPORTER_INTERVAL set to? In general, what has been changed from a default installation? This information would help us.
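One way to gather part of the information requested above is to list the exporter-related environment variables from inside the container. The following is a minimal sketch, assuming the relevant settings are exposed as environment variables sharing the `DCGM_EXPORTER_` prefix (as `DCGM_EXPORTER_INTERVAL` is); the helper name `exporter_settings` is hypothetical, and the sketch does not capture command-line flags or the list of watched metrics.

```python
import os


def exporter_settings(environ=None, prefix="DCGM_EXPORTER_"):
    """Return the environment variables whose names start with `prefix`.

    `environ` defaults to the current process environment; passing a plain
    dict makes the helper easy to try out without touching the real one.
    """
    env = os.environ if environ is None else environ
    return {name: value for name, value in sorted(env.items())
            if name.startswith(prefix)}


if __name__ == "__main__":
    # Print each matching setting so it can be pasted into an issue report.
    for name, value in exporter_settings().items():
        print(f"{name}={value}")
```

Running this inside the dcgm-exporter container shows which of these variables differ from whatever a default deployment sets.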

Tracking here: https://github.com/NVIDIA/dcgm-exporter/issues/111

Currently this isn't possible as the MIG partition UUID isn't available from DCGM.

Can you attach the logs from the pod?

If you are running dcgm-exporter inside a container, it's possible the switches and links are not mounted; see https://github.com/NVIDIA/dcgm-exporter/issues/169#issuecomment-1604771610 for details. Also note that you may run into this outstanding issue related...

Profiling metrics are the average over the update interval (the updateFrequency parameter).
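To illustrate what averaging over the update interval implies, here is a small arithmetic sketch. It does not use DCGM; the sample values, the fixed number of samples per interval, and the function name `interval_averages` are all assumptions made for illustration, and how DCGM samples internally is not described here.

```python
from statistics import mean


def interval_averages(samples, samples_per_interval):
    """Average consecutive groups of raw samples, one group per update interval."""
    if samples_per_interval <= 0:
        raise ValueError("samples_per_interval must be positive")
    return [
        mean(samples[i:i + samples_per_interval])
        for i in range(0, len(samples), samples_per_interval)
    ]


# Hypothetical raw activity samples: a short burst to 1.0 in the first interval.
raw = [0.0, 1.0, 0.5, 0.5]

# With two samples per interval, the burst is reported as an average of 0.5.
reported = interval_averages(raw, 2)
print(reported)
```

The practical consequence is that a short burst of activity inside a long update interval appears as a lower, smoothed value rather than as a peak.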

The `libnvidia-nscq` message can be ignored when there are no NVSwitches to monitor.

It depends on the version of the NVIDIA Container Toolkit in use. Note that support for NVSwitches must be manually enabled: https://github.com/NVIDIA/nvidia-container-toolkit/releases/tag/v1.15.0-rc.1