glowkey
This is caused by a change in driver 510 that lumps the reserved memory into the used category. We are updating DCGM to handle this case and split the used...
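As a rough illustration of the arithmetic involved, and not DCGM's actual change: if a driver reports reserved memory as part of "used", the application-visible usage can be recovered by subtraction, assuming the reserved amount is known separately. The helper name `split_fb_used` and the numbers below are hypothetical.

```python
def split_fb_used(reported_used_mib: int, reserved_mib: int) -> tuple[int, int]:
    """Hypothetical helper: split a 'used' figure that already includes reserved memory.

    Returns (used_excluding_reserved_mib, reserved_mib).
    """
    # Reserved memory cannot be negative or exceed what the driver reports as used.
    if reserved_mib < 0 or reserved_mib > reported_used_mib:
        raise ValueError("reserved memory must be between 0 and the reported used value")
    return reported_used_mib - reserved_mib, reserved_mib


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    used, reserved = split_fb_used(reported_used_mib=1200, reserved_mib=500)
    print(f"used={used} MiB reserved={reserved} MiB")
```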
The timeline is for a DCGM 2.4-based dcgm-exporter by the end of May 2022.
Which metrics are being watched? What is DCGM_EXPORTER_INTERVAL set to? In general, what have you changed from a default installation? This information would help us.
Tracking here: https://github.com/NVIDIA/dcgm-exporter/issues/111
Currently this isn't possible as the MIG partition UUID isn't available from DCGM.
Can you attach the logs from the pod?
If you are running dcgm-exporter inside a container, the switches and links may not be mounted into it. See https://github.com/NVIDIA/dcgm-exporter/issues/169#issuecomment-1604771610 Also note that you may run into this outstanding issue related...
Profiling metrics are averaged over the update interval (the updateFrequency parameter).
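To see why a short burst of activity shows up as a fraction rather than a peak, here is a simplified time-weighted average over one update interval. This is a sketch of the averaging idea only, assuming the activity is piecewise constant between samples; it is not DCGM's internal implementation, and the helper name `interval_average` is hypothetical.

```python
def interval_average(samples: list[tuple[float, float]], interval_end: float) -> float:
    """Time-weighted average of a piecewise-constant signal over one interval.

    samples: (start_time_s, value) pairs sorted by time. Each value holds until
    the next sample's start time; the last value holds until interval_end.
    """
    if not samples:
        raise ValueError("need at least one sample")
    duration = interval_end - samples[0][0]
    if duration <= 0:
        raise ValueError("interval_end must be after the first sample")
    total = 0.0
    for i, (t, v) in enumerate(samples):
        # Each segment contributes value * time the value was held.
        t_next = samples[i + 1][0] if i + 1 < len(samples) else interval_end
        total += v * (t_next - t)
    return total / duration


if __name__ == "__main__":
    # Assumed 1 s update interval: fully active for 0.25 s, then idle.
    print(interval_average([(0.0, 1.0), (0.25, 0.0)], interval_end=1.0))
```

With a 1 s interval, a kernel that keeps the engine busy for only the first 250 ms is reported as 0.25 for that interval, even though it was 100% active while running. A shorter update interval tracks bursts more closely.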
The `libnvidia-nscq` message can be ignored when there are no nvswitches to monitor.
This depends on the version of the nvidia-container toolkit being used. Note that support for NVSwitches must be enabled manually: https://github.com/NVIDIA/nvidia-container-toolkit/releases/tag/v1.15.0-rc.1