dcgm-exporter
Why is GPU drain state not included in dcgm-exporter?
Hi there,
I'm curious why the GPU drain state, as reported by the following command, is not exposed by dcgm-exporter:
```
Linux:~$ sudo nvidia-smi drain -p 0000:3f:00.0 -q
The current drain state of GPU 00000000:3F:00.0 is: not draining.
```
It is also not listed among the dcgm-api-field-ids, which I believe is why it cannot be exported here. Are there any plans to add support for it?
Specifically, our use case is to avoid sending alerts when a problematic GPU has been administratively drained while it awaits physical replacement. Any other official way to identify a drained GPU would therefore also meet our requirements.
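As a stopgap, the drain state could be scraped from the `nvidia-smi drain` query and used to suppress alerts. Below is a minimal Python sketch of that idea, not an official method. It assumes the output keeps the one-line format shown in this issue and that any state other than "not draining" means the GPU is drained. The function names `parse_drain_state`, `is_drained`, and `query_drain_state` are hypothetical.

```python
import re
import subprocess

# Matches the line format quoted in this issue, e.g.
# "The current drain state of GPU 00000000:3F:00.0 is: not draining."
_DRAIN_RE = re.compile(r"drain state of GPU (\S+) is: (.+?)\.?\s*$", re.MULTILINE)


def parse_drain_state(output):
    """Return (pci_bus_id, state) parsed from `nvidia-smi drain -q` output."""
    match = _DRAIN_RE.search(output)
    if match is None:
        raise ValueError("unrecognized nvidia-smi drain output: %r" % output)
    return match.group(1), match.group(2).strip().lower()


def is_drained(output):
    """Assumption: any state other than 'not draining' counts as drained."""
    _, state = parse_drain_state(output)
    return state != "not draining"


def query_drain_state(pci_bus_id):
    """Run the same query as in this issue; this typically requires root."""
    result = subprocess.run(
        ["nvidia-smi", "drain", "-p", pci_bus_id, "-q"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

An alerting hook could call `is_drained(query_drain_state("0000:3f:00.0"))` and skip the alert when it returns `True`.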
Thanks