DCGM
How can I clear a stale XID error?
We use dcgm-exporter to monitor our GPU cluster. Occasionally we see an XID error, "XID 31", which is typically caused by a user program. However, the XID error persists even after that program has exited. Is there a way to clear this stale XID error? Thank you for your assistance.