awsome-distributed-training

SMHP: Slurm exporter to report GPU metrics

Open: verdimrc opened this issue 11 months ago • 1 comment

Issue #, if available: N/A

Description of changes: Add a Prometheus Slurm exporter that reports GPU metrics (total and allocated GPUs).
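For readers unfamiliar with how total and allocated GPU counts can be derived from Slurm, here is a minimal sketch. It is not the exporter code from this PR. It assumes per-node GRES strings of the kind `sinfo` can report in its `Gres` and `GresUsed` fields, and it renders them in the Prometheus text exposition format. The metric names `slurm_gpus_total` and `slurm_gpus_alloc` and the helpers `count_gpus` and `render_metrics` are hypothetical, chosen only for illustration.

```python
import re

# Matches one GPU entry in a Slurm GRES string, for example
# "gpu:8", "gpu:a100:8(S:0-1)" or "gpu:a100:4(IDX:0-3)".
# The optional middle group is the GPU type; the captured group is the count.
_GPU_GRES = re.compile(r"\bgpu(?::[^:,()]+)?:(\d+)")


def count_gpus(gres: str) -> int:
    """Sum the GPU counts found in one GRES string (0 if none, e.g. "(null)")."""
    return sum(int(n) for n in _GPU_GRES.findall(gres))


def render_metrics(nodes: list[tuple[str, str]]) -> str:
    """Render cluster-wide GPU gauges from (gres, gres_used) pairs, one per node.

    The metric names are hypothetical, not the ones used by the actual exporter.
    """
    total = sum(count_gpus(gres) for gres, _ in nodes)
    alloc = sum(count_gpus(used) for _, used in nodes)
    return (
        "# TYPE slurm_gpus_total gauge\n"
        f"slurm_gpus_total {total}\n"
        "# TYPE slurm_gpus_alloc gauge\n"
        f"slurm_gpus_alloc {alloc}\n"
    )


if __name__ == "__main__":
    # Made-up sample data for two nodes, not output from a real cluster.
    sample = [
        ("gpu:a100:8(S:0-1)", "gpu:a100:4(IDX:0-3)"),
        ("gpu:8", "gpu:0"),
    ]
    print(render_metrics(sample), end="")
```

In a real deployment the pairs would come from querying Slurm on a schedule, and the rendered text would be served over HTTP for Prometheus to scrape.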

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

verdimrc, Mar 06 '24 07:03