awsome-distributed-training
SMHP: Slurm exporter to report GPU metrics
Issue #, if available: N/A
Description of changes: Add a Prometheus Slurm exporter that reports GPU metrics (total and allocated GPU counts).
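For reviewers unfamiliar with this kind of exporter, the idea can be sketched roughly as follows. This is an illustration only, not the code in this PR: the metric names `slurm_gpus_total` and `slurm_gpus_alloc`, the helper names, and the sample node data are all hypothetical, and the GRES strings are merely written in the style Slurm reports per node (for example `gpu:a100:8(S:0-1)`). The sketch sums GPU counts from those strings and renders them in the Prometheus text exposition format.

```python
import re

# Hypothetical metric names; the exporter in this PR may use different ones.
METRIC_TOTAL = "slurm_gpus_total"
METRIC_ALLOC = "slurm_gpus_alloc"


def count_gpus(gres: str) -> int:
    """Sum GPU counts in a GRES-style string such as 'gpu:8' or
    'gpu:a100:4(IDX:0-3)'. Entries without a trailing count are ignored."""
    # Drop parenthesised suffixes first, since they may contain commas.
    cleaned = re.sub(r"\([^)]*\)", "", gres)
    total = 0
    for entry in cleaned.split(","):
        parts = entry.strip().split(":")
        if parts[0] != "gpu":
            continue  # skip non-GPU resources and empty entries
        if parts[-1].isdigit():
            total += int(parts[-1])
    return total


def render_metrics(nodes) -> str:
    """Render cluster-wide total and allocated GPU counts as Prometheus
    text-format gauges. `nodes` is a list of dicts with 'gres' and
    'gres_used' keys (an assumed shape for this sketch)."""
    total = sum(count_gpus(n["gres"]) for n in nodes)
    alloc = sum(count_gpus(n["gres_used"]) for n in nodes)
    lines = [
        f"# TYPE {METRIC_TOTAL} gauge",
        f"{METRIC_TOTAL} {total}",
        f"# TYPE {METRIC_ALLOC} gauge",
        f"{METRIC_ALLOC} {alloc}",
    ]
    return "\n".join(lines) + "\n"


# Made-up sample data standing in for what a scheduler query would return.
SAMPLE_NODES = [
    {"gres": "gpu:a100:8(S:0-1)", "gres_used": "gpu:a100:4(IDX:0-3)"},
    {"gres": "gpu:8", "gres_used": "gpu:0"},
    {"gres": "(null)", "gres_used": "(null)"},
]

if __name__ == "__main__":
    print(render_metrics(SAMPLE_NODES), end="")
```

A real exporter would obtain the per-node strings from Slurm and serve the rendered text over HTTP for Prometheus to scrape; both parts are left out here to keep the sketch self-contained.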
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.