Writing a Prometheus exporter
Describe the feature
Application monitoring is essential for every production software system. Prometheus is an open-source monitoring system that was created at SoundCloud in 2012. The Prometheus server collects metrics from your servers and other monitoring targets by scraping their metrics endpoints over HTTP at a predefined interval. For ephemeral and batch jobs, which are too short-lived to be scraped periodically, Prometheus offers a Pushgateway: an intermediate server that monitoring targets can push their metrics to before exiting. Adding a Prometheus exporter to the project would help monitor the health of the cluster, and with custom metrics the training process could be monitored easily as well.
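To make the request more concrete, here is a minimal sketch of what a scrape endpoint could look like, using only the Python standard library. The metric names (`training_loss`, `training_epoch`), their values, and the port are hypothetical and not taken from this project; a real exporter would more likely build on the official `prometheus_client` package than render the text format by hand.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical training metrics: name -> (help text, current value).
# In a real exporter these would be updated by the training process.
METRICS = {
    "training_loss": ("Most recent training loss", 0.42),
    "training_epoch": ("Current training epoch", 3),
}


def render_metrics(metrics):
    """Render the metrics as gauges in the Prometheus text exposition format."""
    lines = []
    for name, (help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {float(value)}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the current metrics on /metrics and return 404 elsewhere."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, exposing /metrics for a Prometheus server to scrape."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` in a background thread of the training process and pointing a Prometheus scrape job at `http://<host>:8000/metrics` would be enough to collect these values. Short-lived jobs would push to a Pushgateway before exiting instead of being scraped.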
Hi @SMesForoush ! Thank you for your feature request!
We are thinking over your idea. Would you mind telling us a bit about your usage scenario? It would help a lot.
The project has changed a lot since this issue was opened, and the issue has been closed due to inactivity. Thanks.