Feature Request: Add Prometheus metrics for target-loader health (e.g. Consul connectivity)
gNMIc currently exports metrics for targets (gnmic_target_up) and subscriptions, but not for target loaders (Consul, http, file, etc). If Consul is unreachable, gNMIc logs errors, but there’s no Prometheus metric to alert on. There are situations where Consul itself is healthy and online but unreachable from gNMIc due to a network partition, and it would be great to be able to trigger alerts whenever gNMIc is unable to discover targets.
I want to propose we add loader-level metrics such as:
gnmic_loader_up{loader="consul"} — 1 if loader healthy, 0 if not.
gnmic_loader_errors_total{loader="consul"} — counter of loader failures.
I'd be happy to submit a PR to implement this if you think it's a good idea.
Ahh I see now there are Consul-specific metrics that can be exported if I enable metrics in the loader configuration. In our use case we are using Consul for both locking and loading - so perhaps it might be nice to add metrics to the Consul locker?
Sure, what kind of metrics are you thinking about?
Maybe just a gnmic_cluster_locker_up{locker="consul"}
@karimra gnmic_cluster_locker_up{locker="consul"} sounds good! Happy to PR if needed
Go for it, thanks!