rabbitmq-server
Add federation link running count as Prometheus metric
Is your feature request related to a problem? Please describe.
During rolling cluster restarts, we currently see federation links stop and not restart until the policies are recreated. This state is difficult to monitor because the Prometheus endpoint does not expose a metric for running federation links or anything similar.
Previously opened discussion on this topic here.
Describe the solution you'd like
If a simple metric such as rabbitmq_federation_running_link_count: <number> were exposed, we could simply alert when this number is 0, which is always the case when this condition occurs.
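To illustrate how such a metric could be consumed, here is a minimal Python sketch that parses Prometheus text exposition output and flags the condition when the gauge reads 0. The metric name is the one proposed in this request and is not exposed today; the sample payload and the function names are hypothetical.

```python
from typing import Optional

# Metric name proposed in this request; RabbitMQ does not expose it today.
METRIC = "rabbitmq_federation_running_link_count"


def read_gauge(exposition: str, name: str) -> Optional[float]:
    """Return the first unlabelled sample for `name`, or None if absent."""
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.split()
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    return None


def links_down(exposition: str) -> bool:
    """True when the gauge is present and reads 0."""
    value = read_gauge(exposition, METRIC)
    return value is not None and value == 0


# Hypothetical scrape output, for illustration only.
sample = """\
# TYPE rabbitmq_federation_running_link_count gauge
rabbitmq_federation_running_link_count 0
"""
print(links_down(sample))
```

In practice the same check would more likely live in an alerting rule than in a script; the point is only that a single gauge makes the failure state trivial to detect.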
Describe alternatives you've considered
Using the CLI to extract and parse the output of rabbitmqctl federation_status, but this adds extra complexity when the other relevant metrics are already exposed to Prometheus.
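For comparison, the CLI workaround might look roughly like the Python sketch below. It assumes that rabbitmqctl federation_status accepts the general --formatter json option, that the output is a JSON list of objects, and that each entry has a status field equal to "running" for a healthy link. All of this should be verified against the installed version. The function names and the sample data are hypothetical.

```python
import json
import subprocess
from typing import Any, Dict, List


def count_running(links: List[Dict[str, Any]]) -> int:
    """Count entries whose status is "running" (assumed field name and value)."""
    return sum(1 for link in links if link.get("status") == "running")


def fetch_links() -> List[Dict[str, Any]]:
    """Run the CLI and parse its output; assumes a JSON list of objects."""
    result = subprocess.run(
        ["rabbitmqctl", "federation_status", "--formatter", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# Hypothetical parsed output, for illustration only.
example = [
    {"upstream": "up-1", "status": "running"},
    {"upstream": "up-2", "status": "starting"},
]
print(count_running(example))
```

On a broker node, count_running(fetch_links()) would give the number to alert on, but the result still has to be shipped into the monitoring system separately, which is the extra complexity described above.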
Additional context
No response