Add federation link running count as Prometheus metric

Open william00179 opened this issue 1 year ago • 0 comments

Is your feature request related to a problem? Please describe.

Currently, when we perform rolling cluster restarts, the federation links stop and do not restart until the policies are recreated. This state is difficult to monitor because RabbitMQ does not expose a Prometheus metric for running federation links or anything similar.

Previously opened discussion on this topic here.

Describe the solution you'd like

If a simple metric such as rabbitmq_federation_running_link_count: <number> were exposed, we could simply alert when this number is 0, which is always the case when this condition occurs.
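To illustrate the request, here is a minimal Python sketch of what such a gauge could look like in the Prometheus text exposition format, together with the alert condition described above. The metric name is the one proposed in this issue; the helper names render_gauge and should_alert and the help text are hypothetical and not part of RabbitMQ.

```python
def render_gauge(name: str, value: int, help_text: str) -> str:
    """Render a single gauge in the Prometheus text exposition format."""
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name} {value}\n"
    )


def should_alert(running_links: int) -> bool:
    """Alert condition from this issue: no federation link is running."""
    return running_links == 0


if __name__ == "__main__":
    # Hypothetical sample value; a real exporter would read it from the broker.
    running = 0
    print(
        render_gauge(
            "rabbitmq_federation_running_link_count",
            running,
            "Number of federation links currently running (proposed metric).",
        ),
        end="",
    )
    print("alert:", should_alert(running))
```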

Describe alternatives you've considered

We could use the CLI tool to extract and parse the rabbitmqctl federation_status output, but this adds extra complexity when the other relevant metrics are already exposed to Prometheus.
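For reference, the parsing step of this workaround might look roughly like the Python sketch below. It assumes the status output has already been obtained as a JSON array of link objects, each with a "status" field whose value is "running" for a healthy link; that shape, the other field names in the sample, and the function name count_running_links are assumptions for illustration, not a documented contract.

```python
import json


def count_running_links(status_json: str) -> int:
    """Count links whose (assumed) "status" field equals "running"."""
    links = json.loads(status_json)
    return sum(1 for link in links if link.get("status") == "running")


if __name__ == "__main__":
    # Hypothetical sample data; real output may use different fields.
    sample = json.dumps(
        [
            {"upstream": "upstream-a", "status": "running"},
            {"upstream": "upstream-b", "status": "starting"},
        ]
    )
    print(count_running_links(sample))
```

Even in this small form, the script still has to be scheduled, fed the CLI output, and wired into alerting separately, which is the extra complexity mentioned above.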

Additional context

No response

william00179, Jan 15 '24 22:01