[Feat]: Postgres replication slot monitoring

Open cacraig opened this issue 1 year ago • 0 comments

Problem

Hi!

Since Postgres 17, the pg_replication_slots view returns an inactive_since column for replication slots where active = FALSE. It would be super helpful to use this to detect streaming replicas that go offline.

Description

^^

Importance

really want

Value proposition

(1) This is very useful information for DB admins to monitor. A replica going offline or losing its connection to the primary can cause downtime, stale data, and table bloat on the primary, among other things.

Proposed implementation

A simple chart or monitor that compares the number of expected active replication slots (the total: select count(*) from pg_replication_slots;) with the number of actually active slots (select count(*) from pg_replication_slots where active=TRUE;) would be great. I think it would work exactly like your monitor for RAID configurations (Storage -> Management -> md.health and md.disks), which shows the number of faulty items (here, inactive replication slots) alongside the total (here, all replication slots).
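To make the idea concrete, here is a rough sketch of the total/active/inactive breakdown. It is only an illustration, not how the Netdata collector is actually written: the names SLOT_STATE_QUERY and summarize_slots are made up, and the rows are assumed to arrive as (slot_name, active) tuples from whatever database driver runs the query.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical query: one row per slot, so a single round trip yields both counts.
SLOT_STATE_QUERY = "SELECT slot_name, active FROM pg_replication_slots;"


def summarize_slots(rows: Iterable[Tuple[str, bool]]) -> Dict[str, int]:
    """Count total, active, and inactive slots from (slot_name, active) rows."""
    total = 0
    active = 0
    for _slot_name, is_active in rows:
        total += 1
        if is_active:
            active += 1
    # "inactive" plays the role of the faulty count in an md.disks-style chart.
    return {"total": total, "active": active, "inactive": total - active}


if __name__ == "__main__":
    # Made-up rows standing in for the result of SLOT_STATE_QUERY.
    sample = [("replica_a", True), ("replica_b", False), ("replica_c", True)]
    print(summarize_slots(sample))
```

Fetching one row per slot instead of two separate count(*) queries keeps both numbers consistent with each other, since they come from the same snapshot of the view.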

Displaying the new inactive_since field for inactive slots (as a timestamp, or as elapsed seconds/minutes) would also be neat if possible.
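For the elapsed-time variant, something along these lines might work. Again this is just a sketch under assumptions: INACTIVE_SINCE_QUERY and inactive_seconds are hypothetical names, the query only works on Postgres 17 or later (where inactive_since exists), and the timestamps are assumed to be timezone-aware datetime values.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical query; requires Postgres 17+, where inactive_since is available.
INACTIVE_SINCE_QUERY = (
    "SELECT slot_name, inactive_since "
    "FROM pg_replication_slots WHERE NOT active;"
)


def inactive_seconds(inactive_since: Optional[datetime], now: datetime) -> float:
    """Return how long a slot has been inactive, in seconds.

    Returns 0.0 when inactive_since is NULL (None), and never goes
    negative if the two clocks disagree slightly.
    """
    if inactive_since is None:
        return 0.0
    return max(0.0, (now - inactive_since).total_seconds())


if __name__ == "__main__":
    # Made-up timestamps for illustration only.
    since = datetime(2024, 12, 18, 2, 2, tzinfo=timezone.utc)
    now = datetime(2024, 12, 18, 2, 12, tzinfo=timezone.utc)
    print(inactive_seconds(since, now))
```

An alert could then fire when this value exceeds some threshold for any slot, which would catch a replica that has been disconnected for longer than expected.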

cacraig, Dec 18 '24 02:12