postgres_exporter

Expand metrics on pg_stat_replication to include lag expressed as time.

Open ahjmorton opened this issue 3 years ago • 2 comments

Expands the metrics exposed from pg_stat_replication to include lag expressed as time, as reported from the WAL sender's perspective. Also adds a collector for pg_stat_wal_receiver so that replication can be monitored from the standby side.

The following examples come from a locally running streaming replication setup, obtained by calling postgres_exporter on each node.

Primary:

# HELP postgres_stat_replication_flush_lag_seconds flush_lag as reported by the pg_stat_replication view converted to seconds
# TYPE postgres_stat_replication_flush_lag_seconds gauge
postgres_stat_replication_flush_lag_seconds{application_name="walreceiver",client_addr="172.28.0.3",state="streaming",sync_state="sync"} 0.002844
# HELP postgres_stat_replication_lag_bytes delay in bytes pg_wal_lsn_diff(pg_current_wal_lsn(), replay_location)
# TYPE postgres_stat_replication_lag_bytes gauge
postgres_stat_replication_lag_bytes{application_name="walreceiver",client_addr="172.28.0.3",state="streaming",sync_state="sync"} 0
# HELP postgres_stat_replication_replay_lag_seconds replay_lag as reported by the pg_stat_replication view converted to seconds
# TYPE postgres_stat_replication_replay_lag_seconds gauge
postgres_stat_replication_replay_lag_seconds{application_name="walreceiver",client_addr="172.28.0.3",state="streaming",sync_state="sync"} 0.003178
# HELP postgres_stat_replication_write_lag_seconds write_lag as reported by the pg_stat_replication view converted to seconds
# TYPE postgres_stat_replication_write_lag_seconds gauge

Standby:

# HELP postgres_wal_receiver_replay_lag_bytes delay in standby wal replay bytes pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn())::float
# TYPE postgres_wal_receiver_replay_lag_bytes gauge
postgres_wal_receiver_replay_lag_bytes{status="streaming"} 0
# HELP postgres_wal_receiver_replay_lag_seconds delay in standby wal replay seconds EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp()
# TYPE postgres_wal_receiver_replay_lag_seconds gauge
postgres_wal_receiver_replay_lag_seconds{status="streaming"} 0
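
As a rough illustration of how the time-based lag metrics above might be consumed, here is a minimal Python sketch that parses exporter output and flags any `*_lag_seconds` sample above a threshold. It is not part of this PR: the helper names `parse_samples` and `lagging`, the threshold value, and the simplified line format (no timestamps, no escaped braces in label values) are all assumptions made for the example.

```python
import re

# Simplified matcher for one sample line: name{labels} value
# Assumption: no trailing timestamp and no "}" inside label values.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (name, labels, value) for each sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments carry no samples
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything that is not a plain sample line
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        yield match.group("name"), labels, float(match.group("value"))


def lagging(text, threshold_seconds):
    """Return samples of any *_lag_seconds metric above the threshold."""
    return [
        (name, labels, value)
        for name, labels, value in parse_samples(text)
        if name.endswith("_lag_seconds") and value > threshold_seconds
    ]


# Two samples copied from the primary and standby output above.
SCRAPE = '''\
# TYPE postgres_stat_replication_flush_lag_seconds gauge
postgres_stat_replication_flush_lag_seconds{application_name="walreceiver",client_addr="172.28.0.3",state="streaming",sync_state="sync"} 0.002844
# TYPE postgres_wal_receiver_replay_lag_seconds gauge
postgres_wal_receiver_replay_lag_seconds{status="streaming"} 0
'''

# Hypothetical 1 ms threshold, chosen only so the example has something to report.
for name, labels, value in lagging(SCRAPE, threshold_seconds=0.001):
    print(name, labels.get("client_addr"), value)
```

In a real deployment the same check would more likely be a Prometheus alerting rule over these gauges; the script only shows the shape of the data.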


ahjmorton avatar Apr 28 '22 08:04 ahjmorton

hey @ahjmorton

Firstly, thank you so much for this. I will review the PR in detail in the next few days.

I think you can drop support for Postgres versions older than 10, since 9.6 has reached end of life. See https://www.postgresql.org/support/versioning/

That will make the changes a bit easier, I think.

rnaveiras avatar Apr 28 '22 09:04 rnaveiras

Hey @rnaveiras and @ttamimi. I'm happy to leave this PR open and work on getting the metrics in there. Reckon it's worth me taking a look? I realise it's been a long time.

ahjmorton avatar Sep 29 '23 10:09 ahjmorton