postgres_exporter
postgres_exporter copied to clipboard
Have 0 lag if database is not updated
I found the lag metric increasing over night when there are no commits in the database. Setting the lag to 0
when the pg_last_wal_receive_lsn
and pg_last_wal_replay_lsn
values are equal seems to have solved that problem.
We should probably fix these queries such that they expose the metrics as timestamps, rather than delta gauges. This would make it possible to calculate things correctly in PromQL.
Actually, I think this is incorrect. What happens if the replica is behind because it's disconnected or replication is broken in some way. Wouldn't this produce a false negative, as in, 0 when it could be lagged?
Actually, I think this is incorrect. What happens if the replica is behind because it's disconnected or replication is broken in some way. Wouldn't this produce a false negative, as in, 0 when it could be lagged?
What do you recommend we change it to? Is it even possible to detect such situations?
Sorry, I'm no expert in Postgres. I mostly just handle the Go code here.
However, I am a MySQL expert.
In MySQL Land, we use a replication heartbeat system. looks like they support Postgres now.
https://www.percona.com/doc/percona-toolkit/LATEST/pt-heartbeat.html
Sorry, I'm no expert in Postgres. I mostly just handle the Go code here.
However, I am a MySQL expert.
In MySQL Land, we use a replication heartbeat system. looks like they support Postgres now.
https://www.percona.com/doc/percona-toolkit/LATEST/pt-heartbeat.html
Ok. Cool. These are the only values available to us for knowing about replication lag. https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-RECOVERY-INFO-TABLE
In this case, we are looking at a timestamp value, and it cannot be in the future, so the lag time in seconds is not possible to be negative, and the *_lsn
values are a special Postgres thing. It doesn't have a specific meaning that we can get any information from to improve the value of the metrics.
So I don't think there is anything that we can improve with this metric.