Exporter container logs on replica instances always report "pg_replication_slots pq: recovery is in progress"
What did you do? I set up a Crunchy Postgres cluster on my OpenShift cluster with 1 master and 2 replica instances. The exporter container runs as a sidecar container. All the replicas log the following messages:
ts=2023-11-16T13:00:00.362Z caller=namespace.go:236 level=info err="Error running query on database \"localhost:5432\": pg_replication_slots pq: recovery is in progress"
ts=2023-11-16T13:00:00.379Z caller=postgres_exporter.go:731 level=error err="queryNamespaceMappings returned 1 errors"
What did you expect to see? Replication slots on replicas are always inactive and the replicas are in recovery mode, so I don't expect to see any errors here.
What did you see instead? Under which circumstances?
All replicas report the same two messages quoted above.
Environment
OpenShift 4.11 on Azure
- System information:
Linux 4.18.0-372.76.1.el8_6.x86_64 x86_64
- postgres_exporter version:
postgres_exporter, version 0.10.1 (branch: HEAD, revision: 6cff384d7433bcb1104efe3b496cd27c0658eb09) build user: root@eb21848025d7 build date: 20220114-17:20:30 go version: go1.17.6 platform: linux/amd64
- postgres_exporter flags (set as container environment variables):
- name: CONFIG_DIR
  value: /opt/cpm/conf
- name: POSTGRES_EXPORTER_PORT
  value: '9187'
- name: PGBACKREST_INFO_THROTTLE_MINUTES
  value: '10'
- name: PG_STAT_STATEMENTS_LIMIT
  value: '20'
- name: PG_STAT_STATEMENTS_THROTTLE_MINUTES
  value: '-1'
- name: EXPORTER_PG_HOST
  value: localhost
- name: EXPORTER_PG_PORT
  value: '5432'
- name: EXPORTER_PG_DATABASE
  value: postgres
- name: EXPORTER_PG_USER
  value: ccp_monitoring
- name: EXPORTER_PG_PASSWORD
  valueFrom:
    secretKeyRef:
      name: flexis-io-dev-scm-billing-monitoring
      key: password
- PostgreSQL version:
psql (PostgreSQL) 13.6
- Logs: the same two lines quoted above.
It seems to me that the pg_current_wal_lsn() call in queries.go causes this error.
Other collectors use an idiom like:
(case pg_is_in_recovery() when 't' then null else pg_current_wal_lsn() end) AS pg_current_wal_lsn,
but not in this query. You can't call this function on a PostgreSQL server that is both receiving and sending replication (this situation happens on the "child" in a parent-child-grandchild replication scenario).
Either this query should be fixed, or, as a workaround, the --no-collector.replication_slot option might help avoid the error.
Same problem with PostgreSQL 14.8 and postgres_exporter 0.15.0, and --no-collector.replication_slot does not fix it.
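To make the suggested idiom concrete, here is a small Go sketch that wraps a primary-only expression in the same CASE guard, so a server in recovery returns NULL for that column instead of raising "recovery is in progress". `guardPrimaryOnly` and `slotLagQuery` are hypothetical helpers, and the replication-slot query is illustrative only. It is not the query shipped in queries.go. `pg_wal_lsn_diff`, `pg_current_wal_lsn` and the `restart_lsn` column are standard in PostgreSQL 10 and later.

```go
package main

import "fmt"

// guardPrimaryOnly wraps an expression that can only be evaluated on a
// primary (such as pg_current_wal_lsn()) in the CASE idiom used by other
// collectors, so a server in recovery yields NULL for that column instead
// of failing with "recovery is in progress".
// Hypothetical helper for illustration; not part of postgres_exporter.
func guardPrimaryOnly(expr, alias string) string {
	return fmt.Sprintf("(case pg_is_in_recovery() when 't' then null else %s end) AS %s", expr, alias)
}

// slotLagQuery builds an illustrative replication-slot query. It is NOT the
// query in queries.go; the column list and alias are assumptions.
func slotLagQuery() string {
	lag := guardPrimaryOnly("pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)", "slot_lag_bytes")
	return "SELECT slot_name, database, active, " + lag + " FROM pg_replication_slots"
}

func main() {
	fmt.Println(guardPrimaryOnly("pg_current_wal_lsn()", "pg_current_wal_lsn"))
	fmt.Println(slotLagQuery())
}
```

The first printed line reproduces the idiom quoted in the comment, minus its trailing comma. The trade-off is that on a standby the lag column comes back as NULL rather than a number.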
I'm facing the same issue. PostgreSQL: 16.2.0 Exporter: postgres-exporter:v0.15.0
postgres-exporter ts=2024-05-14T02:00:02.451Z caller=namespace.go:236 level=info err="Error running query on database \"192.168.0.3:5432\": pg_replication_slots pq: recovery is in progress"
postgres-exporter ts=2024-05-14T02:00:02.451Z caller=postgres_exporter.go:682 level=error err="queryNamespaceMappings returned 1 errors"
postgres-exporter ts=2024-05-14T02:00:05.266Z caller=namespace.go:236 level=info err="Error running query on database \"192.168.0.2:5432\": pg_replication_slots pq: recovery is in progress"
postgres-exporter ts=2024-05-14T02:00:05.348Z caller=postgres_exporter.go:682 level=error err="queryNamespaceMappings returned 1 errors"
postgres-exporter ts=2024-05-14T02:00:05.956Z caller=namespace.go:236 level=info err="Error running query on database \"192.168.0.2:5432\": pg_replication_slots pq: recovery is in progress"
postgres-exporter ts=2024-05-14T02:00:05.956Z caller=postgres_exporter.go:682 level=error err="queryNamespaceMappings returned 1 errors"
postgres-exporter ts=2024-05-14T02:00:08.350Z caller=namespace.go:236 level=info err="Error running query on database \"192.168.0.2:5432\": pg_replication_slots pq: recovery is in progress"
Same on a Patroni cluster using pglogical to replicate to another cluster.
PostgreSQL 13, exporter 0.15.0. How can this be fixed? @sysadmind @SuperQ @Sticksman
I also encountered this problem, but despite the errors, replication works correctly. How can it be fixed?
I'm experiencing similar issues.
We have the same issue on a Patroni cluster with postgres 17
FWIW, me too. I'm working on an upgrade from PG14/pg-exporter 0.11 and assumed it would take care of this, but it seems not?
Edit: oops, I just noticed the link above suggests this was fixed in 0.16.
I had the error with versions 0.11.0 and 0.15.0, even with --no-collector.replication_slot, but upgrading to version 0.17.0 fixed it.
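Upgrading fixed the error for at least one reporter, so it can help to check which exporter version each sidecar actually runs. The Go sketch below parses a version line in the format shown in the original report ("postgres_exporter, version 0.10.1 (branch: ...)") and compares it against a threshold. The 0.17.0 threshold comes only from the report above and is not taken from release notes. `parseExporterVersion` and `atLeast` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseExporterVersion pulls "X.Y.Z" out of a line shaped like
// "postgres_exporter, version 0.10.1 (branch: HEAD, ...)".
// Pre-release suffixes such as "-rc.0" are not handled and return an error.
func parseExporterVersion(line string) ([3]int, error) {
	var v [3]int
	const marker = "version "
	i := strings.Index(line, marker)
	if i < 0 {
		return v, fmt.Errorf("no %q in %q", marker, line)
	}
	fields := strings.Fields(line[i+len(marker):])
	if len(fields) == 0 {
		return v, fmt.Errorf("no version number in %q", line)
	}
	parts := strings.Split(fields[0], ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("unexpected version %q", fields[0])
	}
	for j, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("bad version component %q: %w", p, err)
		}
		v[j] = n
	}
	return v, nil
}

// atLeast reports whether version a >= b, comparing major, minor, patch in order.
func atLeast(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] > b[i]
		}
	}
	return true
}

func main() {
	// Threshold assumed from the report that 0.17.0 fixed the error.
	want := [3]int{0, 17, 0}
	line := "postgres_exporter, version 0.10.1 (branch: HEAD, revision: 6cff384d7433bcb1104efe3b496cd27c0658eb09)"
	v, err := parseExporterVersion(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("running %d.%d.%d, at least %d.%d.%d: %v\n",
		v[0], v[1], v[2], want[0], want[1], want[2], atLeast(v, want))
}
```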