--collector.stat_wal_receiver does not run if replication is setup without replication slot
What did you do? Setup replication without a replication slot.
What did you expect to see? pg_stat_walreceiver metrics (with an empty slot_name)
What did you see instead? Under which circumstances? postgres_exporter 1e [daemon.info] time=2025-09-15T16:57:19.350Z level=DEBUG source=pg_stat_walreceiver.go:164 msg="Skipping wal receiver stats because slotname host is null" collector=stat_wal_receiver
Environment Rocky 9
-
System information: Linux 5.14.0-570.23.1.el9_6.x86_64 x86_64
-
postgres_exporter version: postgres_exporter, version 0.17.1 (branch: HEAD, revision: 1e574cf4fd2a75a8a707d424eafcaa0b88cb7af4) build user: root@b0ba132eddb0 build date: 20250226-13:53:15 go version: go1.23.6 platform: linux/amd64 tags: unknown
-
postgres_exporter flags:
--config.file=/etc/postgres_exporter/postgres_exporter.yml
--log.level=debug
--collector.long_running_transactions
--collector.postmaster
--collector.stat_activity_autovacuum
--collector.stat_checkpointer
--collector.stat_statements
--collector.statio_user_indexes
--collector.stat_wal_receiver
-
PostgreSQL version: PostgreSQL 17.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5), 64-bit
-
Logs:
Sep 15 19:15:19 rd-r9-ds-5-u2 postgres_exporter 1e [daemon.info] time=2025-09-15T17:15:19.330Z level=DEBUG source=namespace.go:192 msg="Querying namespace" namespace=pg_stat_wal_receiver_schu
SSep 15 19:15:19 rd-r9-ds-5-u2 postgres_exporter 1e [daemon.info] time=2025-09-15T17:15:19.355Z level=DEBUG source=pg_stat_walreceiver.go:164 msg="Skipping wal receiver stats because slotname host is null" collector=stat_wal_receiver
Sep 15 19:15:19 rd-r9-ds-5-u2 postgres_exporter 1e [daemon.info] time=2025-09-15T17:15:19.355Z level=DEBUG source=collector.go:205 msg="collector succeeded" name=stat_wal_receiver duration_seconds=0.023864803
The related function is in file: collector/pg_stat_walreceiver.go func: Update()
if !slotName.Valid {
c.log.Debug("Skipping wal receiver stats because slotname host is null")
continue
}
I'm hitting a similar error:
Skipping wal receiver stats because upstream host is null
.. getting tripped up a few lines earlier, here: https://github.com/prometheus-community/postgres_exporter/blob/105c422dfb3222940ced0b00023225866a86ebee/collector/pg_stat_walreceiver.go#L158-L161
.. instead of where you get tripped up, here: https://github.com/prometheus-community/postgres_exporter/blob/105c422dfb3222940ced0b00023225866a86ebee/collector/pg_stat_walreceiver.go#L163-L166
I suppose these may be related.
Digging a bit more into my issue, it seems like pgStatWalReceiverQueryTemplate may be to blame.
Over here, I think we're in the hasFlushedLSN branch: https://github.com/prometheus-community/postgres_exporter/blob/105c422dfb3222940ced0b00023225866a86ebee/collector/pg_stat_walreceiver.go#L131-L135
.. and that injecting "(flushed_lsn - '0/0') % (2^52)::bigint as flushed_lsn,\n" into the query likely makes it invalid somehow.
My replication host is definitely set.
I cleaned up pgStatWalReceiverQueryTemplate (replacing %s with an empty string and have de-escaping %% to % and ''''' to ''), ending up with:
SELECT
trim(both '' from substring(conninfo from 'host=([^ ]*)')) as upstream_host,
slot_name,
status,
(receive_start_lsn- '0/0') % (2^52)::bigint as receive_start_lsn,
(flushed_lsn - '0/0') % (2^52)::bigint as flushed_lsn,
receive_start_tli,
received_tli,
extract(epoch from last_msg_send_time) as last_msg_send_time,
extract(epoch from last_msg_receipt_time) as last_msg_receipt_time,
(latest_end_lsn - '0/0') % (2^52)::bigint as latest_end_lsn,
extract(epoch from latest_end_time) as latest_end_time,
substring(slot_name from 'repmgr_slot_([0-9]*)') as upstream_node
FROM pg_catalog.pg_stat_wal_receiver;
Running this manually against my Postgres instance, I get:
upstream_host | slot_name | status | receive_start_lsn | flushed_lsn | receive_start_tli | received_tli | last_msg_send_time | last_msg_receipt_time | latest_end_lsn | latest_end_time | upstream_node
---------------+----------------------------+-----------+-------------------+-------------+-------------------+--------------+--------------------+-----------------------+----------------+-------------------+---------------
10.20.30.40 | something_something_here | streaming | 369098752 | 369099080 | 1 | 1 | 1758730951.776120 | 1758730951.763306 | 369099080 | 1758728155.319933 |
(1 row)
.. which demonstrates that upstream_host is OK.
TLDR:
- I was wrong about there being a bug in SQL query construction from
pgStatWalReceiverQueryTemplate. It's OK. - my issue is not related to the issue discussed here. I was running this Postgres Exporter with an underprivileged user, so it was rightfully returning empty column data for
upstream_host(and all other columns) and couldn't proceed. I think reporting this error ("Skipping wal receiver stats because upstream host is null") could have been done better though (as discussed below).
I was thinking there's a bug in this Postgres exporter, but.. it looks like there isn't one.
It turns out that the user that you monitor with needs to:
- either be a superuser
- or be privileged enough (having the
pg_monitorGRANT)
... so that it could yield a non-empty result for this query (coming from pgStatWalReceiverQueryTemplate):
SELECT
trim(both '' from substring(conninfo from 'host=([^ ]*)')) as upstream_host,
slot_name,
status,
(receive_start_lsn- '0/0') % (2^52)::bigint as receive_start_lsn,
(flushed_lsn - '0/0') % (2^52)::bigint as flushed_lsn,
receive_start_tli,
received_tli,
extract(epoch from last_msg_send_time) as last_msg_send_time,
extract(epoch from last_msg_receipt_time) as last_msg_receipt_time,
(latest_end_lsn - '0/0') % (2^52)::bigint as latest_end_lsn,
extract(epoch from latest_end_time) as latest_end_time,
substring(slot_name from 'repmgr_slot_([0-9]*)') as upstream_node
FROM pg_catalog.pg_stat_wal_receiver;
If the user is underprivileged, the query would still return 1 row, but all columns would be empty.
I think the current documentation (is there any for this collector?) could be improved to mention this requirement.
I also think that "all columns being empty" is a sign that the user is underprivileged. It would probably be better to:
- show a more user-friendly error, saying that a superuser or a user with a
pg_monitorGRANT is required - show this as an error at a level higher than
debug. Right now, it's failing rather silently. Perhaps the collector (while initializing) could check its grants somehow or infer that thepg_monitorpermission is missing from the empty result set and fail with a very visible error message
It looks like the issue reported by you is valid - the collector fails if reservation slots are not used (slot_name is empty).
Given that it didn't fail a few lines earlier (when checking if upstream_host is empty) gives me the impression that you are running Postgres exporter with a privileged-enough user (unlike me), so your issue is a different one than mine.