stat_activity_autovacuum: timestamp_seconds can be NULL?
What did you do?
Looked at the log output, because I'm missing values in the timeseries.
What did you expect to see?
What did you see instead? Under which circumstances?
postgres_exporter[48820]: ts=2023-11-08T13:55:08.286Z caller=collector.go:202 level=error msg="collector failed" name=stat_activity_autovacuum duration_seconds=0.081215406 err="sql: Scan error on column index 1, name \"timestamp_seconds\": converting NULL to float64 is unsupported"
Environment
- System information:
$ uname -srm
Linux 4.19.0-25-amd64 x86_64
$ lsb_release -a
Distributor ID: Debian
Description: Debian GNU/Linux 10 (buster)
Release: 10
Codename: buster
- postgres_exporter version:
$ /usr/local/bin/postgres_exporter --version
postgres_exporter, version 0.15.0 (branch: HEAD, revision: 68c176b8833b7580bf847cecf60f8e0ad5923f9a)
build user: root@88f74f2c2888
build date: 20231027-14:38:56
go version: go1.21.3
platform: linux/amd64
tags: unknown
- postgres_exporter flags:
--collector.stat_statements --collector.stat_wal_receiver --collector.stat_activity_autovacuum --collector.long_running_transactions --collector.database_wraparound
- PostgreSQL version:
# show server_version;
server_version
--------------------------------
15.4 (Debian 15.4-2.pgdg100+1)
- Logs:
Nov 08 04:27:33 postgres_exporter[48820]: ts=2023-11-08T03:27:33.146Z caller=collector.go:202 level=error msg="collector failed" name=stat_activity_autovacuum duration_seconds=0.097266039 err="sql: Scan error on column index 1, name \"timestamp_seconds\": converting NULL to float64 is unsupported"
Nov 08 05:08:38 postgres_exporter[48820]: ts=2023-11-08T04:08:38.323Z caller=collector.go:202 level=error msg="collector failed" name=stat_activity_autovacuum duration_seconds=0.112045016 err="sql: Scan error on column index 1, name \"timestamp_seconds\": converting NULL to float64 is unsupported"
Nov 08 12:02:03 postgres_exporter[48820]: ts=2023-11-08T11:02:03.152Z caller=collector.go:202 level=error msg="collector failed" name=stat_activity_autovacuum duration_seconds=0.104722171 err="sql: Scan error on column index 1, name \"timestamp_seconds\": converting NULL to float64 is unsupported"
Nov 08 12:10:03 postgres_exporter[48820]: ts=2023-11-08T11:10:03.168Z caller=collector.go:202 level=error msg="collector failed" name=stat_activity_autovacuum duration_seconds=0.118454155 err="sql: Scan error on column index 1, name \"timestamp_seconds\": converting NULL to float64 is unsupported"
Nov 08 12:53:18 postgres_exporter[48820]: ts=2023-11-08T11:53:18.076Z caller=collector.go:202 level=error msg="collector failed" name=stat_activity_autovacuum duration_seconds=0.026987366 err="sql: Scan error on column index 1, name \"timestamp_seconds\": converting NULL to float64 is unsupported"
Nov 08 13:59:08 postgres_exporter[48820]: ts=2023-11-08T12:59:08.284Z caller=collector.go:202 level=error msg="collector failed" name=stat_activity_autovacuum duration_seconds=0.082936131 err="sql: Scan error on column index 1, name \"timestamp_seconds\": converting NULL to float64 is unsupported"
Nov 08 14:24:53 postgres_exporter[48820]: ts=2023-11-08T13:24:53.230Z caller=collector.go:202 level=error msg="collector failed" name=stat_activity_autovacuum duration_seconds=0.027914855 err="sql: Scan error on column index 1, name \"timestamp_seconds\": converting NULL to float64 is unsupported"
Nov 08 14:55:08 postgres_exporter[48820]: ts=2023-11-08T13:55:08.286Z caller=collector.go:202 level=error msg="collector failed" name=stat_activity_autovacuum duration_seconds=0.081215406 err="sql: Scan error on column index 1, name \"timestamp_seconds\": converting NULL to float64 is unsupported"
Nov 08 15:37:33 postgres_exporter[48820]: ts=2023-11-08T14:37:33.145Z caller=collector.go:202 level=error msg="collector failed" name=stat_activity_autovacuum duration_seconds=0.095623407 err="sql: Scan error on column index 1, name \"timestamp_seconds\": converting NULL to float64 is unsupported"
Nov 08 15:53:18 postgres_exporter[48820]: ts=2023-11-08T14:53:18.087Z caller=collector.go:202 level=error msg="collector failed" name=stat_activity_autovacuum duration_seconds=0.037255437 err="sql: Scan error on column index 1, name \"timestamp_seconds\": converting NULL to float64 is unsupported"
According to the docs the xact_start field can be NULL when no transaction is active. Looking at the code, I'm not entirely sure what the meaning of the metric is, so I'm not sure what should be done if that column is NULL. The typical options are to report a 0 value or to skip the metric all together. I usually lean towards the latter.
Maybe @Sticksman can weigh in as they originally contributed the collector.
I start seeing this error, have been a solution applied? is there a workaround we can implement? when this error occurs the exporter doesn't expose any metric.
the error is in this operation:
COALESCE(MAX(EXTRACT(EPOCH FROM now() - xact_start))::float,0)
This can be fixed by:
- Adding a condition:
WHERE xact_start IS NOT NULL - avoid the NULL value with
COALESCE(xact_start,now())
Because I don't understand the meaning of the metric, I will defer the choice to someone else. Maybe @SuperQ.
My feedback:
- The
IS NOT NULLcondition could mean the row isn't returned, skipping the metric. Maybe this could alternatively be handled in the code. I don't know which is better. Both could stop the error. -
COALESCEwould usenow()whenxact_startis NULL which could change the meaning of the value. Just an observation because I don't have the context to know if this matters
The metric display the auto-vacuum duration, but normally there are more than one autovacuum running at the same time, and this fails with one of them having that field NULL.
For a short period of time, some of these entries in pg_stat_activity can have the field xact_start = NULL then the subtraction operation returns NULL and the query fails, so you can even not return the metric in that moment, but it will when the xact_start has a value then the condition WHERE xact_start IS NOT NULL helps, but also you can use the COALESCE with now() because this field is null at the beginning of the operation.
From my point of view both solutions are valid.
FYI, I make a local build using the COALESCE(xact_start,now()) and I got the following error.
`An error has occurred while serving metrics:
collected metric "pg_stat_activity_autovacuum_timestamp_seconds" { label:{name:"relname" value:"device_identifier"} gauge:{value:1.722847480622177e+09}} was collected before with the same name and label values`
I'm going to try the WHERE xact_start IS NOT NULL and let you know.
the error is because the metric only shows the relname, but not schema and this DB has relations with same name but in different schemas.
For version 14+: SPLIT_PART(query, ' ', -1) AS relname
For version 13-: (REGEXP_MATCH(query,'\w+.\w+$'))[1] AS relname
In that way the schema is not removed!