postgres_exporter icon indicating copy to clipboard operation
postgres_exporter copied to clipboard

Exporter does not collect metrics from multiple databases

Open freneticpony1995 opened this issue 1 year ago • 5 comments

What did you do?

I've started postgres-exporter in my k8s cluster using official prometheus helmchart v5.3.0. I serve connection information to exporter using environmental variable "DATA_SOURCE_NAME" containing connection strings for 23 host with Postgresql 13.

Example: DATA_SOURCE_NAME=postgresql://username1:PASSWD1@server1:5432/service1db?sslmode=disable,postgresql://username1:PASSWD2@server2:5432/service2db?sslmode=disable ... etc

What did you expect to see?

I expected to see labels "datname" and "server" in all metrics on /metrics endpoint + all enabled metrics for all databases.

Example:

# HELP pg_stat_database_conflicts_confl_tablespace Number of queries in this database that have been canceled due to dropped tablespaces
# TYPE pg_stat_database_conflicts_confl_tablespace counter
pg_stat_database_conflicts_confl_tablespace{datid="20146",datname="service1db",server="FQDN1:5432"} 0
pg_stat_database_conflicts_confl_tablespace{datid="20366",datname="service2db",server="FQDN2:5432"} 0

What did you see instead? Under which circumstances?

I see some metrics has been exported only for 1 database (first in DATA_SOURCE_NAME list) without label "server" or even without any label.

Example:

# HELP pg_stat_database_deadlocks Number of deadlocks detected in this database
# TYPE pg_stat_database_deadlocks counter
pg_stat_database_deadlocks{datid="20146",datname="service1db"} 0

List of metrics with this issue: pg_database_size_bytes pg_locks_count pg_stat_database_* pg_stat_user_tables_* pg_statio_user_tables_.*

List of metrics without issue: pg_stat_database_conflicts_confl_* pg_stat_archiver_* pg_stat_activity_* pg_settings_* pg_stat_bgwriter_*

pg_stat_bgwriter_* metrics has been served without labels at all.

Example:

# HELP pg_stat_bgwriter_buffers_alloc_total Number of buffers allocated
# TYPE pg_stat_bgwriter_buffers_alloc_total counter
pg_stat_bgwriter_buffers_alloc_total 1.42966019e+08
# HELP pg_stat_bgwriter_buffers_backend_fsync_total Number of times a backend had to execute its own fsync call (normally the background writer handles those even when the backend does its own write)
# TYPE pg_stat_bgwriter_buffers_backend_fsync_total counter
pg_stat_bgwriter_buffers_backend_fsync_total 0
# HELP pg_stat_bgwriter_buffers_backend_total Number of buffers written directly by a backend
# TYPE pg_stat_bgwriter_buffers_backend_total counter
pg_stat_bgwriter_buffers_backend_total 5.7748244e+07
# HELP pg_stat_bgwriter_buffers_checkpoint_total Number of buffers written during checkpoints
# TYPE pg_stat_bgwriter_buffers_checkpoint_total counter
pg_stat_bgwriter_buffers_checkpoint_total 2.70184913e+08
# HELP pg_stat_bgwriter_buffers_clean_total Number of buffers written by the background writer
# TYPE pg_stat_bgwriter_buffers_clean_total counter
pg_stat_bgwriter_buffers_clean_total 1.977864e+06
# HELP pg_stat_bgwriter_checkpoint_sync_time_total Total amount of time that has been spent in the portion of checkpoint processing where files are synchronized to disk, in milliseconds
# TYPE pg_stat_bgwriter_checkpoint_sync_time_total counter
pg_stat_bgwriter_checkpoint_sync_time_total 226029
# HELP pg_stat_bgwriter_checkpoint_write_time_total Total amount of time that has been spent in the portion of checkpoint processing where files are written to disk, in milliseconds
# TYPE pg_stat_bgwriter_checkpoint_write_time_total counter
pg_stat_bgwriter_checkpoint_write_time_total 7.680013381e+09
# HELP pg_stat_bgwriter_checkpoints_req_total Number of requested checkpoints that have been performed
# TYPE pg_stat_bgwriter_checkpoints_req_total counter
pg_stat_bgwriter_checkpoints_req_total 780
# HELP pg_stat_bgwriter_checkpoints_timed_total Number of scheduled checkpoints that have been performed
# TYPE pg_stat_bgwriter_checkpoints_timed_total counter
pg_stat_bgwriter_checkpoints_timed_total 198817
# HELP pg_stat_bgwriter_maxwritten_clean_total Number of times the background writer stopped a cleaning scan because it had written too many buffers
# TYPE pg_stat_bgwriter_maxwritten_clean_total counter
pg_stat_bgwriter_maxwritten_clean_total 8284
# HELP pg_stat_bgwriter_stats_reset_total Time at which these statistics were last reset
# TYPE pg_stat_bgwriter_stats_reset_total counter
pg_stat_bgwriter_stats_reset_total 1.646306272e+09

Environment

  • System information:

k8s cluster

  • postgres_exporter version:
postgres_exporter, version 0.15.0 (branch: HEAD, revision: 68c176b8833b7580bf847cecf60f8e0ad5923f9a)
  build user:       root@88f74f2c2888
  build date:       20231027-14:38:56
  go version:       go1.21.3
  platform:         linux/amd64
  tags:             unknown
  • postgres_exporter flags:
'--config.file=/etc/postgres_exporter.yml'
'--web.listen-address=:9187'
'--log.level=debug'
'--collector.long_running_transactions'
'--collector.stat_activity_autovacuum'
'--collector.process_idle'
'--no-collector.wal'
  • PostgreSQL version: PostgreSQL 13
  • Logs: Logs does not contain errors
ts=2024-01-24T13:57:26.583Z caller=main.go:86 level=warn msg="Error loading config" err="Error opening config file \"/etc/postgres_exporter.yml\": open /etc/postgres_exporter.yml: no such file or directory"
ts=2024-01-24T13:57:26.583Z caller=proc.go:267 msg="Excluded databases" databases=[]
ts=2024-01-24T13:57:26.584Z caller=tls_config.go:274 level=info msg="Listening on" address=[::]:9187
ts=2024-01-24T13:57:26.584Z caller=tls_config.go:277 level=info msg="TLS is disabled." http2=false address=[::]:9187
ts=2024-01-24T13:58:02.930Z caller=server.go:74 level=info msg="Established new database connection" fingerprint=server970:5432
ts=2024-01-24T13:58:02.942Z caller=postgres_exporter.go:585 level=debug msg="Querying PostgreSQL version" server=server970:5432
ts=2024-01-24T13:58:02.942Z caller=postgres_exporter.go:613 level=info msg="Semantic version changed" server=server970:5432 from=0.0.0 to=13.6.0
ts=2024-01-24T13:58:02.942Z caller=postgres_exporter.go:287 level=debug msg="Column is being forced to discard due to version incompatibility" column=procpid
ts=2024-01-24T13:58:02.943Z caller=postgres_exporter.go:287 level=debug msg="Column is being forced to discard due to version incompatibility" column=sent_location
ts=2024-01-24T13:58:02.943Z caller=postgres_exporter.go:287 level=debug msg="Column is being forced to discard due to version incompatibility" column=replay_location
ts=2024-01-24T13:58:02.943Z caller=postgres_exporter.go:287 level=debug msg="Column is being forced to discard due to version incompatibility" column=flush_location
ts=2024-01-24T13:58:02.943Z caller=postgres_exporter.go:287 level=debug msg="Column is being forced to discard due to version incompatibility" column=write_location
ts=2024-01-24T13:58:02.943Z caller=postgres_exporter.go:287 level=debug msg="Column is being forced to discard due to version incompatibility" column=pg_xlog_location_diff
ts=2024-01-24T13:58:02.943Z caller=pg_setting.go:35 level=debug msg="Querying pg_setting view" server=server970:5432
ts=2024-01-24T13:58:02.944Z caller=collector.go:206 level=debug msg="collector succeeded" name=stat_activity_autovacuum duration_seconds=0.00232475
ts=2024-01-24T13:58:02.945Z caller=collector.go:206 level=debug msg="collector succeeded" name=replication duration_seconds=0.003091174
ts=2024-01-24T13:58:02.947Z caller=namespace.go:193 level=debug msg="Querying namespace" namespace=pg_stat_database_conflicts
ts=2024-01-24T13:58:02.948Z caller=namespace.go:193 level=debug msg="Querying namespace" namespace=pg_stat_replication
ts=2024-01-24T13:58:02.950Z caller=namespace.go:193 level=debug msg="Querying namespace" namespace=pg_replication_slots
ts=2024-01-24T13:58:02.951Z caller=namespace.go:193 level=debug msg="Querying namespace" namespace=pg_stat_archiver
ts=2024-01-24T13:58:02.952Z caller=namespace.go:193 level=debug msg="Querying namespace" namespace=pg_stat_activity
ts=2024-01-24T13:58:02.954Z caller=server.go:74 level=info msg="Established new database connection" fingerprint=server910:5432
ts=2024-01-24T13:58:02.956Z caller=pg_stat_database.go:280 level=debug collector=stat_database msg="Skipping collecting metric because it has no datname"
ts=2024-01-24T13:58:02.957Z caller=pg_stat_database.go:350 level=debug collector=stat_database msg="No metric for stats_reset, will collect 0 instead"
ts=2024-01-24T13:58:02.957Z caller=pg_stat_database.go:350 level=debug collector=stat_database msg="No metric for stats_reset, will collect 0 instead"
ts=2024-01-24T13:58:02.957Z caller=collector.go:206 level=debug msg="collector succeeded" name=stat_database duration_seconds=0.014814364
ts=2024-01-24T13:58:02.957Z caller=collector.go:206 level=debug msg="collector succeeded" name=stat_bgwriter duration_seconds=0.015447574
ts=2024-01-24T13:58:02.959Z caller=collector.go:206 level=debug msg="collector succeeded" name=long_running_transactions duration_seconds=0.016531943
ts=2024-01-24T13:58:02.974Z caller=postgres_exporter.go:585 level=debug msg="Querying PostgreSQL version" server=server910:5432
ts=2024-01-24T13:58:02.975Z caller=postgres_exporter.go:613 level=info msg="Semantic version changed" server=server910:5432 from=0.0.0 to=13.5.0
ts=2024-01-24T13:58:02.975Z caller=postgres_exporter.go:287 level=debug msg="Column is being forced to discard due to version incompatibility" column=procpid
ts=2024-01-24T13:58:02.975Z caller=postgres_exporter.go:287 level=debug msg="Column is being forced to discard due to version incompatibility" column=sent_location
ts=2024-01-24T13:58:02.975Z caller=postgres_exporter.go:287 level=debug msg="Column is being forced to discard due to version incompatibility" column=replay_location
ts=2024-01-24T13:58:02.975Z caller=postgres_exporter.go:287 level=debug msg="Column is being forced to discard due to version incompatibility" column=flush_location
ts=2024-01-24T13:58:02.975Z caller=postgres_exporter.go:287 level=debug msg="Column is being forced to discard due to version incompatibility" column=write_location
ts=2024-01-24T13:58:02.975Z caller=postgres_exporter.go:287 level=debug msg="Column is being forced to discard due to version incompatibility" column=pg_xlog_location_diff
ts=2024-01-24T13:58:02.975Z caller=pg_setting.go:35 level=debug msg="Querying pg_setting view" server=server910:5432
ts=2024-01-24T13:58:02.979Z caller=collector.go:206 level=debug msg="collector succeeded" name=stat_user_tables duration_seconds=0.036695873
ts=2024-01-24T13:58:02.981Z caller=namespace.go:193 level=debug msg="Querying namespace" namespace=pg_stat_database_conflicts
ts=2024-01-24T13:58:02.982Z caller=collector.go:206 level=debug msg="collector succeeded" name=statio_user_tables duration_seconds=0.04011164
ts=2024-01-24T13:58:02.983Z caller=collector.go:206 level=debug msg="collector succeeded" name=replication_slot duration_seconds=0.040864841
ts=2024-01-24T13:58:02.986Z caller=collector.go:206 level=debug msg="collector succeeded" name=locks duration_seconds=0.043772689
ts=2024-01-24T13:58:02.989Z caller=collector.go:206 level=debug msg="collector succeeded" name=process_idle duration_seconds=0.046559469
ts=2024-01-24T13:58:02.993Z caller=namespace.go:193 level=debug msg="Querying namespace" namespace=pg_stat_replication
ts=2024-01-24T13:58:02.995Z caller=collector.go:206 level=debug msg="collector succeeded" name=database duration_seconds=0.053059736
ts=2024-01-24T13:58:02.996Z caller=namespace.go:193 level=debug msg="Querying namespace" namespace=pg_replication_slots
ts=2024-01-24T13:58:02.998Z caller=namespace.go:193 level=debug msg="Querying namespace" namespace=pg_stat_archiver
ts=2024-01-24T13:58:02.999Z caller=namespace.go:193 level=debug msg="Querying namespace" namespace=pg_stat_activity

freneticpony1995 avatar Jan 24 '24 15:01 freneticpony1995

There's also all metrics following this pattern pg_stat_statements_* are only showing the values for the first datasource.

hajali-amine avatar Feb 07 '24 17:02 hajali-amine

What is the update on this? I think deprecating the auto-discover-databases feature without any alternative solution is very disadvantageous.

oadekoya avatar Feb 28 '24 15:02 oadekoya

It seems to be because of this logic, which takes only first DSN to instantiate the collector.

When I tried to instantiate multiple collectors in a fork I ran into issues with attempts to register multiple metrics. Seems like running an exporter per DB is the workaround for now.

thampiotr avatar Apr 02 '24 08:04 thampiotr

The best way to use one collector for metrics on several postgres instances is to use the multi-target support. We're in the (slow) process of standardizing the code and how the labels are applied.

sysadmind avatar Apr 02 '24 13:04 sysadmind

Thanks for the context @sysadmind! We're looking forward to the new versions with standardized implementation.

Are all the metrics, including the ones defined in the postgres_exporter.go and custom metrics from a YAML file supported by the multi-target support? Or only those from collector package?

thampiotr avatar Apr 02 '24 14:04 thampiotr