postgres_exporter
Huge memory usage on postgres process from exporter connection.
What did you do? Use postgres_exporter via Docker.
What did you expect to see? Stable memory usage on the Postgres side.
What did you see instead? Under which circumstances? Gradual increase in memory usage on Postgres for the PID that relates to the connection from postgres_exporter.
Environment: Dockerized Postgres, dockerized exporter, running on an Ubuntu VM.
- System information: Linux 4.15.0-180-generic x86_64
- postgres_exporter version: postgres_exporter v0.8.0-21-g8531aba (built with go1.11)
- postgres_exporter flags:
  - DATA_SOURCE_NAME=postgresql://$DB_USERNAME:$DB_PASSWORD@postgres:5432/database?sslmode=disable
  - PG_EXPORTER_EXTEND_QUERY_PATH=/queries.yml
- PostgreSQL version: PostgreSQL 14.4 (Ubuntu 14.4-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0, 64-bit
- Logs: No logs
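For context, the environment variables above map to a docker-compose service along these lines (a sketch; the image tag, service name, volume mount, and port are assumptions for illustration, not taken from the report):

```yaml
# Hypothetical docker-compose service matching the reported flags.
services:
  postgres_exporter:
    image: wrouesnel/postgres_exporter:latest   # image tag is an assumption
    environment:
      - DATA_SOURCE_NAME=postgresql://$DB_USERNAME:$DB_PASSWORD@postgres:5432/database?sslmode=disable
      - PG_EXPORTER_EXTEND_QUERY_PATH=/queries.yml
    volumes:
      - ./queries.yml:/queries.yml:ro           # mount path is an assumption
    ports:
      - "9187:9187"                             # default exporter port
```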
- Top output on postgres server:
top - 06:56:21 up 117 days, 14:43, 0 users, load average: 0.34, 0.31, 0.27
Tasks: 22 total, 1 running, 21 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.4 us, 0.1 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.1 st
MiB Mem : 31623.3 total, 452.7 free, 24605.8 used, 6564.8 buff/cache
MiB Swap: 3913.0 total, 3788.5 free, 124.5 used. 5609.7 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
84191 postgres 20 0 9760792 56332 52708 S 0.3 0.2 0:52.71 postgres
1 postgres 20 0 9742468 220728 219976 S 0.0 0.7 7:24.14 postgres
14 postgres 20 0 9742620 187232 186332 S 0.0 0.6 1:02.18 postgres
15 postgres 20 0 9742572 90528 89772 S 0.0 0.3 0:51.94 postgres
16 postgres 20 0 9742468 22204 21520 S 0.0 0.1 1:11.24 postgres
17 postgres 20 0 9743144 7672 6860 S 0.0 0.0 1:45.70 postgres
18 postgres 20 0 75304 6504 4480 S 0.0 0.0 48:42.36 postgres
19 postgres 20 0 9742896 5612 4912 S 0.0 0.0 0:03.55 postgres
45 postgres 20 0 26.8g 17.5g 115200 S 0.0 56.5 3410:00 postgres
84190 postgres 20 0 9766768 63028 58224 S 0.0 0.2 0:53.69 postgres
84869 root 20 0 8020 4108 3500 S 0.0 0.0 0:00.09 bash
85342 postgres 20 0 9758948 70932 65392 S 0.0 0.2 0:00.13 postgres
85383 postgres 20 0 9758868 66704 61804 S 0.0 0.2 0:00.08 postgres
85425 postgres 20 0 9757864 65960 61712 S 0.0 0.2 0:00.11 postgres
85466 postgres 20 0 9758040 65172 60960 S 0.0 0.2 0:00.11 postgres
85507 postgres 20 0 9758272 69368 64708 S 0.0 0.2 0:00.14 postgres
85550 postgres 20 0 9757808 62656 58756 S 0.0 0.2 0:00.06 postgres
85591 postgres 20 0 9756636 47044 43984 S 0.0 0.1 0:00.04 postgres
85632 postgres 20 0 9759272 63788 58728 S 0.0 0.2 0:00.06 postgres
85673 postgres 20 0 9744404 34332 31904 S 0.0 0.1 0:00.05 postgres
85716 postgres 20 0 9743600 28160 26084 S 0.0 0.1 0:00.01 postgres
85738 root 20 0 10904 3868 3160 R 0.0 0.0 0:00.00 top
PID 45 is the Postgres backend serving the exporter's connection.
pg_stat_activity for PID 45:
datid | 16401
datname | affectli
pid | 45
leader_pid |
usesysid | 10
usename | postgres
application_name |
client_addr | 172.19.0.22
client_hostname |
client_port | 53706
backend_start | 2022-08-19 13:32:27.893675+00
xact_start |
query_start | 2022-09-26 07:10:28.67894+00
state_change | 2022-09-26 07:10:29.088342+00
wait_event_type | Client
wait_event | ClientRead
state | idle
backend_xid |
backend_xmin |
query_id | -4060127036400457215
query | select * from (SELECT current_database() as databasename, nspname as schemaname, c.relname AS tablename, index_name as indexname, bs*(sub.relpages)::bigint AS totalbytes, CASE WHEN sub.relpages <= otta THEN 0 ELSE bs*(sub.relpages-otta)::bigint END AS wastedbytes, CASE WHEN sub.relpages <= otta THEN 0 ELSE bs*(sub.relpages-otta)::bigint * 100 / (bs*(sub.relpages)::bigint) END AS realbloat FROM ( SELECT bs, nspname, table_oid, index_name, relpages, coalesce( ceil((reltuples*(4+nulldatahdrwidth))/(bs-pagehdr::float)) + CASE WHEN am.amname IN ('hash','btree') THEN 1 ELSE 0 END , 0) AS otta FROM ( SELECT maxalign, bs, nspname, relname AS index_name, reltuples, relpages, relam, table_oid, ( index_tuple_hdr_bm + maxalign - CASE WHEN index_tuple_hdr_bm%maxalign = 0 THEN maxalign ELSE index_tuple_hdr_bm % maxalign END + nulldatawidth + maxalign - CASE WHEN nulldatawidth::integer%maxalign = 0 THEN maxalign ELSE nulldatawidth::integer%maxalign END )::numeric AS nulldatahdrwidth, pagehdr FROM ( SELECT i.nspna
backend_type | client backend
Not sure if this will help at all, but here is some output from /proc/45/smaps:
564094733000-564409b66000 rw-p 00000000 00:00 0 [heap]
Size: 14504140 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 14490480 kB
Pss: 14490480 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 4 kB
Private_Dirty: 14490476 kB
Referenced: 14463028 kB
Anonymous: 14490480 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 12264 kB
SwapPss: 12264 kB
Locked: 0 kB
VmFlags: rd wr mr mw me ac sd
7fb7569d4000-7fb7b69d5000 rw-p 00000000 00:00 0
Size: 1572868 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 1572868 kB
Pss: 1572868 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 1572868 kB
Referenced: 1572448 kB
Anonymous: 1572868 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
VmFlags: rd wr mr mw me ac sd
7fb8042e4000-7fb8842e5000 rw-p 00000000 00:00 0
Size: 2097156 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 2097156 kB
Pss: 2097156 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 2097156 kB
Referenced: 2096928 kB
Anonymous: 2097156 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
VmFlags: rd wr mr mw me ac sd
~~The postgres_exporter includes pprof, so you can get a direct dump of memory use.~~
Sorry, I misread. You're talking about postgres memory use, not the exporter.
First, please update your exporter; the version you are using is very old. Second, what queries have you configured in the queries.yml?
About updating the exporter, is there a newer docker image I can use? This is the latest one I can find here: https://hub.docker.com/r/wrouesnel/postgres_exporter/tags
Here are the queries from the .yml file (metrics removed for brevity):
pg_replication:
query: "SELECT case when pg_is_in_recovery() then EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) else 0 end as lag"
pg_postmaster:
query: "SELECT pg_postmaster_start_time as start_time_seconds from pg_postmaster_start_time()"
pg_stat_user_tables:
query: "SELECT schemaname, relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch, n_tup_ins, n_tup_upd, n_tup_del, n_tup_hot_upd, n_live_tup, n_dead_tup, n_mod_since_analyze, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze, vacuum_count, autovacuum_count, analyze_count, autoanalyze_count FROM pg_stat_user_tables"
pg_database:
query: " SELECT pg_database.datname, pg_database_size(pg_database.datname) as size FROM pg_database"
pg_table_size:
query: "SELECT (table_schema||'.'||table_name) as table_name, row_estimate, total_bytes, index_bytes, toast_bytes, table_bytes FROM (SELECT *, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes FROM (SELECT c.oid,nspname AS table_schema, relname AS TABLE_NAME, c.reltuples AS row_estimate, pg_total_relation_size(c.oid) AS total_bytes, pg_indexes_size(c.oid) AS index_bytes, pg_total_relation_size(reltoastrelid) AS toast_bytes FROM pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace WHERE relkind = 'r' and pg_total_relation_size(c.oid) > 100000000) a) a"
pg_stat_statements:
query: "select dbid, queryid, regexp_replace(substr(query,1,256), E'[\n\r]+', ' ', 'g' ) as query, calls, total_exec_time, min_exec_time, max_exec_time, mean_exec_time, stddev_exec_time, rows from pg_stat_statements where calls > 100 and dbid = (SELECT datid FROM pg_stat_database WHERE datname = current_database()) ORDER BY total_exec_time desc limit 20"
pg_stat_activity_detail:
query: "select datname, case when substr(application_name,1,7) = 'pgAdmin' then 'pgAdmin' when application_name = '' then 'undefined' else application_name end as application_name, usename, client_addr, state, count(*) as count from pg_stat_activity group by 1,2,3,4,5"
pg_stat_activity_tx_detail:
query: "select datname, case when substr(application_name,1,7) = 'pgAdmin' then 'pgAdmin' when application_name = '' then 'undefined' else application_name end as application_name, usename, client_addr, state, MAX(EXTRACT(EPOCH FROM now() - xact_start))::float as duration from pg_stat_activity where xact_start is not null group by 1,2,3,4,5"
pg_exclusive_lock_details:
query: "select l.relation::regclass as relation, a.application_name, regexp_replace(substr(a.query,1,256), E'[\\n\\r]+', ' ', 'g' ) as query, granted::text as granted, count(*) as count FROM pg_locks l JOIN pg_stat_activity a ON a.pid = l.pid where a.datname = 'affectli' and mode = 'ExclusiveLock' and locktype = 'tuple' group by 1,2,3,4"
pg_index_bloat:
query: "select * from (SELECT current_database() as databasename, nspname as schemaname, c.relname AS tablename, index_name as indexname, bs*(sub.relpages)::bigint AS totalbytes, CASE WHEN sub.relpages <= otta THEN 0 ELSE bs*(sub.relpages-otta)::bigint END AS wastedbytes, CASE WHEN sub.relpages <= otta THEN 0 ELSE bs*(sub.relpages-otta)::bigint * 100 / (bs*(sub.relpages)::bigint) END AS realbloat FROM ( SELECT bs, nspname, table_oid, index_name, relpages, coalesce( ceil((reltuples*(4+nulldatahdrwidth))/(bs-pagehdr::float)) + CASE WHEN am.amname IN ('hash','btree') THEN 1 ELSE 0 END , 0) AS otta FROM ( SELECT maxalign, bs, nspname, relname AS index_name, reltuples, relpages, relam, table_oid, ( index_tuple_hdr_bm + maxalign - CASE WHEN index_tuple_hdr_bm%maxalign = 0 THEN maxalign ELSE index_tuple_hdr_bm % maxalign END + nulldatawidth + maxalign - CASE WHEN nulldatawidth::integer%maxalign = 0 THEN maxalign ELSE nulldatawidth::integer%maxalign END )::numeric AS nulldatahdrwidth, pagehdr FROM ( SELECT i.nspname, i.relname, i.reltuples, i.relpages, i.relam, s.starelid, a.attrelid AS table_oid, current_setting('block_size')::numeric AS bs, CASE WHEN version() ~ 'mingw32' OR version() ~ '64-bit' THEN 8 ELSE 4 END AS maxalign, CASE WHEN substring(current_setting('server_version') FROM '#\"[0-9]+#\"%' FOR '#')::integer > 7 THEN 24 ELSE 20 END AS pagehdr, CASE WHEN max(coalesce(s.stanullfrac,0)) = 0 THEN 2 ELSE 2 + (32 + 8 - 1) / 8 END AS index_tuple_hdr_bm, sum( (1-coalesce(s.stanullfrac, 0)) * coalesce(s.stawidth, 2048) ) AS nulldatawidth FROM pg_attribute AS a JOIN pg_statistic AS s ON s.starelid=a.attrelid AND s.staattnum = a.attnum JOIN ( SELECT nspname, relname, reltuples, relpages, indrelid, relam, regexp_split_to_table(indkey::text, ' ')::smallint AS attnum FROM pg_index JOIN pg_class ON pg_class.oid=pg_index.indexrelid JOIN pg_namespace ON pg_namespace.oid = pg_class.relnamespace AND pg_namespace.nspname NOT IN ('information_schema', 'pg_catalog') ) AS i ON i.indrelid = a.attrelid AND a.attnum = i.attnum WHERE a.attnum > 0 GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9 ) AS s1 ) AS s2 LEFT JOIN pg_am am ON s2.relam = am.oid ) as sub JOIN pg_class c ON c.oid=sub.table_oid ) a where (wastedbytes > 50 * 1024 * 1024 and realbloat > 50) or (wastedbytes > 1000 * 1024 * 1024 and realbloat > 15);"
pg_table_bloat:
query: "select * from ( select databasename,schemaname, tablename, wastedbytes, totalbytes, case when totalbytes = 0 then 0 else wastedbytes * 100 / totalbytes end as realbloat from (SELECT current_database() as databasename, schemaname, tablename, (pg_relation_size(schemaname || '.' || tablename)) as totalbytes, ROUND(CASE WHEN otta = 0 THEN 0.0 ELSE sml.relpages / otta::NUMERIC END, 1) AS tbloat, (CASE WHEN relpages < otta THEN 0 ELSE bs * (sml.relpages - otta)::BIGINT END) AS wastedbytes FROM (SELECT schemaname, tablename, cc.reltuples, cc.relpages, bs, CEIL((cc.reltuples * ((datahdr + ma - (CASE WHEN datahdr % ma = 0 THEN ma ELSE datahdr % ma END)) + nullhdr2 + 4)) / (bs - 20::FLOAT)) AS otta FROM (SELECT ma, bs, schemaname, tablename, (datawidth + (hdr + ma - (CASE WHEN hdr % ma = 0 THEN ma ELSE hdr % ma END)))::NUMERIC AS datahdr, (maxfracsum * (nullhdr + ma - (CASE WHEN nullhdr % ma = 0 THEN ma ELSE nullhdr % ma END))) AS nullhdr2 FROM (SELECT schemaname, tablename, hdr, ma, bs, SUM((1 - null_frac) * avg_width) AS datawidth, MAX(null_frac) AS maxfracsum, hdr + (SELECT 1 + COUNT(*) / 8 FROM pg_stats s2 WHERE null_frac <> 0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename) AS nullhdr FROM pg_stats s, (SELECT (SELECT current_setting('block_size')::NUMERIC) AS bs, CASE WHEN SUBSTRING(v, 12, 3) IN ('8.0', '8.1', '8.2') THEN 27 ELSE 23 END AS hdr, CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma FROM (SELECT version() AS v) AS foo) AS constants GROUP BY 1, 2, 3, 4, 5) AS foo) AS rs JOIN pg_class cc ON cc.relname = rs.tablename JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname NOT IN ('information_schema', 'pg_catalog')) AS sml ) a ) b where (wastedbytes > 50 * 1024 * 1024 and realbloat > 50) or (wastedbytes > 1000 * 1024 * 1024 and realbloat > 15);"
pg_transaction_age:
query: "SELECT c.oid::regclass as relation, ROUND(100*(age(c.relfrozenxid) / 2000000000::float)) as percent_towards_wraparound, ROUND(100*(age(c.relfrozenxid)/current_setting('autovacuum_freeze_max_age')::float)) as percent_towards_emergency_autovac FROM pg_class c JOIN pg_namespace n on c.relnamespace = n.oid WHERE relkind IN ('r', 't', 'm') AND n.nspname NOT IN ('pg_toast') ORDER BY 2 DESC LIMIT 10"
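For reference, each entry above normally also carries a metrics block telling the exporter how to expose each returned column (the poster removed these for brevity). A minimal sketch of one complete entry, with an illustrative description:

```yaml
# Sketch of a full queries.yml entry; the metrics block is an
# illustration of the format, not the poster's actual config.
pg_replication:
  query: "SELECT case when pg_is_in_recovery() then EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) else 0 end as lag"
  metrics:
    - lag:
        usage: "GAUGE"
        description: "Replication lag behind primary in seconds"
```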
This is now released under quay.io/prometheuscommunity/postgres-exporter.
I have upgraded the exporter.
@maudrid You said that you upgraded the exporter, but did that change anything about the problem you are describing?
Still having the same issue:
In the graph you can see where I upgraded to the latest version on the 27th.
@maudrid Can you try running the exporter without the custom queries file? That will help us understand if it's one of those queries or something in the exporter itself.
I have emptied the queries.yml file and restarted the exporter. Will wait to see what it does now.
After removing the queries, the memory usage is stable. It seems to be caused by 1 or more of the custom queries.
We were noticing increased postgresql process memory usage for just specific queries as well. It turned out to only be happening when JIT was enabled. This defaults to "on" as of PG12, which is why it seemed to suddenly start happening for some and not others (when they'd upgrade to 12).
https://www.postgresql.org/docs/current/jit.html
We initially disabled it entirely for the postgresql instance and instantly stopped seeing memory usage increasing. We then just went and disabled it for the monitoring user so that others can use it if needed.
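Disabling JIT per role can be done with a server-side setting, so other clients keep JIT available. A sketch, assuming the monitoring user is a role named postgres_exporter (the report above actually connects as postgres):

```sql
-- Assumption: the exporter connects as the role "postgres_exporter".
-- New sessions opened by this role will start with jit disabled.
ALTER ROLE postgres_exporter SET jit = off;

-- Verify from a fresh session opened by that role:
SHOW jit;
```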
I was able to resolve this issue by adding jit=off to my connection string.
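Appending the setting to the DSN from the original report would look like this (a sketch; lib/pq, which the exporter uses, forwards unrecognized URI parameters to the server as session settings):

```
DATA_SOURCE_NAME=postgresql://$DB_USERNAME:$DB_PASSWORD@postgres:5432/database?sslmode=disable&jit=off
```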
> I was able to resolve this issue by adding jit=off to my connection string.
This worked for me as well. I added this in the query string for postgres exporter. It solved the memory issue and the overall query latencies dropped substantially.