postgres_exporter icon indicating copy to clipboard operation
postgres_exporter copied to clipboard

Huge memory usage on postgres process from exporter connection.

Open maudrid opened this issue 2 years ago • 10 comments

What did you do? Use postgres exporter via docker. What did you expect to see? Stable memory usage on the Postgres side. What did you see instead? Under which circumstances? Gradual increase in memory usage on postgres for the PID that relates to the connection from postgres_exporter. Environment Dockerized postgres, dockerized eporter running on Ubuntu VM

  • System information: Linux 4.15.0-180-generic x86_64

  • postgres_exporter version:

    postgres_exporter v0.8.0-21-g8531aba (built with go1.11)

  • postgres_exporter flags:

- DATA_SOURCE_NAME=postgresql://$DB_USERNAME:$DB_PASSWORD@postgres:5432/database?sslmode=disable
- PG_EXPORTER_EXTEND_QUERY_PATH=/queries.yml
  • PostgreSQL version:

    PostgreSQL 14.4 (Ubuntu 14.4-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0, 64-bit

  • Logs:

No Logs
  • Top output on postgres server:
top - 06:56:21 up 117 days, 14:43,  0 users,  load average: 0.34, 0.31, 0.27
Tasks:  22 total,   1 running,  21 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.4 us,  0.1 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.1 st
MiB Mem :  31623.3 total,    452.7 free,  24605.8 used,   6564.8 buff/cache
MiB Swap:   3913.0 total,   3788.5 free,    124.5 used.   5609.7 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                                                                                                                                 
 84191 postgres  20   0 9760792  56332  52708 S   0.3   0.2   0:52.71 postgres                                                                                                                                                                                                                                                
     1 postgres  20   0 9742468 220728 219976 S   0.0   0.7   7:24.14 postgres                                                                                                                                                                                                                                                
    14 postgres  20   0 9742620 187232 186332 S   0.0   0.6   1:02.18 postgres                                                                                                                                                                                                                                                
    15 postgres  20   0 9742572  90528  89772 S   0.0   0.3   0:51.94 postgres                                                                                                                                                                                                                                                
    16 postgres  20   0 9742468  22204  21520 S   0.0   0.1   1:11.24 postgres                                                                                                                                                                                                                                                
    17 postgres  20   0 9743144   7672   6860 S   0.0   0.0   1:45.70 postgres                                                                                                                                                                                                                                                
    18 postgres  20   0   75304   6504   4480 S   0.0   0.0  48:42.36 postgres                                                                                                                                                                                                                                                
    19 postgres  20   0 9742896   5612   4912 S   0.0   0.0   0:03.55 postgres                                                                                                                                                                                                                                                
    45 postgres  20   0   26.8g  17.5g 115200 S   0.0  56.5   3410:00 postgres                                                                                                                                                                                                                                                
 84190 postgres  20   0 9766768  63028  58224 S   0.0   0.2   0:53.69 postgres                                                                                                                                                                                                                                                
 84869 root      20   0    8020   4108   3500 S   0.0   0.0   0:00.09 bash                                                                                                                                                                                                                                                    
 85342 postgres  20   0 9758948  70932  65392 S   0.0   0.2   0:00.13 postgres                                                                                                                                                                                                                                                
 85383 postgres  20   0 9758868  66704  61804 S   0.0   0.2   0:00.08 postgres                                                                                                                                                                                                                                                
 85425 postgres  20   0 9757864  65960  61712 S   0.0   0.2   0:00.11 postgres                                                                                                                                                                                                                                                
 85466 postgres  20   0 9758040  65172  60960 S   0.0   0.2   0:00.11 postgres                                                                                                                                                                                                                                                
 85507 postgres  20   0 9758272  69368  64708 S   0.0   0.2   0:00.14 postgres                                                                                                                                                                                                                                                
 85550 postgres  20   0 9757808  62656  58756 S   0.0   0.2   0:00.06 postgres                                                                                                                                                                                                                                                
 85591 postgres  20   0 9756636  47044  43984 S   0.0   0.1   0:00.04 postgres                                                                                                                                                                                                                                                
 85632 postgres  20   0 9759272  63788  58728 S   0.0   0.2   0:00.06 postgres                                                                                                                                                                                                                                                
 85673 postgres  20   0 9744404  34332  31904 S   0.0   0.1   0:00.05 postgres                                                                                                                                                                                                                                                
 85716 postgres  20   0 9743600  28160  26084 S   0.0   0.1   0:00.01 postgres                                                                                                                                                                                                                                                
 85738 root      20   0   10904   3868   3160 R   0.0   0.0   0:00.00 top            

PID 45 is the exporter.

pg_stat_activity for PID 45:

datid            | 16401
datname          | affectli
pid              | 45
leader_pid       | 
usesysid         | 10
usename          | postgres
application_name | 
client_addr      | 172.19.0.22
client_hostname  | 
client_port      | 53706
backend_start    | 2022-08-19 13:32:27.893675+00
xact_start       | 
query_start      | 2022-09-26 07:10:28.67894+00
state_change     | 2022-09-26 07:10:29.088342+00
wait_event_type  | Client
wait_event       | ClientRead
state            | idle
backend_xid      | 
backend_xmin     | 
query_id         | -4060127036400457215
query            | select * from (SELECT current_database() as databasename, nspname as schemaname, c.relname AS tablename, index_name as indexname, bs*(sub.relpages)::bigint AS totalbytes,  CASE WHEN sub.relpages <= otta THEN 0 ELSE bs*(sub.relpages-otta)::bigint END AS wastedbytes,  CASE WHEN sub.relpages <= otta THEN 0 ELSE bs*(sub.relpages-otta)::bigint * 100 / (bs*(sub.relpages)::bigint) END AS realbloat FROM (  SELECT bs, nspname, table_oid, index_name, relpages, coalesce(    ceil((reltuples*(4+nulldatahdrwidth))/(bs-pagehdr::float)) + CASE WHEN am.amname IN ('hash','btree') THEN 1 ELSE 0 END , 0) AS otta  FROM (    SELECT maxalign, bs, nspname, relname AS index_name, reltuples, relpages, relam, table_oid, ( index_tuple_hdr_bm + maxalign - CASE WHEN index_tuple_hdr_bm%maxalign = 0 THEN maxalign ELSE index_tuple_hdr_bm % maxalign END + nulldatawidth + maxalign - CASE WHEN nulldatawidth::integer%maxalign = 0 THEN maxalign ELSE nulldatawidth::integer%maxalign END )::numeric AS nulldatahdrwidth, pagehdr FROM ( SELECT i.nspna
backend_type     | client backend

Not sure if this will help at all but here is some output from /proc/45/smamps:

564094733000-564409b66000 rw-p 00000000 00:00 0                          [heap]
Size:           14504140 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:            14490480 kB
Pss:            14490480 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         4 kB
Private_Dirty:  14490476 kB
Referenced:     14463028 kB
Anonymous:      14490480 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:              12264 kB
SwapPss:           12264 kB
Locked:                0 kB
VmFlags: rd wr mr mw me ac sd 
7fb7569d4000-7fb7b69d5000 rw-p 00000000 00:00 0 
Size:            1572868 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:             1572868 kB
Pss:             1572868 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:   1572868 kB
Referenced:      1572448 kB
Anonymous:       1572868 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
VmFlags: rd wr mr mw me ac sd 
7fb8042e4000-7fb8842e5000 rw-p 00000000 00:00 0 
Size:            2097156 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:             2097156 kB
Pss:             2097156 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:   2097156 kB
Referenced:      2096928 kB
Anonymous:       2097156 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
VmFlags: rd wr mr mw me ac sd 

maudrid avatar Sep 26 '22 07:09 maudrid

~~The postgres_exporter includes pprof, so you can get a direct dump of memory use.~~

Sorry, I misread. You're talking about postgres memory use, not the exporter.

First, please update your exporter, the version you are using is very old. Second, what queries have you configured in the queries.yaml?

SuperQ avatar Sep 26 '22 08:09 SuperQ

About updating the exporter, is there a newer docker image I can use? This is the latest one I can find here: https://hub.docker.com/r/wrouesnel/postgres_exporter/tags

Here are the queries from the .yml file (metrics removed for brevity)

pg_replication:
  query: "SELECT case when pg_is_in_recovery() then EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) else 0 end as lag"
  
pg_postmaster:
  query: "SELECT pg_postmaster_start_time as start_time_seconds from pg_postmaster_start_time()"
  
pg_stat_user_tables:
  query: "SELECT schemaname, relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch, n_tup_ins, n_tup_upd, n_tup_del, n_tup_hot_upd, n_live_tup, n_dead_tup, n_mod_since_analyze, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze, vacuum_count, autovacuum_count, analyze_count, autoanalyze_count FROM pg_stat_user_tables"
  
pg_database:
  query: " SELECT pg_database.datname, pg_database_size(pg_database.datname) as size FROM pg_database"
  
pg_table_size:
  query: "SELECT (table_schema||'.'||table_name) as table_name, row_estimate, total_bytes, index_bytes, toast_bytes, table_bytes FROM (SELECT *, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes FROM (SELECT c.oid,nspname AS table_schema, relname AS TABLE_NAME, c.reltuples AS row_estimate, pg_total_relation_size(c.oid) AS total_bytes, pg_indexes_size(c.oid) AS index_bytes, pg_total_relation_size(reltoastrelid) AS toast_bytes FROM pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace WHERE relkind = 'r' and pg_total_relation_size(c.oid) > 100000000) a) a"
  
pg_stat_statements:
  query: "select dbid, queryid, regexp_replace(substr(query,1,256), E'[\n\r]+', ' ', 'g' ) as query, calls, total_exec_time, min_exec_time, max_exec_time, mean_exec_time, stddev_exec_time, rows from pg_stat_statements where calls > 100 and dbid = (SELECT datid FROM pg_stat_database WHERE datname = current_database()) ORDER BY total_exec_time desc limit 20"
  
pg_stat_activity_detail:
  query: "select datname, case when substr(application_name,1,7) = 'pgAdmin' then 'pgAdmin' when application_name = '' then 'undefined' else application_name end as application_name, usename, client_addr, state, count(*) as count from pg_stat_activity group  by 1,2,3,4,5"
  
pg_stat_activity_tx_detail:
  query: "select datname, case when substr(application_name,1,7) = 'pgAdmin' then 'pgAdmin' when application_name = '' then 'undefined' else application_name end as application_name, usename, client_addr, state, MAX(EXTRACT(EPOCH FROM now() - xact_start))::float as duration from pg_stat_activity where xact_start is not null group  by 1,2,3,4,5"
  
pg_exclusive_lock_details:
  query: "select l.relation::regclass as relation, a.application_name, regexp_replace(substr(a.query,1,256), E'[\\n\\r]+', ' ', 'g' ) as query, granted::text as granted, count(*) as count FROM pg_locks l JOIN pg_stat_activity a ON a.pid = l.pid where a.datname = 'affectli' and mode = 'ExclusiveLock' and locktype = 'tuple' group by 1,2,3,4"
  
pg_index_bloat:
  query: "select * from (SELECT current_database() as databasename, nspname as schemaname, c.relname AS tablename, index_name as indexname, bs*(sub.relpages)::bigint AS totalbytes,  CASE WHEN sub.relpages <= otta THEN 0 ELSE bs*(sub.relpages-otta)::bigint END AS wastedbytes,  CASE WHEN sub.relpages <= otta THEN 0 ELSE bs*(sub.relpages-otta)::bigint * 100 / (bs*(sub.relpages)::bigint) END AS realbloat FROM (  SELECT bs, nspname, table_oid, index_name, relpages, coalesce(    ceil((reltuples*(4+nulldatahdrwidth))/(bs-pagehdr::float)) + CASE WHEN am.amname IN ('hash','btree') THEN 1 ELSE 0 END , 0) AS otta  FROM (    SELECT maxalign, bs, nspname, relname AS index_name, reltuples, relpages, relam, table_oid, ( index_tuple_hdr_bm + maxalign - CASE WHEN index_tuple_hdr_bm%maxalign = 0 THEN maxalign ELSE index_tuple_hdr_bm % maxalign END + nulldatawidth + maxalign - CASE WHEN nulldatawidth::integer%maxalign = 0 THEN maxalign ELSE nulldatawidth::integer%maxalign END )::numeric AS nulldatahdrwidth, pagehdr FROM ( SELECT i.nspname, i.relname, i.reltuples, i.relpages, i.relam, s.starelid, a.attrelid AS table_oid, current_setting('block_size')::numeric AS bs, CASE WHEN version() ~ 'mingw32' OR version() ~ '64-bit' THEN 8 ELSE 4 END AS maxalign, CASE WHEN substring(current_setting('server_version') FROM '#\"[0-9]+#\"%' FOR '#')::integer > 7 THEN 24 ELSE 20 END AS pagehdr, CASE WHEN max(coalesce(s.stanullfrac,0)) = 0 THEN 2 ELSE 2 + (32 + 8 - 1) / 8 END AS index_tuple_hdr_bm, sum( (1-coalesce(s.stanullfrac, 0)) * coalesce(s.stawidth, 2048) ) AS nulldatawidth FROM pg_attribute AS a JOIN pg_statistic AS s ON s.starelid=a.attrelid AND s.staattnum = a.attnum JOIN ( SELECT nspname, relname, reltuples, relpages, indrelid, relam, regexp_split_to_table(indkey::text, ' ')::smallint AS attnum FROM pg_index JOIN pg_class ON pg_class.oid=pg_index.indexrelid JOIN pg_namespace ON pg_namespace.oid = pg_class.relnamespace AND pg_namespace.nspname NOT IN ('information_schema', 'pg_catalog') ) AS i ON i.indrelid = a.attrelid AND a.attnum = i.attnum WHERE a.attnum > 0 GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9 ) AS s1 ) AS s2 LEFT JOIN pg_am am ON s2.relam = am.oid ) as sub JOIN pg_class c ON c.oid=sub.table_oid ) a where (wastedbytes > 50 * 1024 * 1024 and realbloat > 50) or (wastedbytes > 1000 * 1024 * 1024 and realbloat > 15);"
  
pg_table_bloat:
  query: "select * from ( select databasename,schemaname, tablename, wastedbytes, totalbytes, case when totalbytes = 0 then 0 else wastedbytes * 100 / totalbytes end as realbloat from (SELECT current_database() as databasename, schemaname, tablename, (pg_relation_size(schemaname || '.' || tablename)) as totalbytes, ROUND(CASE WHEN otta = 0 THEN 0.0 ELSE sml.relpages / otta::NUMERIC END, 1) AS tbloat, (CASE WHEN relpages < otta THEN 0 ELSE bs * (sml.relpages - otta)::BIGINT END) AS wastedbytes FROM (SELECT schemaname, tablename, cc.reltuples, cc.relpages, bs, CEIL((cc.reltuples * ((datahdr + ma - (CASE WHEN datahdr % ma = 0 THEN ma ELSE datahdr % ma END)) + nullhdr2 + 4)) / (bs - 20::FLOAT)) AS otta FROM (SELECT ma, bs, schemaname, tablename, (datawidth + (hdr + ma - (CASE WHEN hdr % ma = 0 THEN ma ELSE hdr % ma END)))::NUMERIC AS datahdr, (maxfracsum * (nullhdr + ma - (CASE WHEN nullhdr % ma = 0 THEN ma ELSE nullhdr % ma END))) AS nullhdr2 FROM (SELECT schemaname, tablename, hdr, ma, bs, SUM((1 - null_frac) * avg_width) AS datawidth, MAX(null_frac) AS maxfracsum, hdr + (SELECT 1 + COUNT(*) / 8 FROM pg_stats s2 WHERE null_frac <> 0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename) AS nullhdr FROM pg_stats s, (SELECT (SELECT current_setting('block_size')::NUMERIC) AS bs, CASE WHEN SUBSTRING(v, 12, 3) IN ('8.0', '8.1', '8.2') THEN 27 ELSE 23 END AS hdr, CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma FROM (SELECT version() AS v) AS foo) AS constants GROUP BY 1, 2, 3, 4, 5) AS foo) AS rs JOIN pg_class cc ON cc.relname = rs.tablename JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname NOT IN ('information_schema', 'pg_catalog')) AS sml ) a ) b where (wastedbytes > 50 * 1024 * 1024 and realbloat > 50) or (wastedbytes > 1000 * 1024 * 1024 and realbloat > 15);"
  
pg_transaction_age:
  query: "SELECT c.oid::regclass as relation, ROUND(100*(age(c.relfrozenxid) / 2000000000::float)) as percent_towards_wraparound, ROUND(100*(age(c.relfrozenxid)/current_setting('autovacuum_freeze_max_age')::float)) as percent_towards_emergency_autovac FROM pg_class c JOIN pg_namespace n on c.relnamespace = n.oid WHERE relkind IN ('r', 't', 'm') AND n.nspname NOT IN ('pg_toast') ORDER BY 2 DESC LIMIT 10"

maudrid avatar Sep 26 '22 09:09 maudrid

This is now released under quay.io/prometheuscommunity/postgres-exporter.

SuperQ avatar Sep 26 '22 19:09 SuperQ

I have upgraded the exporter.

maudrid avatar Sep 27 '22 06:09 maudrid

@maudrid You said that you upgraded the exporter, but did that change anything about the problem you are describing?

sysadmind avatar Oct 04 '22 00:10 sysadmind

Still having the same issue: image

image

In the graph you can see where I upgraded to the latest version on the 27th.

maudrid avatar Oct 04 '22 11:10 maudrid

@maudrid Can you try running the exporter without the custom queries file? That will help us understand if its one of those queries or something in the exporter itself.

sysadmind avatar Oct 04 '22 14:10 sysadmind

I have emptied the queries.yml file and restarted the exporter. Will wait to see what it does now.

maudrid avatar Oct 04 '22 15:10 maudrid

After removing the queries, the memory usage is stable. It seems to be caused by 1 or more of the custom queries.

maudrid avatar Oct 10 '22 11:10 maudrid

We were noticing increased postgresql process memory usage for just specific queries as well. It turned out to only be happening when JIT was enabled. This defaults to "on" as of PG12, which is why it seemed to suddenly start happening for some and not others (when they'd upgrade to 12).

https://www.postgresql.org/docs/current/jit.html

We initially disabled it entirely for the postgresql instance and instantly stopped seeing memory usage increasing. We then just went and disabled it for the monitoring user so that others can use it if needed.

keithf4 avatar Oct 10 '22 13:10 keithf4

I was able to resolve this issue by adding jit=off to my connection string.

maudrid avatar Oct 28 '22 12:10 maudrid

I was able to resolve this issue by adding jit=off to my connection string.

This worked for me as well. I added this in the query string for postgres exporter. It solved the memory issue and the overall query latencies dropped substantially.

saqib-ahmed avatar Apr 18 '23 10:04 saqib-ahmed