Memory usage: container gradually reaches 100% of available RAM
What did you do?
We have a postgres-exporter as a service in a Docker Swarm stack
What did you expect to see?
We did not expect the container to reach 100% of available RAM within a few days.
What did you see instead? Under which circumstances?
With the Prometheus query
(container_memory_working_set_bytes{image!="", container_label_com_docker_swarm_service_name="opencity-aspsr_postgres-exporter"} / (container_spec_memory_limit_bytes{image!=""} > 0) * 100)
we see a steady upwards slope in memory usage: postgres-exporter is gradually consuming all memory over time.
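To quantify the slope, a query along these lines can help (illustrative only: it is just the standard deriv() function applied to the same working-set series, showing bytes gained per second):
deriv(container_memory_working_set_bytes{image!="", container_label_com_docker_swarm_service_name="opencity-aspsr_postgres-exporter"}[1h])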
Environment
We have a Docker Swarm service with image: quay.io/prometheuscommunity/postgres-exporter:v0.17.1
- postgres_exporter version: v0.17.1
- postgres_exporter flags: none
- PostgreSQL version: image 'postgres:14.17-alpine'
Can you post this with container_memory_rss? Working set is a misleading metric, as it includes cache.
OK. I repeat here the same query as above, and then the same query with container_memory_working_set_bytes replaced by container_memory_rss.
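For reference, the substituted query is the one above with only the metric name swapped:
(container_memory_rss{image!="", container_label_com_docker_swarm_service_name="opencity-aspsr_postgres-exporter"} / (container_spec_memory_limit_bytes{image!=""} > 0) * 100)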
I think this might be a good time to add pprof, like we did in the elasticsearch_exporter, to help us uncover what is using the memory. What do you think @SuperQ? I'll see if I can get around to testing it, but it should be a very easy change.
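For illustration, a minimal sketch of the pattern (this is not the exporter's actual code; it only shows how a blank import of net/http/pprof exposes /debug/pprof/* on the same mux that serves /metrics, with 9187 being the exporter's usual port):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // side effect: registers /debug/pprof/* handlers on http.DefaultServeMux

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Exporter-style setup: metrics on /metrics; the pprof endpoints come along
	// for free because the blank import above hooks into the default mux.
	http.Handle("/metrics", promhttp.Handler())
	// The heap profile is then available at http://localhost:9187/debug/pprof/heap
	log.Fatal(http.ListenAndServe(":9187", nil))
}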
Oh, one big doubt: could this be caused by a faulty configuration on my side?
This is the service:
postgres-exporter:
  image: quay.io/prometheuscommunity/postgres-exporter:v0.17.1
  environment:
    DATA_SOURCE_NAME: "postgresql://postgres:5432/postgres?sslmode=disable"
    DATA_SOURCE_USER: myuser
    DATA_SOURCE_PASS: xxxxxxxxxxxx
    PG_EXPORTER_AUTO_DISCOVER_DATABASES: 'true'
where myuser has privileges only on a single DB (it is not the superuser):
GRANT ALL PRIVILEGES ON DATABASE mydatabase TO myuser;
Yes, pprof would be very useful here.
I've checked: the user myuser that we use in the production service (from which the data/graphs above were extracted) is not postgres, but it does have the Superuser role anyway:
# \du
                                   List of roles
 Role name |                         Attributes                          | Member of
-----------+------------------------------------------------------------+-----------
 myuser    | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
So this is still a valid issue for us.
If you can run the master branch, that enables pprof, which would allow you to pull data at runtime about where the memory is being used. I have posted some quick instructions on the elasticsearch_exporter that can help track this down: https://github.com/prometheus-community/elasticsearch_exporter/issues/851#issuecomment-2856838485
A slightly easier way:
curl -o postgres_exporter-heap.pprf "http://localhost:9187/debug/pprof/heap"
Then upload the file to https://pprof.me.
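Alternatively, if a Go toolchain is available locally, the same heap dump can be inspected offline in the standard pprof web UI (the filename is the one from the curl command above):
go tool pprof -http=:8080 postgres_exporter-heap.pprf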
We have the same issue, running timescale/timescaledb-ha:pg17.5-ts2.21.3.
Once the exporter is started, it runs out of memory within a few seconds.
Interesting, it looks like there is a leak in the stat_user_tables collector. Is it possible you could try with --no-collector.stat_user_tables?
I tested with the following (in docker compose):
command: ['--collector.stat_statements', '--no-collector.stat_user_tables']
It still seems to happen: pprof
The user has the 'pg_monitor' role. This is with TimescaleDB, however, so there are a lot of internal tables for the chunks/compression.
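As a rough sanity check of that theory (assuming the stat_user_tables collector scrapes pg_stat_user_tables, as its name suggests), counting the rows it would have to process per scrape might be informative:
SELECT count(*) FROM pg_stat_user_tables;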