
getting LOG: could not fork autovacuum worker process: Cannot allocate memory

kumarashish071092 opened this issue 1 year ago · 2 comments

I have suddenly started getting OOM errors in my Postgres cluster, and I have to restart Patroni to be able to log in to the database.

If I manage to log in by killing an existing session, I get this error:

postgres=# select count(1) from pg_stat_activity;
ERROR:  out of memory
LINE 1: select count(1) from pg_stat_activity;
                             ^
DETAIL:  Failed on request of size 25462 in memory context "MessageContext".
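
For reference, a minimal sketch of freeing connection slots from SQL once a superuser connection is available (the idle-only filter is illustrative; adjust it to your own criteria):

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'              -- only idle client backends
  AND pid <> pg_backend_pid();    -- keep our own session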

postgresql_parameters:

  • { option: "max_connections", value: "500" }
  • { option: "superuser_reserved_connections", value: "5" }
  • { option: "password_encryption", value: "{{ postgresql_password_encryption_algorithm }}" }
  • { option: "max_locks_per_transaction", value: "512" }
  • { option: "max_prepared_transactions", value: "0" }
  • { option: "huge_pages", value: "try" } # or "on" if you set "vm_nr_hugepages" in kernel parameters
  • { option: "shared_buffers", value: "{{ (ansible_memtotal_mb * 0.25) | int }}MB" } # by default, 25% of RAM
  • { option: "effective_cache_size", value: "{{ (ansible_memtotal_mb * 0.75) | int }}MB" } # by default, 75% of RAM
  • { option: "work_mem", value: "128MB" } # please change this value
  • { option: "maintenance_work_mem", value: "256MB" } # please change this value
  • { option: "checkpoint_timeout", value: "15min" }
  • { option: "checkpoint_completion_target", value: "0.9" }
  • { option: "min_wal_size", value: "1GB" }
  • { option: "max_wal_size", value: "2GB" } # or 16GB/32GB
  • { option: "wal_buffers", value: "32MB" }
  • { option: "default_statistics_target", value: "1000" }
  • { option: "seq_page_cost", value: "1" }
  • { option: "random_page_cost", value: "4" } # "1.1" for SSD storage. Also, if your databases fits in shared_buffers
  • { option: "effective_io_concurrency", value: "2" } # "200" for SSD storage
  • { option: "synchronous_commit", value: "on" } # or 'off' if you can you lose single transactions in case of a crash
  • { option: "autovacuum", value: "on" } # never turn off the autovacuum!
  • { option: "autovacuum_max_workers", value: "5" }
  • { option: "autovacuum_vacuum_scale_factor", value: "0.02" } # or 0.005/0.001
  • { option: "autovacuum_analyze_scale_factor", value: "0.01" }
  • { option: "autovacuum_vacuum_cost_limit", value: "500" } # or 1000/5000
  • { option: "autovacuum_vacuum_cost_delay", value: "2" }
  • { option: "autovacuum_naptime", value: "1s" }
  • { option: "max_files_per_process", value: "4096" }
  • { option: "archive_mode", value: "on" }
  • { option: "archive_timeout", value: "1800s" }
  • { option: "archive_command", value: "cd ." }
    • { option: "wal_level", value: "replica" }
  • { option: "wal_keep_size", value: "2GB" }
  • { option: "max_wal_senders", value: "10" }
  • { option: "max_replication_slots", value: "10" }
  • { option: "hot_standby", value: "on" }
  • { option: "wal_log_hints", value: "on" }
  • { option: "wal_compression", value: "on" }
  • { option: "shared_preload_libraries", value: "pg_stat_statements,auto_explain" }
  • { option: "pg_stat_statements.max", value: "10000" }
  • { option: "pg_stat_statements.track", value: "all" }
  • { option: "pg_stat_statements.track_utility", value: "false" }
  • { option: "pg_stat_statements.save", value: "true" }
  • { option: "auto_explain.log_min_duration", value: "10s" } # enable auto_explain for 10-second logging threshold. Decrease this value if necessary
  • { option: "auto_explain.log_analyze", value: "true" }
  • { option: "auto_explain.log_buffers", value: "true" }
  • { option: "auto_explain.log_timing", value: "false" }
  • { option: "auto_explain.log_triggers", value: "true" }
  • { option: "auto_explain.log_verbose", value: "true" }
  • { option: "auto_explain.log_nested_statements", value: "true" }
  • { option: "auto_explain.sample_rate", value: "0.01" } # enable auto_explain for 1% of queries logging threshold
  • { option: "track_io_timing", value: "on" }
  • { option: "log_lock_waits", value: "on" }
  • { option: "log_temp_files", value: "0" }
  • { option: "listen_addresses", value: "0.0.0.0" }
  • { option: "track_activities", value: "on" }
  • { option: "track_activity_query_size", value: "4096" }
  • { option: "track_counts", value: "on" }
  • { option: "track_functions", value: "all" }
  • { option: "log_checkpoints", value: "on" }
  • { option: "log_connections", value: "on" }
  • { option: "log_disconnections", value: "on" }
  • { option: "log_autovacuum_min_duration", value: "0" }
    • { option: "log_error_verbosity", value: "default" }
  • { option: "log_statement", value: "off" }
  • { option: "log_min_duration_statement", value: "10" }
  • { option: "logging_collector", value: "on" }
  • { option: "log_truncate_on_rotation", value: "on" }
  • { option: "log_rotation_age", value: "1d" }
  • { option: "log_rotation_size", value: "0" }
  • { option: "log_line_prefix", value: "'%t [%p]: db=%d,user=%u,app=%a,client=%h '" }
  • { option: "log_filename", value: "postgresql-%a.log" }
  • { option: "log_directory", value: "{{ postgresql_log_dir }}" }
  • { option: "hot_standby_feedback", value: "on" } # allows feedback from a hot standby to the primary that will avoid query conflicts
  • { option: "max_standby_streaming_delay", value: "30s" }
  • { option: "wal_receiver_status_interval", value: "10s" }
  • { option: "idle_in_transaction_session_timeout", value: "10min" } # reduce this timeout if possible
  • { option: "jit", value: "off" }
  • { option: "max_worker_processes", value: "24" }
  • { option: "max_parallel_workers", value: "8" }
  • { option: "max_parallel_workers_per_gather", value: "2" }
  • { option: "max_parallel_maintenance_workers", value: "2" }
  • { option: "tcp_keepalives_count", value: "10" }
  • { option: "tcp_keepalives_idle", value: "300" }
  • { option: "tcp_keepalives_interval", value: "30" }

I tried reducing work_mem to 64MB and shared_buffers to 1400 MB, but the error still occurs. What approach should I take?

kumarashish071092 · Jun 05 '24 08:06

Postgres version: 15, OS: Ubuntu 22.04, RAM: 8 GB, Cores: 2, Swap: 2 GB, pgbouncer: enabled

It was working fine earlier; this issue appeared suddenly.

kumarashish071092 · Jun 05 '24 09:06

Hi @kumarashish071092

This issue is not related to cluster deployment but to database maintenance. Consider sponsorship; some of the subscription tiers include individual support.

Where to start:

  1. Check the monitoring system to determine the reason for the increased memory consumption (for example, more connections or a change in workload) and identify which processes consume the memory; see the query sketch after this list.
  2. If query optimization is not an option, try reducing memory usage (shared_buffers, work_mem, maintenance_work_mem) or increasing the resources of the database server.
  3. Monitor database performance metrics when changing parameters; reducing the memory settings may degrade performance.
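
A minimal sketch of the kind of check meant in points 1 and 2, once a session is available (the settings listed below are just the usual memory-related suspects):

-- connection pressure by state
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;

-- current values of the main memory-related settings
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
               'max_connections', 'autovacuum_max_workers');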

vitabaks · Jun 05 '24 21:06