
getting LOG: could not fork autovacuum worker process: Cannot allocate memory

kumarashish071092 opened this issue 1 year ago · 2 comments

I have suddenly started getting OOM errors in my Postgres cluster, and I have to restart Patroni to be able to log in to the database.

If I manage to log in by killing an existing session, I get this error:

postgres=# select count(1) from pg_stat_activity;
ERROR:  out of memory
LINE 1: select count(1) from pg_stat_activity;
                             ^
DETAIL:  Failed on request of size 25462 in memory context "MessageContext".
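
For reference, a minimal sketch of freeing connection slots from SQL once a superuser connection is available (the idle-only filter is illustrative; adjust it to your own criteria):

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'              -- only idle client backends
  AND pid <> pg_backend_pid();    -- keep our own session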

postgresql_parameters:

  • { option: "max_connections", value: "500" }
  • { option: "superuser_reserved_connections", value: "5" }
  • { option: "password_encryption", value: "{{ postgresql_password_encryption_algorithm }}" }
  • { option: "max_locks_per_transaction", value: "512" }
  • { option: "max_prepared_transactions", value: "0" }
  • { option: "huge_pages", value: "try" } # or "on" if you set "vm_nr_hugepages" in kernel parameters
  • { option: "shared_buffers", value: "{{ (ansible_memtotal_mb * 0.25) | int }}MB" } # by default, 25% of RAM
  • { option: "effective_cache_size", value: "{{ (ansible_memtotal_mb * 0.75) | int }}MB" } # by default, 75% of RAM
  • { option: "work_mem", value: "128MB" } # please change this value
  • { option: "maintenance_work_mem", value: "256MB" } # please change this value
  • { option: "checkpoint_timeout", value: "15min" }
  • { option: "checkpoint_completion_target", value: "0.9" }
  • { option: "min_wal_size", value: "1GB" }
  • { option: "max_wal_size", value: "2GB" } # or 16GB/32GB
  • { option: "wal_buffers", value: "32MB" }
  • { option: "default_statistics_target", value: "1000" }
  • { option: "seq_page_cost", value: "1" }
  • { option: "random_page_cost", value: "4" } # "1.1" for SSD storage. Also, if your databases fits in shared_buffers
  • { option: "effective_io_concurrency", value: "2" } # "200" for SSD storage
  • { option: "synchronous_commit", value: "on" } # or 'off' if you can you lose single transactions in case of a crash
  • { option: "autovacuum", value: "on" } # never turn off the autovacuum!
  • { option: "autovacuum_max_workers", value: "5" }
  • { option: "autovacuum_vacuum_scale_factor", value: "0.02" } # or 0.005/0.001
  • { option: "autovacuum_analyze_scale_factor", value: "0.01" }
  • { option: "autovacuum_vacuum_cost_limit", value: "500" } # or 1000/5000
  • { option: "autovacuum_vacuum_cost_delay", value: "2" }
  • { option: "autovacuum_naptime", value: "1s" }
  • { option: "max_files_per_process", value: "4096" }
  • { option: "archive_mode", value: "on" }
  • { option: "archive_timeout", value: "1800s" }
  • { option: "archive_command", value: "cd ." }
    • { option: "wal_level", value: "replica" }
  • { option: "wal_keep_size", value: "2GB" }
  • { option: "max_wal_senders", value: "10" }
  • { option: "max_replication_slots", value: "10" }
  • { option: "hot_standby", value: "on" }
  • { option: "wal_log_hints", value: "on" }
  • { option: "wal_compression", value: "on" }
  • { option: "shared_preload_libraries", value: "pg_stat_statements,auto_explain" }
  • { option: "pg_stat_statements.max", value: "10000" }
  • { option: "pg_stat_statements.track", value: "all" }
  • { option: "pg_stat_statements.track_utility", value: "false" }
  • { option: "pg_stat_statements.save", value: "true" }
  • { option: "auto_explain.log_min_duration", value: "10s" } # enable auto_explain for 10-second logging threshold. Decrease this value if necessary
  • { option: "auto_explain.log_analyze", value: "true" }
  • { option: "auto_explain.log_buffers", value: "true" }
  • { option: "auto_explain.log_timing", value: "false" }
  • { option: "auto_explain.log_triggers", value: "true" }
  • { option: "auto_explain.log_verbose", value: "true" }
  • { option: "auto_explain.log_nested_statements", value: "true" }
  • { option: "auto_explain.sample_rate", value: "0.01" } # enable auto_explain for 1% of queries logging threshold
  • { option: "track_io_timing", value: "on" }
  • { option: "log_lock_waits", value: "on" }
  • { option: "log_temp_files", value: "0" }
  • { option: "listen_addresses", value: "0.0.0.0" }
  • { option: "track_activities", value: "on" }
  • { option: "track_activity_query_size", value: "4096" }
  • { option: "track_counts", value: "on" }
  • { option: "track_functions", value: "all" }
  • { option: "log_checkpoints", value: "on" }
  • { option: "log_connections", value: "on" }
  • { option: "log_disconnections", value: "on" }
  • { option: "log_autovacuum_min_duration", value: "0" }
    • { option: "log_error_verbosity", value: "default" }
  • { option: "log_statement", value: "off" }
  • { option: "log_min_duration_statement", value: "10" }
  • { option: "logging_collector", value: "on" }
  • { option: "log_truncate_on_rotation", value: "on" }
  • { option: "log_rotation_age", value: "1d" }
  • { option: "log_rotation_size", value: "0" }
  • { option: "log_line_prefix", value: "'%t [%p]: db=%d,user=%u,app=%a,client=%h '" }
  • { option: "log_filename", value: "postgresql-%a.log" }
  • { option: "log_directory", value: "{{ postgresql_log_dir }}" }
  • { option: "hot_standby_feedback", value: "on" } # allows feedback from a hot standby to the primary that will avoid query conflicts
  • { option: "max_standby_streaming_delay", value: "30s" }
  • { option: "wal_receiver_status_interval", value: "10s" }
  • { option: "idle_in_transaction_session_timeout", value: "10min" } # reduce this timeout if possible
  • { option: "jit", value: "off" }
  • { option: "max_worker_processes", value: "24" }
  • { option: "max_parallel_workers", value: "8" }
  • { option: "max_parallel_workers_per_gather", value: "2" }
  • { option: "max_parallel_maintenance_workers", value: "2" }
  • { option: "tcp_keepalives_count", value: "10" }
  • { option: "tcp_keepalives_idle", value: "300" }
  • { option: "tcp_keepalives_interval", value: "30" }

I tried reducing work_mem to 64MB and shared_buffers to 1400 MB, but the error still occurs. What approach should I take?

kumarashish071092 · Jun 05 '24 08:06

Postgres version: 15, OS: Ubuntu 22.04, RAM: 8 GB, Cores: 2, Swap: 2 GB, pgbouncer: enabled

It was working fine earlier; this issue appeared suddenly.

kumarashish071092 · Jun 05 '24 09:06

Hi @kumarashish071092

This issue is not related to cluster deployment but to database maintenance. Consider sponsorship; some of the subscription tiers include individual support.

Where to start:

  1. Check the monitoring system to determine the reason for the increased memory consumption (for example, more connections or a change in workload) and identify which processes consume the memory; see the query sketch after this list.
  2. If query optimization is not an option, try reducing memory usage (shared_buffers, work_mem, maintenance_work_mem) or increasing the resources of the database server.
  3. Monitor database performance metrics when changing parameters; reducing the memory settings may degrade performance.
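
A minimal sketch of the kind of check meant in points 1 and 2, once a session is available (the settings listed below are just the usual memory-related suspects):

-- connection pressure by state
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;

-- current values of the main memory-related settings
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
               'max_connections', 'autovacuum_max_workers');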

vitabaks · Jun 05 '24 21:06