[DocDB] Postgres backends are terminated even when total memory usage is only half of the total available memory on the node.
Jira Link: DB-2731
Description
While testing the changes for [DB-2687: Make OOM killer prioritize PG backends with more memory], I observed that all the Postgres backends were terminated even though total memory usage was only half of the total available memory on the node.
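For context, a minimal sketch of the generic Linux mechanism that DB-2687 builds on (this is plain /proc usage, not the actual YugabyteDB implementation; the PID and adjustment value are illustrative):

```python
# Sketch only: raising /proc/<pid>/oom_score_adj (range -1000..1000) makes the
# OOM killer more likely to pick that process as a victim. Not YugabyteDB code.

def set_oom_score_adj(pid: int, adj: int) -> None:
    """Write an OOM score adjustment for `pid` (needs suitable privileges)."""
    if not -1000 <= adj <= 1000:
        raise ValueError("oom_score_adj must be in [-1000, 1000]")
    with open(f"/proc/{pid}/oom_score_adj", "w") as f:
        f.write(str(adj))

# Example (hypothetical): make a memory-heavy backend a preferred OOM victim.
# set_oom_score_adj(7916, 500)
```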
Test and instance details:
- Cluster: 3 nodes, c5.large instances
- Sysbench oltp_read_only workload with 300 tables, 25k rows per table, and 30 threads running in parallel (see the reproduction sketch below)
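A reproduction sketch of the workload above, assuming stock sysbench with the pgsql driver; the host, port, user, and database values below are placeholders rather than the values used in the original test:

```python
# Reproduction sketch (assumed parameters): drives the sysbench oltp_read_only
# workload described above. Connection details are placeholders for one node of
# the 3-node cluster; the YSQL default port 5433 is assumed.
import subprocess

SYSBENCH_ARGS = [
    "sysbench", "oltp_read_only",
    "--db-driver=pgsql",
    "--pgsql-host=127.0.0.1",   # placeholder: address of any cluster node
    "--pgsql-port=5433",        # assumed YSQL default port
    "--pgsql-user=yugabyte",    # placeholder credentials/database
    "--pgsql-db=yugabyte",
    "--tables=300",             # 300 tables...
    "--table-size=25000",       # ...with 25k rows each
    "--threads=30",             # 30 threads running in parallel
]

def run_phase(phase: str) -> None:
    """Run one sysbench phase: 'prepare', 'run', or 'cleanup'."""
    subprocess.run(SYSBENCH_ARGS + [phase], check=True)

if __name__ == "__main__":
    run_phase("prepare")  # load the tables
    run_phase("run")      # read-only workload that preceded the OOM kill
```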
The total available memory on the node is 3.5 GB, while actual memory usage was only about 1.6 GB (RSS). The question is why a Postgres backend was terminated by signal 9 when there was still plenty of free memory.
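To make the free-memory claim easy to re-verify on the next run, here is a small diagnostic sketch (not part of the original test harness; it assumes a Linux /proc filesystem) that periodically logs MemAvailable alongside the summed RSS of all postgres processes:

```python
# Diagnostic sketch (assumes Linux /proc). Logs MemTotal/MemAvailable from
# /proc/meminfo and the summed RSS of all 'postgres' processes, to cross-check
# the ~3.5 GB total vs ~1.6 GB RSS observation around the time of the kill.
import os
import time

def meminfo_kb(field: str) -> int:
    """Return a /proc/meminfo field (e.g. 'MemTotal', 'MemAvailable') in kB."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)

def postgres_rss_kb() -> int:
    """Sum VmRSS (kB) over all processes whose comm is 'postgres'."""
    total = 0
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/comm") as f:
                if f.read().strip() != "postgres":
                    continue
            with open(f"/proc/{pid}/status") as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        total += int(line.split()[1])
        except OSError:
            continue  # process exited while scanning
    return total

if __name__ == "__main__":
    while True:
        print(f"MemTotal={meminfo_kb('MemTotal')} kB  "
              f"MemAvailable={meminfo_kb('MemAvailable')} kB  "
              f"postgres_rss={postgres_rss_kb()} kB", flush=True)
        time.sleep(5)
```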
Postgres logs:
2022-06-22 18:20:59.883 UTC [18282] LOG: server process (PID 7916) was terminated by signal 9: Killed
2022-06-22 18:20:59.885 UTC [18282] LOG: terminating any other active server processes
2022-06-22 18:20:59.926 UTC [7913] WARNING: terminating connection because of crash of another server process
2022-06-22 18:20:59.926 UTC [7913] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-06-22 18:21:00.687 UTC [18282] LOG: all server processes terminated; reinitializing
2022-06-22 18:21:01.208 UTC [8982] LOG: database system was interrupted; last known up at 2022-06-22 17:18:34 UTC
2022-06-22 18:21:01.325 UTC [8982] LOG: database system was not properly shut down; automatic recovery in progress
2022-06-22 18:21:01.337 UTC [8982] LOG: redo starts at 0/1000108
2022-06-22 18:21:01.337 UTC [8982] LOG: invalid record length at 0/10001B0: wanted 24, got 0
2022-06-22 18:21:01.337 UTC [8982] LOG: redo done at 0/1000140
Dmesg log:
[Wed Jun 22 18:21:00 2022] Killed process 7916 (postgres), UID 997, total-vm:650712kB, anon-rss:299340kB, file-rss:0kB, shmem-rss:52kB

An excerpt from the dmesg log with the OOM killer message dump, along with the OOM scores of the postgres/tserver/yb-master processes, was also captured (the dump itself is not reproduced in this copy).
Based on the OOM killer output:
- RSS is much higher for the Postgres backends.
Was the top command output captured before the OOM happened? @shantanugupta-yb
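In case top output was not captured, here is a sketch (assumed tooling, not from the original report) of how an equivalent snapshot, per-process RSS plus oom_score/oom_score_adj for postgres, yb-tserver, and yb-master, could be taken before the next run:

```python
# Snapshot sketch (assumes Linux /proc). Prints RSS, oom_score and oom_score_adj
# for postgres / yb-tserver / yb-master processes, sorted by RSS, i.e. roughly
# the data a pre-OOM `top` capture would have provided.
import os

WATCHED = {"postgres", "yb-tserver", "yb-master"}

def read_text(path: str) -> str:
    with open(path) as f:
        return f.read().strip()

rows = []
for pid in filter(str.isdigit, os.listdir("/proc")):
    try:
        comm = read_text(f"/proc/{pid}/comm")
        if comm not in WATCHED:
            continue
        rss_kb = 0
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    rss_kb = int(line.split()[1])
        rows.append((comm, int(pid), rss_kb,
                     int(read_text(f"/proc/{pid}/oom_score")),
                     int(read_text(f"/proc/{pid}/oom_score_adj"))))
    except OSError:
        continue  # process exited while scanning

# Highest-RSS processes first, mirroring the ordering in the OOM killer dump.
for comm, pid, rss_kb, score, adj in sorted(rows, key=lambda r: -r[2]):
    print(f"{comm:12s} pid={pid:<7d} rss={rss_kb:>8d} kB "
          f"oom_score={score:<5d} oom_score_adj={adj}")
```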