yugabyte-db
[YSQL] YSQL connection memory usage increased to 300MB+ on 2.15 compared to 2.12.
Jira Link: DB-2834
Description
On 2.15, the postgres backend RSS has increased to 300MB+ for a SELECT workload. For the same workload on 2.12, the postgres backend usage was at most 67MB.
This increase is only seen for the first connections created after data loading, but when many concurrent connections are created at the start it results in OOM.
Below are the test details:
- sysbench read workload
- 6 connections
- 300 tables
- 25k rows/table
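
For reference, a rough sketch of how this workload could be driven. The sysbench flag names below follow stock sysbench and are assumptions; the exact options of the YugabyteDB sysbench fork may differ.

```python
# Rough sketch (assumption): drive the sysbench read-only workload against a YSQL endpoint.
# Flag names follow stock sysbench; the YugabyteDB sysbench fork's options may differ.
import subprocess

SYSBENCH = [
    "sysbench", "oltp_read_only",
    "--db-driver=pgsql",
    "--pgsql-host=127.0.0.1", "--pgsql-port=5433",   # placeholder YSQL endpoint
    "--pgsql-user=yugabyte", "--pgsql-db=yugabyte",
    "--tables=300", "--table-size=25000",            # 300 tables, 25k rows/table
    "--threads=6",                                   # 6 concurrent connections
]

subprocess.run(SYSBENCH + ["prepare"], check=True)   # data loading phase
subprocess.run(SYSBENCH + ["run"], check=True)       # read workload
```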
Postgres backend process RSS size on each node for 2.12 and 2.15:

2.12.1.0 build 13:
- Node 1: 66.8M / 67.0M
- Node 2: 44.1M / 44.2M
- Node 3: 64.4M / 64.7M

2.15.0.0 build 1:
- Node 1: 342.2M / 342.2M
- Node 2: 35.2M / 35.2M
- Node 3: 333.6M / 333.9M
The increase in PG backend memory allocation on 2.15.0.0 build 1 is due to increased allocations on the heap:

Node 1:
- Total allocation: 347172 kB
- Heap allocation: 332956 kB
- Other allocations: 14216 kB

Node 2:
- Total allocation: 32232 kB
- Heap allocation: 11592 kB
- Other allocations: 20640 kB
Heap allocation for each process:

Node 1:
```
010af000-01bb0000 rw-p 00000000 00:00 0    [heap]
Size:              11268 kB
Rss:                5760 kB
Pss:                3809 kB
Shared_Clean:          0 kB
Shared_Dirty:       2276 kB
Private_Clean:         0 kB
Private_Dirty:      3484 kB
Referenced:         3680 kB
Anonymous:          5760 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
ProtectionKey:         0
VmFlags: rd wr mr mp me ac sd

01bb0000-162de000 rw-p 00000000 00:00 0    [heap]
Size:             335032 kB
Rss:               33628 kB
Pss:               33628 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:     33628 kB
Referenced:        33628 kB
Anonymous:         33628 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
ProtectionKey:         0
VmFlags: rd wr mr mp me ac sd
```
Node 2:
```
011c3000-01cc4000 rw-p 00000000 00:00 0    [heap]
Size:              11268 kB
Rss:                8576 kB
Pss:                4753 kB
Shared_Clean:          0 kB
Shared_Dirty:       5336 kB
Private_Clean:         0 kB
Private_Dirty:      3240 kB
Referenced:         3444 kB
Anonymous:          8576 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
ProtectionKey:         0
VmFlags: rd wr mr mp me ac sd

01cc4000-027c4000 rw-p 00000000 00:00 0    [heap]
Size:              11264 kB
Rss:                9152 kB
Pss:                9152 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:      9152 kB
Referenced:         9152 kB
Anonymous:          9152 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
ProtectionKey:         0
VmFlags: rd wr mr mp me ac sd
```
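
For completeness, a minimal sketch (assuming Linux and a readable /proc/&lt;pid&gt;/smaps; the PID is a placeholder argument) of one way to reproduce the heap-vs-other split above by summing the Size field of each mapping. The original numbers may have been derived slightly differently.

```python
# Minimal sketch (assumes Linux /proc; PID is a placeholder argument): split a postgres
# backend's address space into [heap] mappings vs. everything else by summing the Size
# field of each mapping in /proc/<pid>/smaps.
import re
import sys

MAPPING_HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s")   # e.g. "01bb0000-162de000 rw-p ..."

def heap_vs_other_kb(pid):
    heap_kb = other_kb = 0
    in_heap = False
    with open(f"/proc/{pid}/smaps") as smaps:
        for line in smaps:
            if MAPPING_HEADER.match(line):
                # Mapping header: remember whether the following fields belong to [heap].
                in_heap = line.rstrip().endswith("[heap]")
            elif line.startswith("Size:"):
                size_kb = int(line.split()[1])
                if in_heap:
                    heap_kb += size_kb
                else:
                    other_kb += size_kb
    return heap_kb, other_kb

if __name__ == "__main__":
    heap_kb, other_kb = heap_vs_other_kb(int(sys.argv[1]))
    print(f"total: {heap_kb + other_kb} kB  heap: {heap_kb} kB  other: {other_kb} kB")
```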
Can you please collect the output of the following commands for the PostgreSQL backends:
- Select yb_mem_usage_kb()
- Select yb_mem_usage_sql_kb()
If possible, attach gdb to a backend and dump the memory contexts using the following gdb command; the memory context information is printed in the PostgreSQL logs.
(gdb) call MemoryContextStats(TopMemoryContext)
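
To make this easy to collect and correlate with a specific process, here is a small sketch, assuming psycopg2 and placeholder connection parameters, that records the backend PID alongside the two requested memory-usage values.

```python
# Small sketch (assumes psycopg2; host/port/user/dbname are placeholders): capture the
# backend PID together with the requested memory-usage output so the numbers can be
# matched to the RSS of that specific postgres process.
import psycopg2

conn = psycopg2.connect(host="127.0.0.1", port=5433, user="yugabyte", dbname="yugabyte")
with conn.cursor() as cur:
    for query in ("SELECT pg_backend_pid()",
                  "SELECT yb_mem_usage_kb()",
                  "SELECT yb_mem_usage_sql_kb()"):
        cur.execute(query)
        print(query, "->", cur.fetchone()[0])
conn.close()
```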
@sushantrmishra Adding the same details I mentioned in the Slack threads. Also attaching the memory context and yb_mem_usage_kb output.
As part of a stability effort we saw that the memory usage of all Postgres backends on a couple of nodes in the cluster increases to 300-600M (depending on the schema). The issue occurs when a large number of concurrent Postgres processes (all with a high memory footprint) are running, and it brings down a node with an OOM exception on 2.15.0.0 build 1.

On further debugging we found that in 2.13.2.0 build 20, because of the change to prefetch sys catalog tables during cache initialisation/refresh (https://phabricator.dev.yugabyte.com/D13923), we see peaks in the memory usage of these backend processes. These memory usage peaks recur at regular intervals. The peaks are only observed for new connections established after DDL operations (verified for table create/drop).

Analysis details: The graphs below show Postgres backend memory usage for the sysbench oltp_read workload with 2 connections per node. The cluster had 3 nodes with 300 tables and 25k rows per table. I ran this workload on 2.13.2.0 build 18 (without the change) and 2.13.2.0 build 21 (with the change). Each graph has 6 legend entries (one for each Postgres backend process), i.e. N1_P1 is a Postgres backend process on Node 1.

As can be seen, all four Postgres backends (nodes 2 and 3) peak at the same time at 660M each, which is about 1.3G per c5.large node (4G memory). For the same workload, if 6-10 connections were established on the same node it would result in OOM and bring down the node, as seen in https://github.com/yugabyte/yugabyte-db/issues/13014.

I have also captured TopMemoryContext for a backend whose usage was 300+M and compared it with a backend at 30-40M. The major difference seems to be in the allocations done under CacheMemoryContext. Attaching an image which captures the complete memory allocation done by the postgres backend.
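
To compare such dumps more systematically, a rough sketch (assuming PostgreSQL 11-style MemoryContextStats lines in the log, e.g. `CacheMemoryContext: 1064960 total in 8 blocks; 416840 free (1 chunks); 648120 used`; the two file paths are placeholders) that totals the `used` bytes per context name and prints the largest differences between the 300+M and 30-40M dumps:

```python
# Rough sketch (assumes PostgreSQL 11-style MemoryContextStats output saved to two text
# files): sum the "used" bytes per context name and show the biggest differences, so the
# growth under CacheMemoryContext stands out.
import re
import sys
from collections import Counter

LINE_RE = re.compile(
    r"(?P<name>[A-Za-z_][\w ]*): (?P<total>\d+) total in \d+ blocks; "
    r"\d+ free \(\d+ chunks\); (?P<used>\d+) used"
)

def used_bytes_per_context(path):
    """Sum the 'used' bytes per context name across a whole dump file."""
    used = Counter()
    with open(path) as dump:
        for line in dump:
            m = LINE_RE.search(line)
            if m:
                used[m.group("name")] += int(m.group("used"))
    return used

if __name__ == "__main__":
    big = used_bytes_per_context(sys.argv[1])    # dump from the 300+M backend
    small = used_bytes_per_context(sys.argv[2])  # dump from the 30-40M backend
    diffs = sorted(((big[n] - small[n], n) for n in big | small), reverse=True)
    for delta, name in diffs[:20]:               # largest increases first
        print(f"{delta / 1024:>12.1f} kB  {name}")
```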


