relate icon indicating copy to clipboard operation
relate copied to clipboard

OOM kills

Open inducer opened this issue 1 month ago • 2 comments

cc @lukeolson

Full OOM kill log

[Fri Nov 21 14:22:56 2025] postgres invoked oom-killer: gfp_mask=0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO|__GFP_COMP), order=0, oom_score_adj=0
[Fri Nov 21 14:22:56 2025] CPU: 2 UID: 104 PID: 3589817 Comm: postgres Not tainted 6.16.9+deb14-amd64 #1 PREEMPT(lazy)  Debian 6.16.9-1 
[Fri Nov 21 14:22:56 2025] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[Fri Nov 21 14:22:56 2025] Call Trace:
[Fri Nov 21 14:22:56 2025]  <TASK>
[Fri Nov 21 14:22:56 2025]  dump_stack_lvl+0x5d/0x80
[Fri Nov 21 14:22:56 2025]  dump_header+0x43/0x1aa
[Fri Nov 21 14:22:56 2025]  oom_kill_process.cold+0xa/0xab
[Fri Nov 21 14:22:56 2025]  out_of_memory+0x217/0x4b0
[Fri Nov 21 14:22:56 2025]  __alloc_pages_slowpath.constprop.0+0xc2d/0xdd0
[Fri Nov 21 14:22:56 2025]  ? __lruvec_stat_mod_folio+0x83/0xd0
[Fri Nov 21 14:22:56 2025]  __alloc_frozen_pages_noprof+0x307/0x320
[Fri Nov 21 14:22:56 2025]  alloc_pages_mpol+0x7d/0x180
[Fri Nov 21 14:22:56 2025]  vma_alloc_folio_noprof+0x6e/0xd0
[Fri Nov 21 14:22:56 2025]  do_anonymous_page+0x33c/0x9b0
[Fri Nov 21 14:22:56 2025]  ? ___pte_offset_map+0x1b/0x180
[Fri Nov 21 14:22:56 2025]  __handle_mm_fault+0xb2c/0xf70
[Fri Nov 21 14:22:56 2025]  handle_mm_fault+0xec/0x2e0
[Fri Nov 21 14:22:56 2025]  do_user_addr_fault+0x2c3/0x7f0
[Fri Nov 21 14:22:56 2025]  exc_page_fault+0x74/0x180
[Fri Nov 21 14:22:56 2025]  asm_exc_page_fault+0x26/0x30
[Fri Nov 21 14:22:56 2025] RIP: 0033:0x7f3a8377e4df
[Fri Nov 21 14:22:56 2025] Code: fe 48 8d 54 11 80 0f 1f 44 00 00 62 e1 fe 28 6f 0e 62 e1 fe 28 6f 56 01 62 e1 fe 28 6f 5e 02 62 e1 fe 28 6f 66 03 48 83 ee 80 <62> e1 fd 28 7f 0f 62 e1 fd 28 7f 57 01 62 e1 fd 28 7f 5f 02 62 e1
[Fri Nov 21 14:22:56 2025] RSP: 002b:00007ffe58a5db98 EFLAGS: 00010203
[Fri Nov 21 14:22:56 2025] RAX: 00007f3a83099ef0 RBX: 00007f3a83099e88 RCX: 00007f3a83099ef0
[Fri Nov 21 14:22:56 2025] RDX: 00007f3a8309a754 RSI: 0000561fa0d85958 RDI: 00007f3a8309a000
[Fri Nov 21 14:22:56 2025] RBP: 00007ffe58a5dd10 R08: 0000000000000000 R09: 00007f3a8307f010
[Fri Nov 21 14:22:56 2025] R10: 0000000000000000 R11: 0000000000065190 R12: 0000561fa0d857b0
[Fri Nov 21 14:22:56 2025] R13: 0000561fa0d070b0 R14: 0000000000000003 R15: 0000561fa0d89438
[Fri Nov 21 14:22:56 2025]  </TASK>
[Fri Nov 21 14:22:56 2025] Mem-Info:
[Fri Nov 21 14:22:56 2025] active_anon:991834 inactive_anon:12467 isolated_anon:0
                            active_file:55183 inactive_file:637883 isolated_file:0
                            unevictable:0 dirty:84 writeback:0
                            slab_reclaimable:90665 slab_unreclaimable:128317
                            mapped:31586 shmem:18894 pagetables:5957
                            sec_pagetables:0 bounce:0
                            kernel_misc_reclaimable:0
                            free:40597 free_pcp:21062 free_cma:0
[Fri Nov 21 14:22:56 2025] Node 0 active_anon:3967336kB inactive_anon:49868kB active_file:220732kB inactive_file:2551532kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:126344kB dirty:336kB writeback:0kB shmem:75576kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:63488kB writeback_tmp:0kB kernel_stack:6156kB pagetables:23828kB sec_pagetables:0kB all_unreclaimable? yes Balloon:0kB
[Fri Nov 21 14:22:56 2025] Node 0 DMA free:14304kB boost:0kB min:128kB low:160kB high:192kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Fri Nov 21 14:22:56 2025] lowmem_reserve[]: 0 2951 7929 7929 7929
[Fri Nov 21 14:22:56 2025] Node 0 DMA32 free:97836kB boost:55912kB min:80764kB low:86976kB high:93188kB reserved_highatomic:0KB free_highatomic:0KB active_anon:1550932kB inactive_anon:12716kB active_file:80340kB inactive_file:1009140kB unevictable:0kB writepending:84kB present:3129216kB managed:3022604kB mlocked:0kB bounce:0kB free_pcp:59712kB local_pcp:23780kB free_cma:0kB
[Fri Nov 21 14:22:56 2025] lowmem_reserve[]: 0 0 4978 4978 4978
[Fri Nov 21 14:22:56 2025] Node 0 Normal free:50248kB boost:8192kB min:50792kB low:61440kB high:72088kB reserved_highatomic:0KB free_highatomic:0KB active_anon:2416268kB inactive_anon:37152kB active_file:140380kB inactive_file:1542404kB unevictable:0kB writepending:252kB present:5242880kB managed:5097488kB mlocked:0kB bounce:0kB free_pcp:24536kB local_pcp:5684kB free_cma:0kB
[Fri Nov 21 14:22:56 2025] lowmem_reserve[]: 0 0 0 0 0
[Fri Nov 21 14:22:56 2025] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (U) 1*1024kB (U) 2*2048kB (UM) 2*4096kB (M) = 14304kB
[Fri Nov 21 14:22:56 2025] Node 0 DMA32: 6*4kB (ME) 8*8kB (ME) 8*16kB (M) 318*32kB (ME) 657*64kB (UME) 353*128kB (UME) 2*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 98136kB
[Fri Nov 21 14:22:56 2025] Node 0 Normal: 0*4kB 1*8kB (E) 0*16kB 253*32kB (UE) 287*64kB (UME) 169*128kB (UME) 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 0*4096kB = 50152kB
[Fri Nov 21 14:22:56 2025] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Fri Nov 21 14:22:56 2025] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Fri Nov 21 14:22:56 2025] 765539 total pagecache pages
[Fri Nov 21 14:22:56 2025] 53617 pages in swap cache
[Fri Nov 21 14:22:56 2025] Free swap  = 3883132kB
[Fri Nov 21 14:22:56 2025] Total swap = 4636668kB
[Fri Nov 21 14:22:56 2025] 2097022 pages RAM
[Fri Nov 21 14:22:56 2025] 0 pages HighMem/MovableOnly
[Fri Nov 21 14:22:56 2025] 63159 pages reserved
[Fri Nov 21 14:22:56 2025] 0 pages hwpoisoned
[Fri Nov 21 14:22:56 2025] Tasks state (memory values in pages):
[Fri Nov 21 14:22:56 2025] [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
[Fri Nov 21 14:22:56 2025] [    375]     0   375    15614     1883      160     1723         0   106496      256          -250 systemd-journal
[Fri Nov 21 14:22:56 2025] [    409]     0   409     9936      542       96      446         0    98304      832         -1000 systemd-udevd
[Fri Nov 21 14:22:56 2025] [    717]     0   717     5633      463        0      463         0    81920      384             0 VGAuthService
[Fri Nov 21 14:22:56 2025] [    718]     0   718    63376      906      160      746         0   110592      350             0 vmtoolsd
[Fri Nov 21 14:22:56 2025] [    730]     0   730      653      207        0      207         0    40960       32             0 acpid
[Fri Nov 21 14:22:56 2025] [    733]   109   733     2639      599      160      439         0    61440      192          -900 dbus-daemon
[Fri Nov 21 14:22:56 2025] [    737]   112   737   543627     4124     2369     1755         0   376832     1176             0 docker-swarm
[Fri Nov 21 14:22:56 2025] [    742]     0   742     1498      542       96      446         0    49152      128             0 dhclient
[Fri Nov 21 14:22:56 2025] [    756]   115   756   422715     4323     2488     1835         0   212992      183             0 prometheus-node
[Fri Nov 21 14:22:56 2025] [    758]     0   758    55425      551       96      455         0    81920      432             0 rsyslogd
[Fri Nov 21 14:22:56 2025] [    761]     0   761     5361     1166      128     1038         0    77824      288             0 systemd-logind
[Fri Nov 21 14:22:56 2025] [    767]     0   767     1454      296        0      296         0    53248       64             0 atd
[Fri Nov 21 14:22:56 2025] [    932]   105   932    23031      601       64      537         0    90112      192             0 systemd-timesyn
[Fri Nov 21 14:22:56 2025] [   1169]     0  1169   118942     4710     2675     2035         0   245760      512             0 fail2ban-server
[Fri Nov 21 14:22:56 2025] [   1171] 65534  1171   137590    12934    12544      390         0   274432     4736             0 memcached
[Fri Nov 21 14:22:56 2025] [   1177]     0  1177      672      180        0      180         0    40960        0             0 nebula-systemd-
[Fri Nov 21 14:22:56 2025] [   1184]     0  1184   531757     2938     1280     1658         0   237568      304             0 nebula
[Fri Nov 21 14:22:56 2025] [   1196]     0  1196    30069      579       63      516         0   135168     3392             0 unattended-upgr
[Fri Nov 21 14:22:56 2025] [   1199]     0  1199   487124     4703     3193     1510         0   303104      160          -999 containerd
[Fri Nov 21 14:22:56 2025] [   1249]     0  1249     2431      426       64      362         0    69632      192             0 crond
[Fri Nov 21 14:22:56 2025] [   1256]     0  1256     1298      259       32      227         0    57344       32             0 agetty
[Fri Nov 21 14:22:56 2025] [   1278]     0  1278     3440      823       31      792         0    77824      320         -1000 sshd
[Fri Nov 21 14:22:56 2025] [   1497]   104  1497    56229      871       64      711        96   180224      544          -900 postgres
[Fri Nov 21 14:22:56 2025] [   1538]     0  1538    14805      267       78      189         0   126976      864             0 nginx
[Fri Nov 21 14:22:56 2025] [   1541]    33  1541    16244     2432     1145     1287         0   163840     1256             0 nginx
[Fri Nov 21 14:22:56 2025] [   1542]    33  1542    16052     2264     1002     1262         0   159744     1166             0 nginx
[Fri Nov 21 14:22:56 2025] [   1543]    33  1543    16421     2295     1036     1259         0   163840     1536             0 nginx
[Fri Nov 21 14:22:56 2025] [   1544]    33  1544    16408     2328     1065     1263         0   163840     1536             0 nginx
[Fri Nov 21 14:22:56 2025] [   1611]   104  1611    56264     1212       70      467       675   401408      608             0 postgres
[Fri Nov 21 14:22:56 2025] [   1613]   104  1613    56264      953       70      473       410   401408      608             0 postgres
[Fri Nov 21 14:22:56 2025] [   1614]   104  1614    56264      919       38      431       450   401408      608             0 postgres
[Fri Nov 21 14:22:56 2025] [   1615]   104  1615    56336     8205       71      586      7548   401408      672             0 postgres
[Fri Nov 21 14:22:56 2025] [   1616]   104  1616    56264     1209        6      437       766   401408      640             0 postgres
[Fri Nov 21 14:22:56 2025] [   1657]   104  1657    56264     1472       38      391      1043   131072      576             0 postgres
[Fri Nov 21 14:22:56 2025] [   1658]   104  1658    56632      669       70      498       101   196608      640             0 postgres
[Fri Nov 21 14:22:56 2025] [   1659]   104  1659    56632      515       38      409        68   131072      672             0 postgres
[Fri Nov 21 14:22:56 2025] [   1887]   101  1887    10492     4323     3264     1059         0   122880      512             0 exim4
[Fri Nov 21 14:22:56 2025] [   5244]   113  5244   668174    11140     8587     1200      1353   475136     6640          -500 beam.smp
[Fri Nov 21 14:22:56 2025] [   5254]   113  5254      643      140        0      140         0    45056        0          -500 erl_child_setup
[Fri Nov 21 14:22:56 2025] [   5277]   113  5277      672      251       32      219         0    45056        0          -500 sh
[Fri Nov 21 14:22:56 2025] [   5279]   113  5279      609      174        0      174         0    45056        0          -500 memsup
[Fri Nov 21 14:22:56 2025] [   5280]   113  5280      642      126        0      126         0    45056        0          -500 cpu_sup
[Fri Nov 21 14:22:56 2025] [   5282]   113  5282      892      201        0      201         0    45056        0          -500 inet_gethost
[Fri Nov 21 14:22:56 2025] [   5283]   113  5283      892      320       32      288         0    49152        0          -500 inet_gethost
[Fri Nov 21 14:22:56 2025] [   5284]   108  5284     1186      322       32      290         0    49152        0             0 epmd
[Fri Nov 21 14:22:56 2025] [   5287]   113  5287      672      284       64      220         0    49152        0          -500 sh
[Fri Nov 21 14:22:56 2025] [1099268]    33 1099268    31403     1486      326     1128        32   294912    18208             0 uwsgi
[Fri Nov 21 14:22:56 2025] [1099306]    33 1099306    39308     3677     1614     2063         0   348160    25920             0 python
[Fri Nov 21 14:22:56 2025] [1099316]    33 1099316   104205    11897     9137     2760         0   532480    33632             0 python
[Fri Nov 21 14:22:56 2025] [1099317]    33 1099317    46090      831      209      622         0   368640    34144             0 python
[Fri Nov 21 14:22:56 2025] [1099318]    33 1099318    45624      797      175      622         0   364544    33696             0 python
[Fri Nov 21 14:22:56 2025] [1099319]    33 1099319    46093      864      274      590         0   368640    34080             0 python
[Fri Nov 21 14:22:56 2025] [3570171]    33 3570171   127916    71004    63771     7233         0   745472      631             0 uwsgi
[Fri Nov 21 14:22:56 2025] [3571138]    33 3571138   124964    68314    61052     7262         0   724992      695             0 uwsgi
[Fri Nov 21 14:22:56 2025] [3574214]    33 3574214   117792    61249    54475     6774         0   663552      663             0 uwsgi
[Fri Nov 21 14:22:56 2025] [3575389]    33 3575389   120561    63704    56557     7147         0   708608      631             0 uwsgi
[Fri Nov 21 14:22:56 2025] [3576655]    33 3576655   123214    66579    59275     7304         0   716800      663             0 uwsgi
[Fri Nov 21 14:22:56 2025] [3577742]    33 3577742   120842    64038    56764     7274         0   696320      663             0 uwsgi
[Fri Nov 21 14:22:56 2025] [3577818]    33 3577818   124344    67636    60391     7245         0   724992      695             0 uwsgi
[Fri Nov 21 14:22:56 2025] [3577969]    33 3577969   121016    64505    57203     7302         0   708608      599             0 uwsgi
[Fri Nov 21 14:22:56 2025] [3578050]    33 3578050   127735    71491    64165     7326         0   745472      663             0 uwsgi
[Fri Nov 21 14:22:56 2025] [3579254]    33 3579254   124510    67844    60578     7266         0   720896      695             0 uwsgi
[Fri Nov 21 14:22:56 2025] [3580132]    33 3580132   122069    65873    58339     7534         0   712704      663             0 uwsgi
[Fri Nov 21 14:22:56 2025] [3580253]    33 3580253   122426    65851    58593     7258         0   700416      663             0 uwsgi
[Fri Nov 21 14:22:56 2025] [3581062]    33 3581062   122414    65749    58500     7249         0   712704      631             0 uwsgi
[Fri Nov 21 14:22:56 2025] [3581242]    33 3581242   113647    56759    49523     7236         0   638976      663             0 uwsgi
[Fri Nov 21 14:22:56 2025] [3581527]    33 3581527   129473    73055    65773     7282         0   757760      663             0 uwsgi
[Fri Nov 21 14:22:56 2025] [3582537]    33 3582537   163692    70561    62909     7652         0   757760      663             0 uwsgi
[Fri Nov 21 14:22:56 2025] [3589815]   104 3589815    56968     4590      516     2729      1345   200704      410             0 postgres
[Fri Nov 21 14:22:56 2025] [3589816]   104 3589816    56264     1482      164     1318         0   106496      474             0 postgres
[Fri Nov 21 14:22:56 2025] [3589817]   104 3589817    56968     5065      548     2729      1788   221184      410             0 postgres
[Fri Nov 21 14:22:56 2025] [3589818]   104 3589818    56670     2470      260     1870       340   155648      442             0 postgres
[Fri Nov 21 14:22:56 2025] [3589819]   104 3589819    56805     3552      420     2337       795   172032      442             0 postgres
[Fri Nov 21 14:22:56 2025] [3589820]   104 3589820    56264     1482      164     1318         0   106496      474             0 postgres
[Fri Nov 21 14:22:56 2025] [3589821]   104 3589821    56264     1482      164     1318         0   106496      474             0 postgres
[Fri Nov 21 14:22:56 2025] [3589822]   104 3589822    56264     1130      100     1030         0   106496      538             0 postgres
[Fri Nov 21 14:22:56 2025] [3589823]   104 3589823    56264     1477      164     1313         0   106496      474             0 postgres
[Fri Nov 21 14:22:56 2025] [3589824]   104 3589824    56264     1482      164     1318         0   106496      474             0 postgres
[Fri Nov 21 14:22:56 2025] [3589825]   104 3589825    56264     1482      164     1318         0   106496      474             0 postgres
[Fri Nov 21 14:22:56 2025] [3589826]   104 3589826    56264     1478      164     1314         0   106496      474             0 postgres
[Fri Nov 21 14:22:56 2025] [3589827]   104 3589827    56264     1130      100     1030         0   106496      538             0 postgres
[Fri Nov 21 14:22:56 2025] [3589828]   104 3589828    56264     1130      100     1030         0   106496      538             0 postgres
[Fri Nov 21 14:22:56 2025] [3589829]   104 3589829    56264     1098       68     1030         0   106496      538             0 postgres
[Fri Nov 21 14:22:56 2025] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=system-postgresql.slice,mems_allowed=0,global_oom,task_memcg=/system.slice/uwsgi.service,task=uwsgi,pid=3581527,uid=33
[Fri Nov 21 14:22:56 2025] Out of memory: Killed process 3581527 (uwsgi) total-vm:517892kB, anon-rss:263092kB, file-rss:29128kB, shmem-rss:0kB, UID:33 pgtables:740kB oom_score_adj:0
[Fri Nov 21 14:22:58 2025] systemd-journald[375]: Under memory pressure, flushing caches.

Full OOM kill log this time:

[Mon Dec  8 17:51:31 2025] postgres invoked oom-killer: gfp_mask=0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO|__GFP_COMP), order=0, oom_score_adj=0
[Mon Dec  8 17:51:31 2025] CPU: 1 UID: 104 PID: 1073875 Comm: postgres Not tainted 6.16.9+deb14-amd64 #1 PREEMPT(lazy)  Debian 6.16.9-1 
[Mon Dec  8 17:51:31 2025] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[Mon Dec  8 17:51:31 2025] Call Trace:
[Mon Dec  8 17:51:31 2025]  <TASK>
[Mon Dec  8 17:51:31 2025]  dump_stack_lvl+0x5d/0x80
[Mon Dec  8 17:51:31 2025]  dump_header+0x43/0x1aa
[Mon Dec  8 17:51:31 2025]  oom_kill_process.cold+0xa/0xab
[Mon Dec  8 17:51:31 2025]  out_of_memory+0x217/0x4b0
[Mon Dec  8 17:51:31 2025]  __alloc_pages_slowpath.constprop.0+0xc2d/0xdd0
[Mon Dec  8 17:51:31 2025]  __alloc_frozen_pages_noprof+0x307/0x320
[Mon Dec  8 17:51:31 2025]  alloc_pages_mpol+0x7d/0x180
[Mon Dec  8 17:51:31 2025]  vma_alloc_folio_noprof+0x6e/0xd0
[Mon Dec  8 17:51:31 2025]  do_anonymous_page+0x33c/0x9b0
[Mon Dec  8 17:51:31 2025]  ? ___pte_offset_map+0x1b/0x180
[Mon Dec  8 17:51:31 2025]  __handle_mm_fault+0xb2c/0xf70
[Mon Dec  8 17:51:31 2025]  ? do_syscall_64+0xbc/0x320
[Mon Dec  8 17:51:31 2025]  handle_mm_fault+0xec/0x2e0
[Mon Dec  8 17:51:31 2025]  do_user_addr_fault+0x2c3/0x7f0
[Mon Dec  8 17:51:31 2025]  exc_page_fault+0x74/0x180
[Mon Dec  8 17:51:31 2025]  asm_exc_page_fault+0x26/0x30
[Mon Dec  8 17:51:31 2025] RIP: 0033:0x561f886844ed
[Mon Dec  8 17:51:31 2025] Code: 48 89 da 31 f6 e8 03 d3 a3 ff 49 89 c0 48 8b 5d f8 4c 89 c0 c9 c3 0f 1f 80 00 00 00 00 89 c2 49 8d 78 08 49 c7 00 00 00 00 00 <49> c7 44 10 f8 00 00 00 00 48 83 e7 f8 4c 89 c2 48 29 fa 8d 0c 10
[Mon Dec  8 17:51:31 2025] RSP: 002b:00007ffe58a5dc80 EFLAGS: 00010202
[Mon Dec  8 17:51:31 2025] RAX: 00000000000001e8 RBX: 000000000000003c RCX: 0000000000000006
[Mon Dec  8 17:51:31 2025] RDX: 00000000000001e8 RSI: 00000000000001e8 RDI: 00007f3a830bef28
[Mon Dec  8 17:51:31 2025] RBP: 00007ffe58a5dc90 R08: 00007f3a830bef20 R09: 00007f3a8307f010
[Mon Dec  8 17:51:31 2025] R10: 0000000000000000 R11: 00000000000400f8 R12: 0000000000004168
[Mon Dec  8 17:51:31 2025] R13: 0000561fa0d070b0 R14: 0000561fa0d08270 R15: 0000561fa0d08f48
[Mon Dec  8 17:51:31 2025]  </TASK>
[Mon Dec  8 17:51:31 2025] Mem-Info:
[Mon Dec  8 17:51:31 2025] active_anon:959961 inactive_anon:7240 isolated_anon:0
                            active_file:35484 inactive_file:639846 isolated_file:0
                            unevictable:0 dirty:80 writeback:0
                            slab_reclaimable:82685 slab_unreclaimable:178286
                            mapped:33170 shmem:21928 pagetables:6089
                            sec_pagetables:0 bounce:0
                            kernel_misc_reclaimable:0
                            free:45739 free_pcp:12884 free_cma:0
[Mon Dec  8 17:51:31 2025] Node 0 active_anon:3839844kB inactive_anon:28960kB active_file:141936kB inactive_file:2559384kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:132680kB dirty:320kB writeback:0kB shmem:87712kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:65536kB writeback_tmp:0kB kernel_stack:6088kB pagetables:24356kB sec_pagetables:0kB all_unreclaimable? yes Balloon:0kB
[Mon Dec  8 17:51:31 2025] Node 0 DMA free:14304kB boost:0kB min:128kB low:160kB high:192kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Mon Dec  8 17:51:31 2025] lowmem_reserve[]: 0 2951 7929 7929 7929
[Mon Dec  8 17:51:31 2025] Node 0 DMA32 free:97984kB boost:55912kB min:80764kB low:86976kB high:93188kB reserved_highatomic:0KB free_highatomic:0KB active_anon:1119160kB inactive_anon:7936kB active_file:42652kB inactive_file:1474756kB unevictable:0kB writepending:92kB present:3129216kB managed:3022604kB mlocked:0kB bounce:0kB free_pcp:24972kB local_pcp:13552kB free_cma:0kB
[Mon Dec  8 17:51:31 2025] lowmem_reserve[]: 0 0 4978 4978 4978
[Mon Dec  8 17:51:31 2025] Node 0 Normal free:70668kB boost:28672kB min:71272kB low:81920kB high:92568kB reserved_highatomic:0KB free_highatomic:0KB active_anon:2720168kB inactive_anon:21024kB active_file:99284kB inactive_file:1084628kB unevictable:0kB writepending:228kB present:5242880kB managed:5097488kB mlocked:0kB bounce:0kB free_pcp:26564kB local_pcp:6824kB free_cma:0kB
[Mon Dec  8 17:51:31 2025] lowmem_reserve[]: 0 0 0 0 0
[Mon Dec  8 17:51:31 2025] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (U) 1*1024kB (U) 2*2048kB (UM) 2*4096kB (M) = 14304kB
[Mon Dec  8 17:51:31 2025] Node 0 DMA32: 0*4kB 78*8kB (UE) 1015*16kB (E) 783*32kB (UE) 463*64kB (UE) 190*128kB (UE) 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 0*4096kB = 97920kB
[Mon Dec  8 17:51:31 2025] Node 0 Normal: 2*4kB (M) 2*8kB (M) 3*16kB (ME) 438*32kB (UME) 495*64kB (UME) 182*128kB (UME) 2*256kB (M) 2*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 70600kB
[Mon Dec  8 17:51:31 2025] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Mon Dec  8 17:51:31 2025] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Mon Dec  8 17:51:31 2025] 759595 total pagecache pages
[Mon Dec  8 17:51:31 2025] 62373 pages in swap cache
[Mon Dec  8 17:51:31 2025] Free swap  = 3906708kB
[Mon Dec  8 17:51:31 2025] Total swap = 4636668kB
[Mon Dec  8 17:51:31 2025] 2097022 pages RAM
[Mon Dec  8 17:51:31 2025] 0 pages HighMem/MovableOnly
[Mon Dec  8 17:51:31 2025] 63159 pages reserved
[Mon Dec  8 17:51:31 2025] 0 pages hwpoisoned
[Mon Dec  8 17:51:31 2025] Tasks state (memory values in pages):
[Mon Dec  8 17:51:31 2025] [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
[Mon Dec  8 17:51:31 2025] [    375]     0   375    14010     2008      160     1848         0   102400      256          -250 systemd-journal
[Mon Dec  8 17:51:31 2025] [    409]     0   409     9936      626       64      562         0    98304      864         -1000 systemd-udevd
[Mon Dec  8 17:51:31 2025] [    717]     0   717     5633      431        0      431         0    81920      384             0 VGAuthService
[Mon Dec  8 17:51:31 2025] [    718]     0   718    63376      897      160      737         0   110592      318             0 vmtoolsd
[Mon Dec  8 17:51:31 2025] [    730]     0   730      653      207        0      207         0    40960       32             0 acpid
[Mon Dec  8 17:51:31 2025] [    733]   109   733     2639      663      224      439         0    61440      160          -900 dbus-daemon
[Mon Dec  8 17:51:31 2025] [    737]   112   737   543627     4691     2751     1940         0   376832     1240             0 docker-swarm
[Mon Dec  8 17:51:31 2025] [    742]     0   742     1498      662      128      534         0    49152       96             0 dhclient
[Mon Dec  8 17:51:31 2025] [    756]   115   756   422715     4532     2625     1907         0   212992       68             0 prometheus-node
[Mon Dec  8 17:51:31 2025] [    758]     0   758    55425      545       96      449         0    81920      400             0 rsyslogd
[Mon Dec  8 17:51:31 2025] [    761]     0   761     5390      883      128      755         0    77824      288             0 systemd-logind
[Mon Dec  8 17:51:31 2025] [    767]     0   767     1454      296        0      296         0    53248       64             0 atd
[Mon Dec  8 17:51:31 2025] [    932]   105   932    23031      569       64      505         0    90112      192             0 systemd-timesyn
[Mon Dec  8 17:51:31 2025] [   1169]     0  1169   117342     4659     2675     1984         0   241664      576             0 fail2ban-server
[Mon Dec  8 17:51:31 2025] [   1171] 65534  1171   137590    15300    14912      388         0   274432     2368             0 memcached
[Mon Dec  8 17:51:31 2025] [   1177]     0  1177      672      148        0      148         0    40960        0             0 nebula-systemd-
[Mon Dec  8 17:51:31 2025] [   1184]     0  1184   531757     2592     1209     1383         0   237568      376             0 nebula
[Mon Dec  8 17:51:31 2025] [   1196]     0  1196    30069      515       31      484         0   135168     3424             0 unattended-upgr
[Mon Dec  8 17:51:31 2025] [   1199]     0  1199   487124     4906     3193     1713         0   307200      128          -999 containerd
[Mon Dec  8 17:51:31 2025] [   1249]     0  1249     2431      426       64      362         0    69632      192             0 crond
[Mon Dec  8 17:51:31 2025] [   1256]     0  1256     1298      259       32      227         0    57344       32             0 agetty
[Mon Dec  8 17:51:31 2025] [   1278]     0  1278     3440      841       31      810         0    77824      320         -1000 sshd
[Mon Dec  8 17:51:31 2025] [   1497]   104  1497    56229      871       64      711        96   180224      544          -900 postgres
[Mon Dec  8 17:51:31 2025] [   1611]   104  1611    56264     1670       38      406      1226   401408      640             0 postgres
[Mon Dec  8 17:51:31 2025] [   1613]   104  1613    56264     1238       70      409       759   401408      640             0 postgres
[Mon Dec  8 17:51:31 2025] [   1614]   104  1614    56264     1248       38      407       803   401408      608             0 postgres
[Mon Dec  8 17:51:31 2025] [   1615]   104  1615    56336     8473      103      601      7769   401408      608             0 postgres
[Mon Dec  8 17:51:31 2025] [   1616]   104  1616    56264     1238       38      409       791   401408      608             0 postgres
[Mon Dec  8 17:51:31 2025] [   1657]   104  1657    56264     1431       38      359      1034   131072      576             0 postgres
[Mon Dec  8 17:51:31 2025] [   1658]   104  1658    56632      573       38      434       101   196608      672             0 postgres
[Mon Dec  8 17:51:31 2025] [   1659]   104  1659    56632      523       38      415        70   131072      672             0 postgres
[Mon Dec  8 17:51:31 2025] [   1887]   101  1887    10492     4272     3264     1008         0   122880      512             0 exim4
[Mon Dec  8 17:51:31 2025] [   5244]   113  5244   666890    12030     9606     1135      1289   471040     6614          -500 beam.smp
[Mon Dec  8 17:51:31 2025] [   5254]   113  5254      643      140        0      140         0    45056        0          -500 erl_child_setup
[Mon Dec  8 17:51:31 2025] [   5277]   113  5277      672      219        0      219         0    45056        0          -500 sh
[Mon Dec  8 17:51:31 2025] [   5279]   113  5279      609      174        0      174         0    45056        0          -500 memsup
[Mon Dec  8 17:51:31 2025] [   5280]   113  5280      642      126        0      126         0    45056        0          -500 cpu_sup
[Mon Dec  8 17:51:31 2025] [   5282]   113  5282      892      201        0      201         0    45056        0          -500 inet_gethost
[Mon Dec  8 17:51:31 2025] [   5283]   113  5283      892      320       32      288         0    49152        0          -500 inet_gethost
[Mon Dec  8 17:51:31 2025] [   5284]   108  5284     1186      322       32      290         0    49152        0             0 epmd
[Mon Dec  8 17:51:31 2025] [   5287]   113  5287      672      220        0      220         0    49152        0          -500 sh
[Mon Dec  8 17:51:31 2025] [1099306]    33 1099306    39308     3485     1646     1839         0   348160    25888             0 python
[Mon Dec  8 17:51:31 2025] [1099316]    33 1099316   104205    17263    14193     3070         0   532480    28736             0 python
[Mon Dec  8 17:51:31 2025] [1099317]    33 1099317    46090      735      177      558         0   368640    34176             0 python
[Mon Dec  8 17:51:31 2025] [1099318]    33 1099318    45624      733      175      558         0   364544    33696             0 python
[Mon Dec  8 17:51:31 2025] [1099319]    33 1099319    46093      800      242      558         0   368640    34112             0 python
[Mon Dec  8 17:51:31 2025] [3590826]    33 3590826    30917     1890      783     1075        32   294912    17088             0 uwsgi
[Mon Dec  8 17:51:31 2025] [3995583]     0 3995583    14805      476      173      303         0   122880      800             0 nginx
[Mon Dec  8 17:51:31 2025] [3995584]    33 3995584    16521     2602     1046     1556         0   163840     1594             0 nginx
[Mon Dec  8 17:51:31 2025] [3995585]    33 3995585    16567     2391      842     1549         0   163840     1920             0 nginx
[Mon Dec  8 17:51:31 2025] [3995586]    33 3995586    16286     2517      991     1526         0   163840     1420             0 nginx
[Mon Dec  8 17:51:31 2025] [3995587]    33 3995587    16386     2439      918     1521         0   163840     1632             0 nginx
[Mon Dec  8 17:51:31 2025] [1059869]    33 1059869   117275    60979    53363     7616         0   663552      696             0 uwsgi
[Mon Dec  8 17:51:31 2025] [1061331]    33 1061331   123729    67441    59790     7651         0   724992      664             0 uwsgi
[Mon Dec  8 17:51:31 2025] [1061541]    33 1061541   125777    69393    61673     7720         0   737280      664             0 uwsgi
[Mon Dec  8 17:51:31 2025] [1061563]    33 1061563    61415    52792    46681     6111         0   552960      664             0 uwsgi
[Mon Dec  8 17:51:31 2025] [1061638]    33 1061638   122535    66306    58646     7660         0   720896      664             0 uwsgi
[Mon Dec  8 17:51:31 2025] [1062494]    33 1062494   117735    61673    54067     7606         0   684032      664             0 uwsgi
[Mon Dec  8 17:51:31 2025] [1063549]    33 1063549   103273    46230    39541     6689         0   540672      760             0 uwsgi
[Mon Dec  8 17:51:31 2025] [1063550]    33 1063550    65342    56979    50888     6091         0   569344      664             0 uwsgi
[Mon Dec  8 17:51:31 2025] [1063754]    33 1063754   123496    67381    59632     7749         0   720896      664             0 uwsgi
[Mon Dec  8 17:51:31 2025] [1063976]    33 1063976   123594    67331    59694     7637         0   724992      664             0 uwsgi
[Mon Dec  8 17:51:31 2025] [1064899]    33 1064899   121465    65140    57484     7656         0   708608      632             0 uwsgi
[Mon Dec  8 17:51:31 2025] [1065001]    33 1065001   118661    62498    54843     7655         0   679936      664             0 uwsgi
[Mon Dec  8 17:51:31 2025] [1065082]    33 1065082   126740    70146    62507     7639         0   741376      696             0 uwsgi
[Mon Dec  8 17:51:31 2025] [1065177]    33 1065177   119229    62898    55229     7669         0   679936      664             0 uwsgi
[Mon Dec  8 17:51:31 2025] [1066495]    33 1066495   124502    68292    60576     7716         0   720896      696             0 uwsgi
[Mon Dec  8 17:51:31 2025] [1069477]    33 1069477   124241    67910    60224     7686         0   720896      664             0 uwsgi
[Mon Dec  8 17:51:31 2025] [1073875]   104 1073875    57001     5828      612     2747      2469   266240      410             0 postgres
[Mon Dec  8 17:51:31 2025] [1073876]   104 1073876    56968     4478      516     2584      1378   192512      410             0 postgres
[Mon Dec  8 17:51:31 2025] [1073877]   104 1073877    56968     4692      516     2702      1474   204800      410             0 postgres
[Mon Dec  8 17:51:31 2025] [1073878]   104 1073878    56968     5219      548     2702      1969   229376      410             0 postgres
[Mon Dec  8 17:51:31 2025] [1073879]   104 1073879    56805     3928      452     2552       924   172032      410             0 postgres
[Mon Dec  8 17:51:31 2025] [1073880]   104 1073880    56805     3960      452     2584       924   172032      410             0 postgres
[Mon Dec  8 17:51:31 2025] [1073881]   104 1073881    56805     4367      484     2584      1299   184320      410             0 postgres
[Mon Dec  8 17:51:31 2025] [1073882]   104 1073882    56968     5233      548     2702      1983   229376      410             0 postgres
[Mon Dec  8 17:51:31 2025] [1073883]   104 1073883    56670     2528      292     1868       368   155648      474             0 postgres
[Mon Dec  8 17:51:31 2025] [1073884]   104 1073884    56968     4673      516     2702      1455   204800      410             0 postgres
[Mon Dec  8 17:51:31 2025] [1073885]   104 1073885    56670     2828      292     2022       514   155648      474             0 postgres
[Mon Dec  8 17:51:31 2025] [1073886]   104 1073886    56805     3928      452     2552       924   172032      410             0 postgres
[Mon Dec  8 17:51:31 2025] [1073887]   104 1073887    56968     4626      516     2670      1440   192512      410             0 postgres
[Mon Dec  8 17:51:31 2025] [1073888]   104 1073888    56670     2467      260     1868       339   155648      474             0 postgres
[Mon Dec  8 17:51:31 2025] [1073889]   104 1073889    56264     1375      132     1243         0   106496      474             0 postgres
[Mon Dec  8 17:51:31 2025] [1073890]   104 1073890    56264     1486      164     1322         0   106496      474             0 postgres
[Mon Dec  8 17:51:31 2025] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=system-postgresql.slice,mems_allowed=0,global_oom,task_memcg=/system.slice/uwsgi.service,task=uwsgi,pid=1065082,uid=33
[Mon Dec  8 17:51:31 2025] Out of memory: Killed process 1065082 (uwsgi) total-vm:506960kB, anon-rss:250028kB, file-rss:30556kB, shmem-rss:0kB, UID:33 pgtables:724kB oom_score_adj:0
[Mon Dec  8 17:51:33 2025] systemd-journald[375]: Under memory pressure, flushing caches.

A few days prior to today's episode:

[Sat Dec  6 01:01:26 2025] systemd-journald[375]: Under memory pressure, flushing caches.

So it appears this builds slowly. Some prime candidates according to today's log:

- memcache (2.7G)
- docker-swarm (2G)

inducer avatar Dec 09 '25 01:12 inducer

Peculiar pattern in node_memory_Mapped_bytes:

Image

(Internal link)

inducer avatar Dec 09 '25 15:12 inducer

Here's what systemd says:

Dec 08 17:52:24 relate systemd[1]: uwsgi.service: A process of this unit has been killed by the OOM killer.
Dec 08 17:52:28 relate uwsgi[1073903]: Stopping app server(s): uwsgi -> . done.
Dec 08 17:52:28 relate systemd[1]: uwsgi.service: Failed with result 'oom-kill'.
Dec 08 17:52:28 relate systemd[1]: uwsgi.service: Consumed 9h 1min 7.749s CPU time, 4.1G memory peak, 3.2G memory swap peak.
Dec 08 18:56:09 relate systemd[1]: Starting uwsgi.service - LSB: Start/stop uWSGI server instance(s)...
Dec 08 18:56:09 relate uwsgi[1078236]: Starting app server(s): uwsgi -> . done.
Dec 08 18:56:09 relate systemd[1]: Started uwsgi.service - LSB: Start/stop uWSGI server instance(s).

inducer avatar Dec 09 '25 15:12 inducer

I'm focusing in on the repeating pattern in node_memory_Mapped_bytes in the monitoring. This seems to have a roughly ~~2-day~~ 1-day period. What's more, near the end of that growth, there is sometimes a crazy spike:

Image

The timing of these spikes seem to coincide with when systemd reports memory pressure:

Dec 06 01:02:15 relate systemd-journald[375]: Under memory pressure, flushing caches.
Dec 08 17:52:26 relate systemd-journald[375]: Under memory pressure, flushing caches.
Dec 09 00:00:08 relate systemd-journald[375]: Under memory pressure, flushing caches.
Dec 09 01:00:49 relate systemd-journald[375]: Under memory pressure, flushing caches.
Dec 11 01:05:15 relate systemd-journald[375]: Under memory pressure, flushing caches.

As another ripple effect, node_memory_AnonPages_bytes dips down precipitously right after each of the events:

Image

The spike/growth is not observable in per-process RSS data. Here is the sum of all process's RSS over time:

Image

(X axis is in hours before now, Y axis is in gigabytes total)

If anything, that collapses to nothing after the spike, consistent with nearly everything getting swapped out.

inducer avatar Dec 11 '25 16:12 inducer

🤦 That pattern is the backup every night at 1am. I had no idea that was this rough on the system. I'll bump down the read concurrency, maybe that'll help.

So while this explains the spikes in the memory system every night at 1am, IMO it does absolutely nothing to explain the afternoon memory pressure on Dec 8.

I will have to leave my process monitoring going for longer to see if we can catch one of these events.

inducer avatar Dec 11 '25 17:12 inducer

What's peculiar is that all the memory statistics from prometheus look absolutely innocuous just prior to the Dec 8 event. This is node_memory_MemAvailable_bytes:

Image

If 3G are available, why is there an OOM kill being run?

inducer avatar Dec 11 '25 17:12 inducer

Systemd's "under memory pressure" messages seem to arise from the kernel's "pressure stall information". Chasing down that rabbit hole, prometheus has metrics:

  • node_pressure_memory_stalled_seconds_total
  • node_pressure_memory_waiting_seconds_total
  • node_pressure_cpu_waiting_seconds_total
  • node_pressure_io_stalled_seconds_total
  • node_pressure_io_waiting_seconds_total

The thing is, none of them show anything out of the ordinary for the Dec 8 event.

inducer avatar Dec 11 '25 17:12 inducer

From the Nov 21 OOM killer log:

Node 0 active_anon:3,967,336kB inactive_anon:49,868kB active_file:220,732kB
inactive_file:2551,532kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:126,344kB dirty:336kB writeback:0kB shmem:75,576kB shmem_thp:0kB
shmem_pmdmapped:0kB anon_thp:63,488kB writeback_tmp:0kB kernel_stack:6156kB
pagetables:23,828kB sec_pagetables:0kB all_unreclaimable? yes Balloon:0kB

Node 0 Normal free:50,248kB boost:8192kB min:50792kB low:61,440kB high:72,088kB
reserved_highatomic:0KB free_highatomic:0KB active_anon:2416268kB inactive_anon:37152kB
active_file:140380kB inactive_file:1542404kB unevictable:0kB writepending:252kB
present:5242880kB managed:5097488kB mlocked:0kB bounce:0kB free_pcp:24536kB
local_pcp:5684kB free_cma:0kB

The used memory listed there is close to 4GiB, but the system has 8GiB of RAM. Yet somehow only ~50MB are left free?

inducer avatar Dec 11 '25 18:12 inducer

node_vmstat_pgmajfault also does not show an increase during the Dec 8 event.

inducer avatar Dec 11 '25 18:12 inducer

After reading more than I ever wanted about Linux memory management, I'm concluding that the machine may quite simply be out of memory. (Shocker!) I've got the machine configured for 16 app workers, which take about 256MB each, so that's about 4GB. In the OOM killer log, only 5GB are reported as "managed", i.e. hypothetically available. I haven't been able to make sense of the discrepancy between that and the 8GiB physically installed. But given that

  • the Relate app server processes (4GB)
  • memcached
  • postgres
  • docker swarm
  • celery (the job queue)
  • RabbitMQ

are all supposed to fit into 5GB, it's believable that that gets tight from time to time.

inducer avatar Dec 11 '25 19:12 inducer

5GB is incorrect. What matters is the sum of the "normal" and the "DMA32" zone, which reflects the expected 8GiB.

inducer avatar Dec 11 '25 20:12 inducer