Andreas Klöckner
Here's what systemd says:

```
Dec 08 17:52:24 relate systemd[1]: uwsgi.service: A process of this unit has been killed by the OOM killer.
Dec 08 17:52:28 relate uwsgi[1073903]: Stopping app...
```
I'm focusing in on the repeating pattern in `node_memory_Mapped_bytes` in the monitoring. This seems to have a roughly ~~2-day~~ 1-day period. What's more, near the end of that growth, there...
🤦 That pattern is the backup every night at 1am. I had no idea that was this rough on the system. I'll bump down the read concurrency, maybe that'll help....
What's peculiar is that all the memory statistics from prometheus look absolutely innocuous just prior to the Dec 8 event. This is `node_memory_MemAvailable_bytes`: If 3G are available, why is there...
Systemd's ["under memory pressure" messages](https://github.com/systemd/systemd/blob/7c0afcdde22d3d94fd23bfd0e473c263aaf54e8a/docs/MEMORY_PRESSURE.md) seem to arise from the kernel's ["pressure stall information"](https://docs.kernel.org/accounting/psi.html). Chasing down that rabbit hole, prometheus has metrics:

- `node_pressure_memory_stalled_seconds_total`
- `node_pressure_memory_waiting_seconds_total`
- `node_pressure_cpu_waiting_seconds_total`
- `node_pressure_io_stalled_seconds_total`

...
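For anyone who wants to check the pressure numbers on the machine directly: as far as I understand, those Prometheus counters are derived from the kernel's PSI files under `/proc/pressure/`, with "waiting" corresponding to the `some` line and "stalled" to the `full` line. Below is a minimal sketch of parsing that format, assuming the line layout described in the kernel PSI documentation. The function name `parse_psi` and the sample values are made up for illustration.

```python
def parse_psi(text):
    """Parse the contents of a /proc/pressure/* file into nested dicts."""
    result = {}
    for line in text.strip().splitlines():
        # Each line looks like: "<some|full> avg10=.. avg60=.. avg300=.. total=.."
        kind, *fields = line.split()
        values = {}
        for field in fields:
            key, _, raw = field.partition("=")
            # "total" is a cumulative counter in microseconds;
            # the avg* fields are percentages over the named window.
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


# Hypothetical sample, not taken from the machine discussed here.
sample = """\
some avg10=0.00 avg60=0.12 avg300=0.05 total=123456
full avg10=0.00 avg60=0.03 avg300=0.01 total=45678
"""

psi = parse_psi(sample)
# Convert the microsecond counter to seconds, the unit used by the
# *_seconds_total metrics.
stalled_seconds = psi["full"]["total"] / 1e6
```

On a live system the input would come from reading `/proc/pressure/memory` rather than from the sample string.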
From the Nov 21 OOM killer log:

```
Node 0 active_anon:3,967,336kB inactive_anon:49,868kB active_file:220,732kB inactive_file:2,551,532kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:126,344kB dirty:336kB writeback:0kB shmem:75,576kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:63,488kB writeback_tmp:0kB kernel_stack:6156kB pagetables:23,828kB sec_pagetables:0kB all_unreclaimable? yes...
```
`node_vmstat_pgmajfault` also does not show an increase during the Dec 8 event.
After reading more than I ever wanted about Linux memory management, I'm concluding that the machine may quite simply be out of memory. (Shocker!) I've got the machine configured for...
5GB is incorrect. What matters is the sum of the "Normal" and "DMA32" zones, which reflects the expected 8GiB.
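To make the zone arithmetic concrete, here is a rough sketch that sums the `managed` page counts per zone from text in the `/proc/zoneinfo` layout. The helper name `managed_bytes_by_zone`, the 4096-byte page size, and the sample numbers are all assumptions chosen for illustration; they are not readings from this machine.

```python
import re

# Matches zone header lines such as "Node 0, zone   Normal".
ZONE_RE = re.compile(r"^Node\s+(\d+),\s+zone\s+(\S+)")


def managed_bytes_by_zone(zoneinfo_text, page_size=4096):
    """Return {zone_name: managed_bytes} from /proc/zoneinfo-style text.

    page_size is an assumption (4 KiB is typical on x86-64).
    """
    sizes = {}
    zone = None
    for line in zoneinfo_text.splitlines():
        m = ZONE_RE.match(line)
        if m:
            zone = m.group(2)
            continue
        parts = line.split()
        # "managed" is the number of pages handed to the page allocator.
        if zone is not None and len(parts) == 2 and parts[0] == "managed":
            sizes[zone] = sizes.get(zone, 0) + int(parts[1]) * page_size
    return sizes


# Hypothetical, heavily abbreviated sample.
sample = """\
Node 0, zone    DMA32
  pages free     1000
        managed  786432
Node 0, zone   Normal
  pages free     2000
        managed  1310720
"""

sizes = managed_bytes_by_zone(sample)
total_gib = (sizes["DMA32"] + sizes["Normal"]) / 2**30
```

With these made-up inputs, neither zone alone accounts for the installed memory; only the sum of the two does.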
As discussed a few times, while the lack of simplification is a potential problem, I think it's probably more useful to allow this to vectorize. There are two missing pieces...