Invalid argument to `process_mrelease`
I recently ran into an OOM situation in which earlyoom itself failed, and I had to power off the machine to recover. System logs from just before the poweroff reveal that earlyoom invoked process_mrelease incorrectly. I've omitted the process's command line because of its length, but I wouldn't expect it to cause the issue:
Nov 12 17:08:27 earlyoom[1295]: mem avail: 888 of 55597 MiB ( 1.60%), swap free: 0 of 8191 MiB ( 0.00%)
Nov 12 17:08:27 earlyoom[1295]: low memory! at or below SIGTERM limits: mem 1.66%, swap 10.00%
Nov 12 17:08:27 earlyoom[1295]: sending SIGTERM to process 906175 uid 1000 "java": oom_score 1170, VmRSS 43912 MiB, cmdline "..."
Nov 12 17:08:27 earlyoom[1295]: kill_release: pid=906175: process_mrelease pidfd=4 failed: Invalid argument
Nov 12 17:08:37 earlyoom[1295]: process 906175 did not exit
Nov 12 17:08:37 earlyoom[1295]: kill failed: Timer expired
Nov 12 17:08:37 earlyoom[1295]: mem avail: 896 of 55599 MiB ( 1.61%), swap free: 0 of 8191 MiB ( 0.00%)
Nov 12 17:08:37 earlyoom[1295]: low memory! at or below SIGTERM limits: mem 1.66%, swap 10.00%
Nov 12 17:08:37 earlyoom[1295]: sending SIGTERM to process 906175 uid 1000 "java": oom_score 1170, VmRSS 43912 MiB, cmdline "..."
Nov 12 17:08:37 earlyoom[1295]: kill_release: pid=906175: process_mrelease pidfd=4 failed: Invalid argument
Nov 12 17:08:47 earlyoom[1295]: process 906175 did not exit
Nov 12 17:08:47 earlyoom[1295]: kill failed: Timer expired
Nov 12 17:08:47 earlyoom[1295]: mem avail: 892 of 55595 MiB ( 1.61%), swap free: 0 of 8191 MiB ( 0.00%)
Nov 12 17:08:47 earlyoom[1295]: low memory! at or below SIGTERM limits: mem 1.66%, swap 10.00%
Nov 12 17:08:47 earlyoom[1295]: sending SIGTERM to process 906175 uid 1000 "java": oom_score 1170, VmRSS 43912 MiB, cmdline "..."
Nov 12 17:08:47 earlyoom[1295]: kill_release: pid=906175: process_mrelease pidfd=4 failed: Invalid argument
Nov 12 17:08:57 earlyoom[1295]: process 906175 did not exit
Nov 12 17:08:57 earlyoom[1295]: kill failed: Timer expired
Hi, no: what seems to be happening here is that your "java" process ignores SIGTERM.
As a consequence, process_mrelease also fails (cannot release the memory of a process that's not exiting).
But earlyoom will escalate to SIGKILL when the available memory drops even lower, and SIGKILL cannot be ignored.
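To make the escalation concrete, here is a small Python sketch of the per-poll decision. `choose_signal` is hypothetical, and it assumes a limit only counts as reached when both available memory and free swap are at or below it, which is how earlyoom's startup message (quoted further down) phrases its limits.

```python
from typing import Optional


def choose_signal(mem_avail_pct: float, swap_free_pct: float,
                  mem_term: float, swap_term: float,
                  mem_kill: float, swap_kill: float) -> Optional[str]:
    """Pick the signal for one poll, or None if memory is not low enough.

    Assumption: BOTH available memory and free swap must be at or below
    a limit for that limit to trigger.
    """
    if mem_avail_pct <= mem_kill and swap_free_pct <= swap_kill:
        return "SIGKILL"
    if mem_avail_pct <= mem_term and swap_free_pct <= swap_term:
        return "SIGTERM"
    return None
```

With the values from the log (mem avail about 1.60%, swap free 0.00%) and the limits from that startup message (memory 1.66%/0.83%, swap 10%/5%), `choose_signal(1.60, 0.0, 1.66, 10.0, 0.83, 5.0)` returns "SIGTERM" on every poll; available memory would have to fall to 0.83% before SIGKILL is chosen.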
What limits do you use? Did you disable SIGKILL?
Can you post the earlyoom command line, e.g. the output of
ps auxwww | grep earlyoom
or similar?
The options passed to earlyoom are:
-r 30
-m 4
-M 1048576
--ignore-root-user
-n
--prefer and --avoid are left at their defaults and omitted here for clarity.
Also, here is the initial output from journalctl:
mem total: 61815 MiB, user mem total: 60482 MiB, swap total: 8191 MiB
sending SIGTERM when mem avail <= 1.66% and swap free <= 10.00%,
SIGKILL when mem avail <= 0.83% and swap free <= 5.00%
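As a rough check of where those percentages come from, here is a Python sketch. Assumptions: `effective_limits` is a hypothetical helper; -M values are in KiB; when both -m and -M are given, the lower percentage wins, separately for the SIGTERM and SIGKILL limits; and the total is the 61815 MiB printed above (earlyoom presumably works from the exact KiB figure, so the last digit could differ).

```python
def effective_limits(m_term: float, m_kill: float,
                     M_term_kib: float, M_kill_kib: float,
                     s_term: float, s_kill: float,
                     mem_total_mib: float) -> dict:
    """Combine -m (percent) and -M (KiB) into percentage limits.

    Assumption: when both -m and -M are given, the lower percentage wins,
    separately for the SIGTERM and the SIGKILL limit.
    """
    mem_total_kib = mem_total_mib * 1024
    return {
        "mem_term": min(m_term, 100 * M_term_kib / mem_total_kib),
        "mem_kill": min(m_kill, 100 * M_kill_kib / mem_total_kib),
        "swap_term": s_term,
        "swap_kill": s_kill,
    }


if __name__ == "__main__":
    # -m 4 (SIGKILL value at half, 2), -M 1048576 (SIGKILL value at half,
    # 524288 KiB), swap left at 10/5, MemTotal 61815 MiB as printed above.
    lim = effective_limits(4, 2, 1048576, 524288, 10, 5, 61815)
    print(", ".join(f"{k} {v:.2f}%" for k, v in lim.items()))
```

Run as a script it prints `mem_term 1.66%, mem_kill 0.83%, swap_term 10.00%, swap_kill 5.00%`, matching the startup message: -M 1048576 KiB is 1024 MiB, about 1.66% of 61815 MiB, which is lower than -m 4. Under the same assumptions, the `-M 1572864,1048576` change mentioned at the end of the thread gives roughly 2.48% for SIGTERM and 1.66% for SIGKILL, so the 1.60 to 1.61% seen in the log would then be at or below the SIGKILL limit.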
> As a consequence, process_mrelease also fails (cannot release the memory of a process that's not exiting).
I think a small wrapper around this error (e.g. "process ignored SIGTERM" or similar) would be helpful, since it would make the system logs easier to understand.
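A sketch of what such a wrapper could look like, in Python for brevity. `describe_mrelease_failure` is a hypothetical helper, not an earlyoom function. The errno meanings follow the process_mrelease(2) man page, where EINVAL can also mean a shared address space or a core dump in progress, so the EINVAL message names a likely cause rather than a certain one.

```python
import errno
import os
import signal


def describe_mrelease_failure(pid: int, err: int, sig: int = signal.SIGTERM) -> str:
    """Turn a process_mrelease() errno into a log line a user can act on."""
    name = signal.Signals(sig).name
    if err == errno.EINVAL:
        # EINVAL also covers a shared address space or a core dump in
        # progress, so this is phrased as a likely cause.
        hint = f"process is not exiting (it may be ignoring or handling {name})"
    elif err == errno.ESRCH:
        hint = "process no longer exists"
    elif err == errno.ENOSYS:
        hint = "process_mrelease is not supported by this kernel"
    else:
        hint = os.strerror(err)  # fall back to the generic libc message
    return f"pid={pid}: could not release memory after {name}: {hint}"
```

For the log above, `describe_mrelease_failure(906175, errno.EINVAL)` returns "pid=906175: could not release memory after SIGTERM: process is not exiting (it may be ignoring or handling SIGTERM)".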
I've changed -M to 1572864,1048576; hopefully that helps.