
earlyoom kills wrong processes first due to oom_score_adj

Open vinc17fr opened this issue 5 months ago • 25 comments

I'm using earlyoom under Debian/unstable (upcoming Debian 13), mainly due to a bug in Firefox on YouTube, with the consequence that the associated firefox process suddenly takes more and more memory until OOM. This has happened again, where this process took more than 3 GB VmRSS memory. But earlyoom first killed other processes instead, in particular the important pipewire-pulse process:

Aug 04 23:41:26 qaa earlyoom[1308]: mem avail:  2051 of 20537 MiB ( 9.99%), swap free:    0 of  975 MiB ( 0.01%)
Aug 04 23:41:26 qaa earlyoom[1308]: low memory! at or below SIGTERM limits: mem 10.00%, swap 10.00%
Aug 04 23:41:26 qaa earlyoom[1308]: sending SIGTERM to process 1024668 uid 1000 "Web Content": oom_score 823, VmRSS 79 MiB, cmdline "/usr/lib/firefox/firefox -contentproc -isForBrowser -prefsHandle 0:44331 -prefMapHandle 1:283567 -jsInitHandle 2:242012 -parentBuildID 20250717180000 -sandboxReporter 3 -ipcHandle 4 -initialChannelId {af7f00c7-3771-49ed-8ae7-972b131e8b32} -parentPid 46 -"
Aug 04 23:41:26 qaa earlyoom[1308]: kill_release: pid=1024668: process_mrelease pidfd=4 success
Aug 04 23:41:26 qaa earlyoom[1308]: process 1024668 exited after 0.100 seconds
Aug 04 23:41:28 qaa earlyoom[1308]: mem avail:  2054 of 20543 MiB (10.00%), swap free:    0 of  975 MiB ( 0.01%)
Aug 04 23:41:28 qaa earlyoom[1308]: low memory! at or below SIGTERM limits: mem 10.00%, swap 10.00%
Aug 04 23:41:28 qaa earlyoom[1308]: sending SIGTERM to process 1026163 uid 1000 "Web Content": oom_score 823, VmRSS 79 MiB, cmdline "/usr/lib/firefox/firefox -contentproc -isForBrowser -prefsHandle 0:44331 -prefMapHandle 1:283567 -jsInitHandle 2:242012 -parentBuildID 20250717180000 -sandboxReporter 3 -ipcHandle 4 -initialChannelId {6c109649-b80d-4abc-8d7c-54085db3ad4e} -parentPid 46 -"
Aug 04 23:41:28 qaa earlyoom[1308]: kill_release: pid=1026163: process_mrelease pidfd=4 success
Aug 04 23:41:28 qaa earlyoom[1308]: process 1026163 exited after 0.100 seconds
Aug 04 23:41:28 qaa earlyoom[1308]: mem avail:  2034 of 20540 MiB ( 9.91%), swap free:    0 of  975 MiB ( 0.01%)
Aug 04 23:41:28 qaa earlyoom[1308]: low memory! at or below SIGTERM limits: mem 10.00%, swap 10.00%
Aug 04 23:41:28 qaa earlyoom[1308]: sending SIGTERM to process 1025338 uid 1000 "Web Content": oom_score 823, VmRSS 78 MiB, cmdline "/usr/lib/firefox/firefox -contentproc -isForBrowser -prefsHandle 0:44331 -prefMapHandle 1:283567 -jsInitHandle 2:242012 -parentBuildID 20250717180000 -sandboxReporter 3 -ipcHandle 4 -initialChannelId {95143ef1-8f9d-4a88-8da6-ee03ca80d612} -parentPid 46 -"
Aug 04 23:41:28 qaa earlyoom[1308]: kill_release: pid=1025338: process_mrelease pidfd=4 success
Aug 04 23:41:28 qaa earlyoom[1308]: process 1025338 exited after 0.100 seconds
Aug 04 23:41:28 qaa earlyoom[1308]: mem avail:  2036 of 20537 MiB ( 9.92%), swap free:    0 of  975 MiB ( 0.01%)
Aug 04 23:41:28 qaa earlyoom[1308]: low memory! at or below SIGTERM limits: mem 10.00%, swap 10.00%
Aug 04 23:41:28 qaa earlyoom[1308]: sending SIGTERM to process 2002 uid 1000 "pipewire-pulse": oom_score 800, VmRSS 40 MiB, cmdline "/usr/bin/pipewire-pulse"
Aug 04 23:41:28 qaa earlyoom[1308]: kill_release: pid=2002: process_mrelease pidfd=4 failed: Invalid argument
Aug 04 23:41:28 qaa systemd[1981]: pipewire-pulse.service: Consumed 17min 43.690s CPU time, 54.2M memory peak, 100K memory swap peak.
Aug 04 23:41:28 qaa earlyoom[1308]: process 2002 exited after 0.100 seconds
Aug 04 23:41:35 qaa earlyoom[1308]: mem avail:  2050 of 20519 MiB ( 9.99%), swap free:    0 of  975 MiB ( 0.02%)
Aug 04 23:41:35 qaa earlyoom[1308]: low memory! at or below SIGTERM limits: mem 10.00%, swap 10.00%
Aug 04 23:41:35 qaa earlyoom[1308]: sending SIGTERM to process 5446 uid 1000 "Isolated Web Co": oom_score 800, VmRSS 3257 MiB, cmdline "/usr/lib/firefox/firefox -contentproc -isForBrowser -prefsHandle 0:34285 -prefMapHandle 1:283567 -jsInitHandle 2:242012 -parentBuildID 20250717180000 -sandboxReporter 3 -ipcHandle 4 -initialChannelId {4ac59450-5248-4a8c-83ff-bced872d4463} -parentPid 46 -"
Aug 04 23:41:35 qaa earlyoom[1308]: kill_release: pid=5446: process_mrelease pidfd=4 success
Aug 04 23:41:35 qaa earlyoom[1308]: process 5446 exited after 0.100 seconds

Note: I'm using Debian's default options for the earlyoom process, which is

UID          PID    PPID  C STIME TTY          TIME CMD
earlyoom    1308       1  0 Jul28 ?        00:01:01 /usr/bin/earlyoom -r 3600

vinc17fr avatar Aug 04 '25 22:08 vinc17fr

By the way, these processes have a high oom_score. Are you sure they’re the wrong processes?

Also: Web Content processes have a high oom_score_adj by default.

hakavlad avatar Aug 05 '25 10:08 hakavlad

this process took more than 3 GB VmRSS memory. But earlyoom first killed other processes instead

Maybe this process has an oom_score_adj of zero, unlike the processes that were killed.

hakavlad avatar Aug 05 '25 11:08 hakavlad

Maybe this process has an oom_score_adj of zero, unlike the processes that were killed.

I don't see why this would be the case. I think that earlyoom should also output the oom_score_adj value. It would be easier to see what's going wrong. With the following command under zsh, I cannot see any "Isolated Web Co" process with an oom_score_adj score of zero.

for i in "${(f)$(ps -eo 'pid,command' c)}" ; do a=(${=i}); printf "%6s %s\n" "$(cat /proc/$a[1]/oom_score_adj 2>/dev/null)" "$a[2,-1]"; done
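For readers without zsh, roughly the same listing can be sketched in Python. This is only an illustration: the function name `list_oom_score_adj` and its `proc_root` parameter are made up here (the parameter exists so the function can be pointed at a test directory), and the sketch reads nothing but the standard `/proc/<pid>/oom_score_adj` and `/proc/<pid>/comm` files.

```python
import os

def list_oom_score_adj(proc_root="/proc"):
    """Return (oom_score_adj, comm) pairs for every readable process, highest first."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "meminfo" or "self"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score_adj")) as f:
                adj = int(f.read().strip())
            with open(os.path.join(base, "comm")) as f:
                comm = f.read().strip()
        except (OSError, ValueError):
            continue  # the process exited meanwhile or is not readable
        rows.append((adj, comm))
    return sorted(rows, reverse=True)
```

Calling `list_oom_score_adj()` on a Linux machine and printing each pair gives a table comparable to the one produced by the zsh loop.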

vinc17fr avatar Aug 05 '25 12:08 vinc17fr

Both browsers set a high oom_score_adj for their worker processes (tab groups).

$ oom-sort -l0
oom_score oom_score_adj   UID     PID Name            VmRSS   VmSwap
--------- ------------- ----- ------- --------------- ------- --------
      870           300  1000 1729862 chromium          163 M      0 M 
      870           300  1000 1729953 chromium          155 M      0 M 
      870           300  1000 1729960 chromium          161 M      0 M 
      870           300  1000 1729986 chromium          187 M      0 M 
      869           300  1000 1729883 chromium          124 M      0 M 
      869           300  1000 1729905 chromium          141 M      0 M 
      869           300  1000 1729954 chromium          139 M      0 M 
      868           300  1000 1729861 chromium          108 M      0 M 
      868           300  1000 1729892 chromium          102 M      0 M 
      868           300  1000 1729938 chromium           86 M      0 M 
      868           300  1000 1730046 chromium           62 M      0 M 
      823           233  1000 1715829 Web Content        76 M      0 M 
      823           233  1000 1716255 Web Content        77 M      0 M 
      823           233  1000 1717467 Web Content        77 M      0 M 
      823           233  1000 1728777 Web Content        77 M      0 M 
      823           233  1000 1728929 Web Content        77 M      0 M 
      823           233  1000 1729553 Web Content        78 M      0 M 
      802           200  1000 1729806 chromium          102 M      0 M 
      802           200  1000 1729811 chromium          144 M      0 M 
      800           200  1000 1729816 chromium           42 M      0 M 
      794           167  1000 1715095 Isolated Web Co   756 M      0 M 
      786           167  1000 1714419 Isolated Web Co   372 M      0 M 
      783           167  1000 1724524 Isolated Web Co   235 M      0 M 
      783           167  1000 1724940 Isolated Web Co   258 M      0 M 
      782           167  1000 1714091 Isolated Web Co   195 M      0 M 
      782           167  1000 1714204 Isolated Web Co   233 M      0 M 
      782           167  1000 1723431 Isolated Web Co   219 M      0 M 
      781           167  1000 1714050 Privileged Cont   156 M      0 M 
      781           167  1000 1714177 Isolated Web Co   170 M      0 M 
      781           167  1000 1714378 Privileged Cont   156 M      0 M 
      781           167  1000 1714415 Isolated Web Co   161 M      0 M 
      781           167  1000 1714476 Isolated Web Co   175 M      0 M 
      780           167  1000 1714110 Isolated Web Co   125 M      0 M 
      780           167  1000 1714125 Isolated Web Co   124 M      0 M 
      780           167  1000 1714456 Isolated Web Co   119 M      0 M 
      780           167  1000 1715372 Isolated Web Co    97 M      0 M 
      780           167  1000 1716566 Isolated Web Co   124 M      0 M 
      780           167  1000 1719624 Isolated Web Co   126 M      0 M 
      780           167  1000 1719723 Isolated Web Co   111 M      0 M 
      780           167  1000 1723509 Isolated Web Co   100 M      0 M 
      780           167  1000 1723547 Isolated Web Co   109 M      0 M 
      780           167  1000 1724172 Isolated Web Co   102 M      0 M 
      745           100  1000 1714439 Isolated Web Co   530 M      0 M 
      738           100  1000 1714087 Isolated Web Co   210 M      0 M 
      737           100  1000 1714982 WebExtensions     178 M      0 M 
      736           100  1000 1714463 Isolated Web Co   123 M      0 M 
      736           100  1000 1714500 Isolated Web Co   135 M      0 M 
      736           100  1000 1714637 WebExtensions     158 M      0 M 
      692             0  1000 1713799 firefox-esr      1154 M      0 M 
      676             0  1000 1713972 firefox-esr       432 M      0 M 
      672             0  1000 1729757 chromium          254 M      0 M 
      669             0     0 1713139 Xorg              131 M      0 M 

hakavlad avatar Aug 05 '25 12:08 hakavlad

Perhaps it was OK to kill the "Web Content" processes if they were not important. Perhaps the bug is that pipewire-pulse has an oom_score_adj value 200 here while it is an important process: when it is killed, the audio stops.

That said, oom_score_adj seems broken, at least for earlyoom, whose goal is to kill the largest process. In my case, the culprit tab was taking 3257 MiB while pipewire-pulse was just taking 40 MiB. This is a huge difference, and oom_score_adj should have had a very little influence here.
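To put numbers on this, here is a rough sketch of how the adjustment weighs against process size. It assumes the documented meaning of oom_score_adj (each point is one thousandth of RAM plus swap). The second function, `approx_oom_score`, additionally assumes the scaling that recent kernels appear to apply when exporting `/proc/<pid>/oom_score`; that formula is an assumption not stated in this thread, and it ignores page tables and swapped-out memory. Both function names are illustrative.

```python
def adj_as_mib(oom_score_adj, total_mib):
    """Memory that an adjustment is 'worth': each point is 1/1000 of RAM + swap."""
    return oom_score_adj * total_mib / 1000

def approx_oom_score(rss_mib, oom_score_adj, total_mib):
    """Approximate /proc/<pid>/oom_score (assumed scaling, non-negative inputs only)."""
    permille = rss_mib * 1000 // total_mib + oom_score_adj
    return (1000 + permille) * 2 // 3

TOTAL_MIB = 20537 + 975  # RAM + swap, taken from the earlyoom log above

print(adj_as_mib(200, TOTAL_MIB))            # 4302.4
print(approx_oom_score(40, 200, TOTAL_MIB))  # 800
print(approx_oom_score(3257, 0, TOTAL_MIB))  # 767
```

Under these assumptions, an adjustment of 200 counts like roughly 4.3 GiB of memory on this machine: a 40 MiB process at 200 scores 800, as in the log, while a 3257 MiB process would score only 767 if its adjustment were 0.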

vinc17fr avatar Aug 05 '25 12:08 vinc17fr

Perhaps the bug is that pipewire-pulse has an oom_score_adj value 200 here while it is an important process

Yes, it's weird. That's not earlyoom's problem. There is no reason to set a positive oom_score_adj for pipewire-pulse.

Maybe you should report here https://gitlab.freedesktop.org/pipewire/pipewire/-/issues

hakavlad avatar Aug 05 '25 13:08 hakavlad

AI:

It’s a bug in the systemd service configuration for pipewire-pulse

By default, the pipewire-pulse unit ships with oom_score_adj=200, which actually increases its chance of being killed under OOM conditions instead of protecting it.

Why this is incorrect

• A positive oom_score_adj value raises the process’s kill priority when memory is low.
• Critical services like the audio subsystem should use zero or a negative value to avoid being targeted first.

Where to report the issue

In your distribution’s bug tracker
• Identify the package (e.g., pipewire or pipewire-pulse).
• Describe the misconfigured oom_score_adj=200 and recommend setting it to 0 or a negative value.
On the PipeWire upstream tracker
• GitLab: https://gitlab.freedesktop.org/pipewire/pipewire/-/issues
• Create a “New issue” and include the output of systemctl cat pipewire-pulse.service.
Via the PipeWire developers’ mailing list
• Email: [[email protected]](mailto:[email protected])

Temporary workaround

Create a systemd override to reset the score:

# pipewire-pulse runs under the user service manager, hence --user
systemctl --user edit pipewire-pulse.service
# Add under [Service]:
#   OOMScoreAdjust=0
systemctl --user daemon-reload
systemctl --user restart pipewire-pulse

hakavlad avatar Aug 05 '25 13:08 hakavlad

AI:

It’s a bug in the systemd service configuration for pipewire-pulse

By default, the pipewire-pulse unit ships with oom_score_adj=200, [...]

This seems to be incorrect:

qaa% grep -r oom_score_adj /usr/lib/systemd /etc/systemd
grep: /usr/lib/systemd/systemd: binary file matches
grep: /usr/lib/systemd/systemd-executor: binary file matches

And there's no oom_score_adj either in the pipewire Debian source package (from which pipewire-pulse is generated).

vinc17fr avatar Aug 05 '25 14:08 vinc17fr

These two having the same oom_score is bizarre:

Aug 04 23:41:28 qaa earlyoom[1308]: sending SIGTERM to process 2002 uid 1000 "pipewire-pulse": oom_score 800, VmRSS 40 MiB, cmdline "/usr/bin/pipewire-pulse"
Aug 04 23:41:35 qaa earlyoom[1308]: sending SIGTERM to process 5446 uid 1000 "Isolated Web Co": oom_score 800, VmRSS 3257 MiB, cmdline "/usr/lib/firefox/firefox -contentproc -isForBrowser -prefsHandle 0:34285 -prefMapHandle 1:283567 -jsInitHandle 2:242012 -parentBuildID 20250717180000 -sandboxReporter 3 -ipcHandle 4 -initialChannelId {4ac59450-5248-4a8c-83ff-bced872d4463} -parentPid 46 -"

I think that earlyoom should also output the oom_score_adj value

Ack. In the meantime:

  1. Can you check what oom_score_adj is actually set to for the pipewire-pulse process? This should do it:

    cat /proc/$(pgrep pipewire-pulse)/oom_score_adj

  2. Add --sort-by-rss to /etc/default/earlyoom

rfjakob avatar Aug 05 '25 20:08 rfjakob

  1. Can you check what oom_score_adj is actually set to for the pipewire-pulse process? This should do it: cat /proc/$(pgrep pipewire-pulse)/oom_score_adj
qaa% cat /proc/$(pgrep pipewire-pulse)/oom_score_adj
200

FYI, all the following processes have a value of 200:

   200 dbus-daemon
   200 pipewire
   200 pipewire
   200 wireplumber
   200 mpris-proxy
   200 xdg-permission-
   200 gvfsd
   200 gvfsd-fuse
   200 gpg-agent
   200 at-spi-bus-laun
   200 dbus-daemon
   200 at-spi2-registr
   200 xdg-desktop-por
   200 xdg-document-po
   200 fusermount3
   200 xdg-desktop-por
   200 dconf-service
   200 notification-da
   200 pipewire-pulse

They are the descendants of the same process:

  vinc17 ├─> 1981  systemd --user --deserialize=34
  vinc17 │ ├─> 1983  (sd-pam)
  vinc17 │ ├─> 1997  dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
  vinc17 │ ├─> 1998  pipewire
  vinc17 │ ├─> 1999  pipewire -c filter-chain.conf
  vinc17 │ ├─> 2001  wireplumber
  vinc17 │ ├─> 2003  mpris-proxy
  vinc17 │ ├─> 2126  xdg-permission-store
  vinc17 │ ├─> 3486  gvfsd
  vinc17 │ ├─> 3492  gvfsd-fuse gvfs -f
  vinc17 │ ├─> 3827  gpg-agent --supervised
  vinc17 │ ├─> 3833  at-spi-bus-launcher
  vinc17 │ │ └─> 3840  dbus-daemon --config-file=/usr/share/defaults/at-spi2/accessibility.conf --nofork --print-address 11 --address=unix:path=/run/user/1000/at-spi/bus_0
  vinc17 │ ├─> 3848  at-spi2-registryd --use-gnome-session
  vinc17 │ ├─> 92214  xdg-desktop-portal
  vinc17 │ ├─> 92243  xdg-document-portal
    root │ │ └─> 92250  fusermount3 -o rw,nosuid,nodev,fsname=portal,auto_unmount,subtype=portal -- doc
  vinc17 │ ├─> 92258  xdg-desktop-portal-gtk
  vinc17 │ ├─> 152596  dconf-service
  vinc17 │ ├─> 339845  notification-daemon
  vinc17 │ └─> 1028192  pipewire-pulse

So it could be a systemd issue.

  1. Add --sort-by-rss to /etc/default/earlyoom

OK.

vinc17fr avatar Aug 05 '25 23:08 vinc17fr

The systemd --user --deserialize=34 process itself has an oom_score_adj value of 100. This could be due to the OOMScoreAdjust=100 line in /usr/lib/systemd/system/user@.service. But I did not find anything that could yield 200 for its descendants.

vinc17fr avatar Aug 06 '25 00:08 vinc17fr

The systemd-system.conf(5) man page says:

DefaultOOMScoreAdjust= [...] This defaults to unset (meaning the forked off processes inherit the service manager's OOM score adjustment value), except if the service manager is run for an unprivileged user, in which case this defaults to the service manager's OOM adjustment value plus 100 (this makes service processes slightly more likely to be killed under memory pressure than the manager itself). [...]

It seems that this "plus 100" is the explanation.

But all these systemd DefaultOOMScoreAdjust settings make no sense in a context where not all the processes are started by systemd.

vinc17fr avatar Aug 06 '25 00:08 vinc17fr

Concerning Firefox in particular, it was supposed to take into account the systemd settings (mentioning the 100+100), but this proposed solution assumes that the initial oom_score_adj value for Firefox has already been increased (while it is 0 here), so it does not work. I've posted a comment to a related Firefox bug.

vinc17fr avatar Aug 06 '25 01:08 vinc17fr

So Firefox has 100 while pipewire-pulse has 200? Did I get this right?

rfjakob avatar Aug 06 '25 07:08 rfjakob

So Firefox has 100 while pipewire-pulse has 200? Did I get this right?

Almost. The Firefox processes have various oom_score_adj values. Those that correspond to tabs ("Isolated Web Co") have value 100 or 167 (so, less than 200 in all cases). The user systemd services, like pipewire-pulse, have 200.

vinc17fr avatar Aug 06 '25 09:08 vinc17fr

I'm gonna call this a bug in Firefox (or a Linux distro integration problem).

Does --sort-by-rss work for you?

rfjakob avatar Aug 21 '25 19:08 rfjakob

I don't think that this is a bug in Firefox: the same issue is even more likely to occur if some other non-daemon process takes much memory, as the default oom_score_adj value is 0. Currently, I see this as at least a bug in systemd.

That said, I'm not sure that earlyoom should take the oom_score into account: this is what the OOM killer seems to do, and precisely what should be avoided (because the oom_score implementation is broken). The earlyoom(1) man page says: "it will kill the largest process (highest oom_score)". This is wrong. The largest process is basically given by the RSS (that's a rather good approximation). The oom_score has not much to do with the size of the process: due to the oom_score_adj values at least, a process with a large oom_score may actually be very small.

AFAIK, I've had only one crash with Firefox since I've been using --sort-by-rss, and this seemed to work here. I've also tried with memhog, and this worked too, but it had a large oom_score (918), so this is not a conclusive example as I suppose that it would have also been killed first without using --sort-by-rss.

vinc17fr avatar Aug 25 '25 00:08 vinc17fr

I see it like this: firefox wants to increase oom_score_adj. But under a corner case, it ends up decreasing it instead. That's firefox' failure to handle the corner case.

I don't know if you can even call it a corner case. It wants to increase a value, and instead of reading what it is, increasing it, and writing it back it just blindly overwrites it.
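The difference between the two approaches can be sketched as follows. This is not Firefox's actual code; the function names are hypothetical, and `path` stands for a file such as `/proc/<pid>/oom_score_adj`. The clamp to the range -1000..1000 reflects the documented limits of oom_score_adj.

```python
def overwrite_adj(path, value):
    """Blind overwrite: the previous value is lost, so the result may be lower than before."""
    with open(path, "w") as f:
        f.write(f"{value}\n")

def raise_adj(path, delta):
    """Read-modify-write: add delta to the current value, clamped to -1000..1000."""
    with open(path) as f:
        current = int(f.read().strip())
    new = max(-1000, min(1000, current + delta))
    with open(path, "w") as f:
        f.write(f"{new}\n")
    return new
```

With a starting value of 200, `overwrite_adj(path, 100)` leaves the process at 100, whereas `raise_adj(path, 100)` leaves it at 300.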

rfjakob avatar Aug 25 '25 07:08 rfjakob

No, Firefox really increases the oom_score_adj: the default oom_score_adj is 0. And values set by Firefox are currently 100, 167, 233 (the 233, for Web Content processes, is new for me). This means that the initial order of preference to kill processes is: Web Content processes, then daemons, then Isolated Web Co processes, then normal user processes.
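This ordering can be reproduced by sorting on oom_score_adj alone. The values below are the ones quoted in this thread (including the systemd rule that user services get the manager's value plus 100); the group labels and the simplification of ignoring process size are assumptions made for illustration.

```python
USER_MANAGER_ADJ = 100                     # OOMScoreAdjust=100 for the user manager
USER_SERVICE_ADJ = USER_MANAGER_ADJ + 100  # DefaultOOMScoreAdjust: manager's value plus 100

ADJ = {
    "Web Content (Firefox)": 233,
    "user systemd services (e.g. pipewire-pulse)": USER_SERVICE_ADJ,
    "Isolated Web Co (Firefox tabs)": 167,  # some tabs have 100 instead
    "ordinary user processes": 0,
}

def kill_order(adj_by_group):
    """Group names sorted from most to least likely to be killed (size ignored)."""
    ranked = sorted(adj_by_group.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked]

for name in kill_order(ADJ):
    print(f"{ADJ[name]:4d}  {name}")
```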

vinc17fr avatar Aug 25 '25 08:08 vinc17fr

I recall that without --sort-by-rss, earlyoom preferred to kill a daemon that was taking only 40 MB (VmRSS, with a 54.2 MB memory peak) rather than a process that was taking 3257 MB, i.e. 80 times more memory! That's why oom_score_adj does not make much sense. I think that the oom_score could be considered when choosing between several processes that have about the same size (say, within a factor 2 or 3), but not when there is a big size difference.
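A minimal sketch of that idea, under stated assumptions: this is not how earlyoom (which is written in C) selects its victim, and the names `Proc`, `pick_victim` and `size_factor` are invented for the example. Only processes whose RSS is within `size_factor` of the largest one are considered, and oom_score decides among them.

```python
from typing import NamedTuple

class Proc(NamedTuple):
    name: str
    rss_mib: int
    oom_score: int

def pick_victim(procs, size_factor=3):
    """Among processes within size_factor of the largest RSS, pick the highest oom_score."""
    largest = max(p.rss_mib for p in procs)
    candidates = [p for p in procs if p.rss_mib * size_factor >= largest]
    # Ties on oom_score are broken by RSS.
    return max(candidates, key=lambda p: (p.oom_score, p.rss_mib))

procs = [
    Proc("Web Content", 79, 823),
    Proc("pipewire-pulse", 40, 800),
    Proc("Isolated Web Co", 3257, 800),
]
print(pick_victim(procs).name)  # Isolated Web Co
```

With the figures from the log, the 3257 MiB tab is the only candidate, so the small daemon is never considered; oom_score only matters when several processes are of comparable size.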

vinc17fr avatar Aug 25 '25 08:08 vinc17fr

No, firefox does not increase the value. It overwrites it. Which may increase or decrease the value. Mozilla bug 1787638 is about fixing this.

rfjakob avatar Aug 25 '25 09:08 rfjakob

Yes this is stupid. However, the kernel oom killer will still follow what oom_score_adj tells it to do. So making earlyoom ignore oom_score_adj only fixes part of the problem.

rfjakob avatar Aug 25 '25 09:08 rfjakob

No, firefox does not increase the value. It overwrites it. Which may increase or decrease the value. Mozilla bug 1787638 is about fixing this.

The change suggested from Mozilla bug 1787638 would not have any effect on machines like mine, where the default oom_score_adj value is 0. So this cannot be regarded as a fix for the problem I've reported. A "fix" would be needed on the systemd side.

Yes this is stupid. However, the kernel oom killer will still follow what oom_score_adj tells it to do. So making earlyoom ignore oom_score_adj only fixes part of the problem.

The whole point is that earlyoom takes action before the OOM killer is involved.

vinc17fr avatar Aug 25 '25 09:08 vinc17fr

I happened to stumble upon this, and wanted to share a counter-argument (take it as an opinion, just as a plain user):

I recall that without --sort-by-rss, earlyoom preferred to kill a daemon that was taking only 40 MB (VmRSS, with a 54.2 MB memory peak) rather than a process that was taking 3257 MB, i.e. 80 times more memory! That's why oom_score_adj does not make much sense. I think that the oom_score could be considered when choosing between several processes that have about the same size (say, within a factor 2 or 3), but not when there is a big size difference.

In this example it might be clear to kill the browser, if the browser process happens to be doing something more or less stateless (so there's no waste of time / productivity, other than reopening the page).

This might be different in cases where the browser is doing something more critical or where work is lost if the browser gets killed; or in cases where the large process is something like a video editor or some scientific processing that needs to use lots of memory and needs to process data for a long time.

The situation is also different if the small jobs are not so small (e.g. "only" 400MB instead of several GBs of the largest process), or if there are several/lots of them and are relatively unimportant (e.g. jobs retrieving data, that will be retried if killed). For example one might launch 10 such jobs in parallel at a time, with highest oom_score_adj scores, to not affect the rest of the system if they get to use too much mem / cause OOM in the system.

So, in summary, in this example case the situation might appear to be clear, but these heuristics get complicated and, in general, I think that it's wise to respect the widely known oom_score_adj, because people do make use of it, it's an interface that has been there for a long time, and people expect the system to honor the setting if they bother to make use of oom_score_adj, even if sometimes the results are less than optimal.

I also think that earlyoom's --sort-by-rss might be useful in cases like these, if one doesn't want to bother with these scores and really wants the largest offender to get killed. Or firefox / browser processes setting a higher score by default in the distro settings.

manuelafm avatar Sep 15 '25 17:09 manuelafm

The issue is that systemd gives a high oom_score_adj score for processes that should not be killed.

There are alternatives for earlyoom. Perhaps it should ignore oom_score for small processes for which oom_score_adj ⩽ 200 (which is what systemd uses for daemons). Users who want to run unimportant jobs should use a value higher than 200.

vinc17fr avatar Sep 17 '25 10:09 vinc17fr