gentooLTO
Is -flto=jobserver worth using?
From the GCC manual:
You can also specify -flto=jobserver to use GNU make’s job server mode to determine the number of parallel jobs. This is useful when the Makefile calling GCC is already executing in parallel. You must prepend a ‘+’ to the command recipe in the parent Makefile for this to work. This option likely only works if MAKE is GNU make. Even without the option value, GCC tries to automatically detect a running GNU make’s job server.
Given the stupid amount of RAM LTO requires, using this option seems attractive, since it removes the need to tweak the -flto option for packages that link a bunch of programs at the same time (nodejs for example).
Of course, this won't work with ninja (as of yet, see https://github.com/ninja-build/ninja/pull/1140).
But will most packages work with this? Does anyone run it instead of -flto=n? As far as I'm aware, you need to prefix the linker command with + in the Makefile for this to work, so I'm not sure whether packages actually support this.
(Someone correct me if I am wrong)
When specifying jobserver, GCC is supposed to parallelize and spawn LTO jobs based on the MAKEOPTS you (or upstream) defined for building.
https://github.com/gcc-mirror/gcc/blob/releases/gcc-10.2.0/gcc/lto-wrapper.c#L1419
But if you are still running into memory problems even if you are lowering the number of jobs, that's because limiting the number of LTO jobs with jobserver doesn't seem to be working as intended.
https://github.com/gcc-mirror/gcc/blob/releases/gcc-10.2.0/gcc/lto/lto.c#L240
So the solution looks to be using --param lto-max-streaming-parallelism=n to keep the OOM reaper away.
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-04/msg00433.html
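To check whether a build even exposes a jobserver for GCC to pick up, something like the sketch below might help. It assumes that the detection in the linked lto-wrapper.c starts by looking for --jobserver-auth= in the MAKEFLAGS environment variable; has_jobserver is a made-up helper name, and it only checks the advertised string, not whether the descriptors are actually usable (which, as far as I understand, is where the + prefix matters).

```shell
# Hypothetical helper: does a MAKEFLAGS string advertise a GNU make jobserver?
# Only inspects the string; it does not verify that the file descriptors
# are open, so treat a "yes" as necessary but not sufficient.
has_jobserver() {
    case "$1" in
        *--jobserver-auth=*) return 0 ;;  # jobserver advertised
        *) return 1 ;;                    # no jobserver in MAKEFLAGS
    esac
}
```

Inside a recipe or build script you could call it as has_jobserver "$MAKEFLAGS" and log the result before the link step.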
If I know I need a lot of RAM, I use zram and create a big swap. Really chugs along though, probably would be better to prevent it from hitting swap in the first place. I personally use -flto which I believe defaults to jobserver.
After using this option for a while, I've noticed two things:
- I'm still running out of memory running the WPA phase, which like you mentioned, doesn't support jobserver, and runs at maximum parallelism.
- GCC will pick up the jobserver whenever available, regardless of your -flto=n setting.
So, if you run -flto=jobserver and the jobserver isn't available, you'll run at maximum parallelism for the WPA stage, and the equivalent of -flto=1 for the LTRANS stage.
If you run -flto=n and the jobserver isn't available, you'll run n threads during WPA, and n threads during the LTRANS stage.
Regardless of what option you use, if a jobserver is available, maximum parallelism will be used during WPA, and the jobserver will be queried during LTRANS.
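The cases above can be written down as a small lookup. This is only a restatement of what I observed, not GCC code; lto_parallelism and its yes/no argument are hypothetical names for illustration.

```shell
# Hypothetical summary of the observed behavior, not taken from GCC itself.
# $1: value given to -flto= ("jobserver" or a number n)
# $2: "yes" if a make jobserver is available, anything else otherwise
lto_parallelism() {
    flto="$1"
    jobserver="$2"
    if [ "$jobserver" = "yes" ]; then
        # Jobserver present: the -flto= value is effectively ignored.
        echo "WPA=max LTRANS=jobserver"
    elif [ "$flto" = "jobserver" ]; then
        # Asked for a jobserver but none found.
        echo "WPA=max LTRANS=1"
    else
        # Plain -flto=n without a jobserver.
        echo "WPA=$flto LTRANS=$flto"
    fi
}
```

For example, lto_parallelism 4 no gives the only case where both stages are capped by n.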
In conclusion, there isn't any benefit to running -flto=jobserver over running -flto=n with a really small n, and hoping there are packages that support the jobserver. In any case, to avoid running out of memory when a jobserver is available, use --param=lto-max-streaming-parallelism=n with a similar n value.
Would it be useful to mention this parameter in make.conf.lto?
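As a rough idea of what that could look like, here is a make.conf-style sketch. The variable names NTHREADS, LTO_JOBS and LTO_FLAGS and all the values are placeholders I made up, not what make.conf.lto currently ships; pick n to match your RAM.

```shell
# Hypothetical make.conf fragment; names and values are illustrative only.
NTHREADS="8"   # placeholder: parallel make jobs
LTO_JOBS="2"   # placeholder: small n for LTO, chosen to fit available RAM

MAKEOPTS="-j${NTHREADS}"

# Cap LTRANS with -flto=n and cap WPA streaming with the param,
# since WPA ignores the jobserver.
LTO_FLAGS="-flto=${LTO_JOBS} --param=lto-max-streaming-parallelism=${LTO_JOBS}"

CFLAGS="-O2 -pipe ${LTO_FLAGS}"
CXXFLAGS="${CFLAGS}"
LDFLAGS="-Wl,-O1 -Wl,--as-needed ${LTO_FLAGS}"
```

Keeping both values tied to one LTO_JOBS variable means there is only one number to lower when a package starts hitting the OOM killer.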
Update: GCC 11 still has this same behavior.