Hyprland
Enable LTO in supported compilers
Describe your PR, what does it fix/add?
Enabled LTO, because why not?
Is there anything you want to mention? (unchecked code, possible bugs, found problems, breaking compatibility, etc.)
No
Is it ready for merging, or does it need work?
It's ready
Benefits of LTO
LTO can give double-digit percentage performance boosts for many programs.
Can lower RAM usage per program, making it very useful for memory-limited systems.
Downsides of LTO
Can increase compile time by 2 to 3 times.
Uses more RAM during compiling.
Not all programs become faster or smaller.
There is an increased chance of finding build-time or runtime bugs while using it.
Always be prepared to try without it if something is acting odd.
Some stats on my machine (test before the patch)
cmake -G Ninja -B build/ -DCMAKE_BUILD_TYPE=Release -DCMAKE_INTERPROCEDURAL_OPTIMIZATION=ON/OFF
GCC (-flto=auto):
LTO ON: cmake --build build/ --clean-first 298.82s user 32.47s system 1880% cpu 17.613 total
LTO OFF: cmake --build build --clean-first 507.19s user 31.09s system 2270% cpu 23.704 total
Clang (-flto=thin):
LTO ON: cmake --build build/ --clean-first 276.76s user 10.66s system 1997% cpu 14.391 total
LTO OFF: cmake --build build --clean-first 308.75s user 10.49s system 2278% cpu 14.012 total
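To compare the wall-clock figures above, here is a minimal Python sketch that pulls the `total` field out of each zsh-style `time` summary and reports the relative change. The helper names `wall_seconds` and `relative_change` are hypothetical, and the regex assumes the exact summary format shown in these measurements.

```python
import re


def wall_seconds(summary: str) -> float:
    """Extract the wall-clock seconds from a zsh `time` summary line."""
    match = re.search(r"([\d.]+) total\s*$", summary)
    if match is None:
        raise ValueError(f"no 'total' field in: {summary!r}")
    return float(match.group(1))


def relative_change(baseline: float, candidate: float) -> float:
    """Percent change of candidate relative to baseline (negative = faster)."""
    return (candidate - baseline) / baseline * 100.0


# Summary lines copied from the measurements above.
timings = {
    "GCC": {
        "on": "298.82s user 32.47s system 1880% cpu 17.613 total",
        "off": "507.19s user 31.09s system 2270% cpu 23.704 total",
    },
    "Clang": {
        "on": "276.76s user 10.66s system 1997% cpu 14.391 total",
        "off": "308.75s user 10.49s system 2278% cpu 14.012 total",
    },
}

for compiler, runs in timings.items():
    off = wall_seconds(runs["off"])
    on = wall_seconds(runs["on"])
    print(f"{compiler}: LTO off {off:.3f}s, LTO on {on:.3f}s, "
          f"change {relative_change(off, on):+.1f}%")
```

With the numbers as posted, this reports roughly -25.7% wall-clock time for GCC and +2.7% for Clang with LTO on. These are single runs, so they are a rough indication only.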
❯ clang --version
clang version 17.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
❯ gcc --version
gcc (GCC) 13.2.1 20240417
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE
❯ neofetch
codemonkey@workstation-01
-------------------------
OS: Arch Linux x86_64
Kernel: 6.8.9-1-cachyos-bore
Uptime: 4 hours, 50 mins
Packages: 2471 (pacman), 6 (flatpak)
Shell: zsh 5.9
Resolution: 2560x1440
DE: Hyprland
WM: sway
Theme: Adwaita [GTK2], Adwaita-dark [GTK3]
Icons: Adwaita [GTK2/3]
Terminal: vscode
CPU: AMD Ryzen 9 7950X (32) @ 5.881GHz
GPU: AMD ATI Radeon RX 7900 XT/7900 XTX/7900M
Memory: 19065MiB / 63999MiB
Obviously this support is still experimental, but it's a nice addition to have. Thoughts? @vaxerski
looking at the drawbacks, I'm not convinced this is a good idea.
I do agree that this should not be enabled by default :) But if the user is adventurous enough to try :)
Well, I'm not sure where it's mentioned that LTO is experimental. Both GCC and Clang claim it's mature. Maybe it was experimental a few years ago, but it isn't now. Chromium, Firefox, and many other much bigger and more complex projects use it.
@vaxerski Is there a good CPU bottleneck benchmark I can use to compare LTO and non-LTO builds?
no clue, I've never used lto
@In-line can you provide a "patch" for meson based building too? I'll try to build and test it on Nix.
@JohnRTitor you can rebase this PR on top of #5667 to test. I'm going to merge that soon.
I am not the PR author this time :) @In-line well, you heard fufexan :)
I still think this should be left as an "option", the compile times will vary due to hardware and feature sets.
Compilation time table for me:
| LTO | OFF | ON | Change |
|---|---|---|---|
| real | 0m55.257s | 0m38.746s | -30% |
| user | 12m28.347s | 7m28.746s | -40% |
| sys | 0m20.171s | 0m24.944s | +25% |
> I am not the PR author this time :) @In-line well, you heard fufexan :)
I meant more as: clone repo, gh pr checkout 5874, checkout cmake, git rebase In-line:lto.
But the CMake PR is now merged, so a simple rebase should get you up and running.
What starship reports in my case: LTO on: 1m57s LTO off: 2m44s
GCC LTO by itself does not do much. Clang LTO, especially ThinLTO, is much better.
> @vaxerski Is there a good CPU bottleneck benchmark I can use to compare LTO and non-LTO builds?
Maybe these are not what you are looking for, but can be helpful:
https://www.phoronix.com/review/clang-lto-kernel
https://www.phoronix.com/review/clang-12-opt
https://www.phoronix.com/review/gcc11-rocket-opts
They are pretty outdated though.
Clang LTO: Finished at 20:34:56 after 1m3s
GCC LTO: Finished at 20:28:26 after 1m16s
Hyprland isn't big enough for compilation time to be a bottleneck on modern systems. I don't think compilation time is a metric where we would see a noticeable regression.
I meant CPU bottleneck benchmarks for Hyprland, to see how much difference it makes on weak systems with iGPUs, where the bottleneck might be on the CPU side. As LTO is a performance optimization, it should decrease Hyprland's executable size and increase its execution speed.
I was asking for any benchmarks I can run on a slow GPU to test the improvements that come with LTO.
@JohnRTitor Patches for Meson are ready
I don't know if this is a good idea either, even more so if we don't at least benchmark it and see whether there is a meaningful improvement. Has anyone tried getting Hyprland to lag and comparing with and without LTO? A stress test would be a neat idea if someone would like to work on one, assuming it doesn't already exist. It could also prove useful for improving performance in general, without compiler flags, if we can profile it. I compiled my whole Gentoo system with LTO in the past and NodeJS was the only thing that caused issues, so it's somewhat stable I guess, but likely still not a good idea. I imagine you might get bigger gains from -O3 or -march=native, though the latter wouldn't always be practical, of course. Maybe this could be added as a build option for those who want it to be faster and don't mind possible bugs? But we would have to check whether it actually is faster, since sometimes it can make things slower.
I think all these conversations about some abstract risk in enabling LTO are pointless, as Hyprland is already included in the ALHP project: https://status.alhp.dev/?pkgbase=hyprland
I don't understand what all the "risk" fuss is about to be fair.
So are there any actual requirements for being included in ALHP, other than: "it builds, ship it"? I imagine getting this endorsed here officially will take a bit more than that.
I don't think anything speaks against using LTO on modern Linux.
openSUSE, for example, has been using Link Time Optimization for its entire repos since 2019 and there are no issues whatsoever. Some other distros like Arch Linux and CachyOS also enable LTO for all packages.
Personally I've been running LTOd Hyprland across openSUSE TW and Gentoo for about 4 months at this point and I've never encountered anything strange.
To me personally enabling LTO is a no-brainer as long as it doesn't cause build issues (which it doesn't for Hyprland).
I'm noticing increased memory consumption (328M vs 190M), but I haven't rebooted, so I don't know how it fares on a clean environment.
Could you try rebooting and check it when the session is idle?
I am not sure how nixos-rebuild switch performs switch of display managers/DEs without restarting them.
> Could you try rebooting and check it when the session is idle?
Will do later, as I'm working on something right now.
> I am not sure how nixos-rebuild switch performs switch of display managers/DEs without restarting them.
It doesn't, as the service is already running. I've simply rebased this PR onto master and built it in the Hyprland repo, then launched the binary from tty.
This is with Clang LTO on.
> I'm noticing increased memory consumption (328M vs 190M), but I haven't rebooted, so I don't know how it fares on a clean environment.
Are you sure that's not https://github.com/hyprwm/Hyprland/issues/6459?
> > I'm noticing increased memory consumption (328M vs 190M), but I haven't rebooted, so I don't know how it fares on a clean environment.
>
> Are you sure that's not #6459?
No, it could be that.
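The thread does not say how the 328M and 190M readings were taken. As one possible way to get a comparable number on Linux, the sketch below reads the resident set size (`VmRSS`) from `/proc`. The helper names are hypothetical, and the process name `Hyprland` is an assumption about how the compositor appears in `/proc/<pid>/comm`.

```python
from pathlib import Path


def rss_mib(pid: int) -> float:
    """Return the resident set size (VmRSS) of a process in MiB, read from /proc."""
    for line in Path(f"/proc/{pid}/status").read_text().splitlines():
        if line.startswith("VmRSS:"):
            # The kernel reports this field in kB, e.g. "VmRSS:   153824 kB".
            return int(line.split()[1]) / 1024
    raise RuntimeError(f"no VmRSS entry for pid {pid}")


def pids_by_name(name: str) -> list[int]:
    """Find PIDs whose /proc/<pid>/comm matches the given process name."""
    pids = []
    for entry in Path("/proc").iterdir():
        if not entry.name.isdigit():
            continue
        try:
            if (entry / "comm").read_text().strip() == name:
                pids.append(int(entry.name))
        except OSError:
            continue  # the process exited between listing and reading
    return pids


if __name__ == "__main__":
    # "Hyprland" is an assumed process name; adjust it to match your session.
    for pid in pids_by_name("Hyprland"):
        print(f"pid {pid}: {rss_mib(pid):.1f} MiB resident")
```

Taking several readings on an idle session after a clean reboot, for both the LTO and non-LTO builds, would make the comparison less sensitive to unrelated issues such as the one linked above.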
I decided to run some basic tests on this. The following numbers were taken after a clean reboot, spawning some windows, and resizing them a bit to simulate usage.
Test system is:
Kernel: 6.9.4
Distro: Gentoo
Hyprland: 0.41.1
GCC: 13.2.1
Clang: 17.0.6
libc: glibc 2.38
gcc:
size: 7971152B
mem:
153.824MB
154.352MB
150.5MB
147.105MB
146.73MB
---------
150.502MB
gcc+lto:
size: 6504656B (-18.4%)
mem:
155.539MB
147.594MB
145.941MB
161.391MB
145.891MB
---------
151.271MB (+0.51%)
clang:
size: 7489688B
mem:
162.188MB
154.961MB
154.352MB
147.148MB
150.805MB
---------
153.891MB
clang+lto:
size: 6711216B (-10.4%)
mem:
156.711MB
149.492MB
154.074MB
153.367MB
156.953MB
---------
154.119MB (+0.15%)
So, based on this, the binaries are a decent chunk smaller and there isn't any obvious memory usage regression. But these are just preliminary tests. At the time of measurement, the system was up for only ~1min. I would be interested in seeing what effect it has on CPU usage (if any), but I think that's a bit trickier to measure.
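The averages and percentages above can be rechecked with a short Python sketch. The data is copied from the measurements in this comment. The names `results`, `percent_change`, and `compare` are hypothetical, and the min-to-max spread is added here only as a rough indication of noise.

```python
from statistics import mean

# Figures copied from the measurements above:
# binary size in bytes, memory samples in MB.
results = {
    "gcc": {"size": 7971152, "mem": [153.824, 154.352, 150.5, 147.105, 146.73]},
    "gcc+lto": {"size": 6504656, "mem": [155.539, 147.594, 145.941, 161.391, 145.891]},
    "clang": {"size": 7489688, "mem": [162.188, 154.961, 154.352, 147.148, 150.805]},
    "clang+lto": {"size": 6711216, "mem": [156.711, 149.492, 154.074, 153.367, 156.953]},
}


def percent_change(baseline: float, candidate: float) -> float:
    """Percent change of candidate relative to baseline."""
    return (candidate - baseline) / baseline * 100.0


def compare(base: str, lto: str) -> dict:
    """Summarise an LTO build against its non-LTO baseline."""
    b, l = results[base], results[lto]
    return {
        "size_change": percent_change(b["size"], l["size"]),
        "mem_change": percent_change(mean(b["mem"]), mean(l["mem"])),
        # Min-to-max spread of the baseline samples, as a rough noise estimate.
        "baseline_spread": max(b["mem"]) - min(b["mem"]),
    }


for base in ("gcc", "clang"):
    s = compare(base, f"{base}+lto")
    print(f"{base}: size {s['size_change']:+.1f}%, "
          f"mean mem {s['mem_change']:+.2f}%, "
          f"baseline spread {s['baseline_spread']:.2f} MB")
```

The difference between the means is well under 1 MB in both cases, while samples within a single configuration differ by several MB. That is consistent with the conclusion that no memory regression is visible in these numbers.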
With my setup, Hyprland crashes on startup when built with GCC and LTO enabled, but it's fine if I compile with Clang+LTO.