
Enable LTO in supported compilers

Open In-line opened this issue 1 year ago • 31 comments

Describe your PR, what does it fix/add?

Enabled LTO, because why not?

Is there anything you want to mention? (unchecked code, possible bugs, found problems, breaking compatibility, etc.)

No

Is it ready for merging, or does it need work?

It's ready

In-line avatar May 04 '24 13:05 In-line

Benefits of LTO

LTO can give double digit performance boosts for many programs.
Can lower RAM usage per program making it very useful for limited memory systems.

Downsides of LTO

Can increase compile time by 2 to 3 times.
Uses more RAM during compiling.
Not all programs become faster or smaller.
There is an increased chance of finding build-time or runtime bugs while using it.
Always be prepared to try without it if something is acting odd.

gentoo wiki

Agent00Ming avatar May 04 '24 13:05 Agent00Ming

Some stats on my machine (tested before the patch), configured with:
cmake -G Ninja -B build/ -DCMAKE_BUILD_TYPE=Release -DCMAKE_INTERPROCEDURAL_OPTIMIZATION=ON/OFF

GCC (-flto=auto):
LTO ON: cmake --build build/ --clean-first 298.82s user 32.47s system 1880% cpu 17.613 total
LTO OFF: cmake --build build --clean-first 507.19s user 31.09s system 2270% cpu 23.704 total

Clang (-flto=thin):
LTO ON: cmake --build build/ --clean-first 276.76s user 10.66s system 1997% cpu 14.391 total
LTO OFF: cmake --build build --clean-first 308.75s user 10.49s system 2278% cpu 14.012 total
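
For readers who want to compare these runs directly, here is a minimal sketch that parses the four zsh `time` summaries quoted above and prints the relative change with LTO on. The names `RUNS`, `parse`, and `lto_change` are illustrative and not from the thread, and each figure comes from a single run, so small differences may be noise.

```python
import re

# zsh `time` summaries copied from the comment above.
RUNS = {
    ("gcc", "on"): "298.82s user 32.47s system 1880% cpu 17.613 total",
    ("gcc", "off"): "507.19s user 31.09s system 2270% cpu 23.704 total",
    ("clang", "on"): "276.76s user 10.66s system 1997% cpu 14.391 total",
    ("clang", "off"): "308.75s user 10.49s system 2278% cpu 14.012 total",
}

PATTERN = re.compile(r"([\d.]+)s user ([\d.]+)s system (\d+)% cpu ([\d.]+) total")


def parse(summary: str) -> dict[str, float]:
    """Extract user, system, and wall-clock seconds from a zsh `time` line."""
    match = PATTERN.fullmatch(summary)
    if match is None:
        raise ValueError(f"unrecognized time summary: {summary!r}")
    user, system, _cpu, total = match.groups()
    return {"user": float(user), "system": float(system), "total": float(total)}


def lto_change(compiler: str, field: str) -> float:
    """Percent change of `field` when LTO goes from OFF to ON."""
    on = parse(RUNS[(compiler, "on")])[field]
    off = parse(RUNS[(compiler, "off")])[field]
    return (on - off) / off * 100


for compiler in ("gcc", "clang"):
    print(
        f"{compiler}: wall {lto_change(compiler, 'total'):+.1f}%, "
        f"user CPU {lto_change(compiler, 'user'):+.1f}%"
    )
```

On these numbers, the GCC build's wall-clock time drops by roughly a quarter with LTO on, while the Clang build's wall-clock time is nearly unchanged.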

❯ clang --version
clang version 17.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
❯ gcc --version                     
gcc (GCC) 13.2.1 20240417
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE
❯ neofetch                                
                   -`                    codemonkey@workstation-01 
                  .o+`                   ------------------------- 
                 `ooo/                   OS: Arch Linux x86_64 
                `+oooo:                  Kernel: 6.8.9-1-cachyos-bore 
               `+oooooo:                 Uptime: 4 hours, 50 mins 
               -+oooooo+:                Packages: 2471 (pacman), 6 (flatpak) 
             `/:-:++oooo+:               Shell: zsh 5.9 
            `/++++/+++++++:              Resolution: 2560x1440 
           `/++++++++++++++:             DE: Hyprland 
          `/+++ooooooooooooo/`           WM: sway 
         ./ooosssso++osssssso+`          Theme: Adwaita [GTK2], Adwaita-dark [GTK3] 
        .oossssso-````/ossssss+`         Icons: Adwaita [GTK2/3] 
       -osssssso.      :ssssssso.        Terminal: vscode 
      :osssssss/        osssso+++.       CPU: AMD Ryzen 9 7950X (32) @ 5.881GHz 
     /ossssssss/        +ssssooo/-       GPU: AMD ATI Radeon RX 7900 XT/7900 XTX/7900M 
   `/ossssso+/:-        -:/+osssso+-     Memory: 19065MiB / 63999MiB 
  `+sso+:-`                 `.-/+oso:
 `++:.                           `-/+/                           
 .`                                 `/                           

In-line avatar May 04 '24 13:05 In-line

Obviously this support is still experimental, but it's a nice addition to have. Thoughts? @vaxerski

JohnRTitor avatar May 05 '24 00:05 JohnRTitor

looking at the drawbacks, I'm not convinced this is a good idea.

vaxerski avatar May 05 '24 01:05 vaxerski

I do agree that this should not be enabled by default :) But if the user is adventurous enough to try :)

JohnRTitor avatar May 05 '24 01:05 JohnRTitor

Well, I'm not sure where it's mentioned that LTO is experimental. Both GCC and Clang claim it's mature. Maybe it was experimental a few years ago, but it isn't anymore. Chromium, Firefox, and many other much larger and more complex projects use it.

@vaxerski Is there a good CPU bottleneck benchmark I can use to compare LTO and non-LTO builds?

In-line avatar May 05 '24 07:05 In-line

no clue, I've never used lto

vaxerski avatar May 05 '24 12:05 vaxerski

@In-line can you provide a "patch" for meson based building too? I'll try to build and test it on Nix.

JohnRTitor avatar May 05 '24 12:05 JohnRTitor

@JohnRTitor you can rebase this PR on top of #5667 to test. I'm going to merge that soon.

fufexan avatar May 05 '24 12:05 fufexan

I am not the PR author this time :) @In-line well, you heard fufexan :)

JohnRTitor avatar May 05 '24 13:05 JohnRTitor

I still think this should be left as an "option"; compile times will vary with hardware and feature sets.

Compilation time table for me:
        LTO OFF       LTO ON        Change
real    0m55.257s     0m38.746s     -30%
user    12m28.347s    7m28.746s     -40%
sys     0m20.171s     0m24.944s     +25%
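
The percentages in this table can be rechecked with a short script. This is only a sketch: the helper names `to_seconds` and `percent_change` are made up for illustration, and the timings are copied from the table as posted.

```python
def to_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '12m28.347s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def percent_change(off: str, on: str) -> float:
    """Relative change from the LTO OFF value to the LTO ON value, in percent."""
    before, after = to_seconds(off), to_seconds(on)
    return (after - before) / before * 100


# Timings copied from the table: (LTO OFF, LTO ON).
timings = {
    "real": ("0m55.257s", "0m38.746s"),
    "user": ("12m28.347s", "7m28.746s"),
    "sys": ("0m20.171s", "0m24.944s"),
}

for name, (off, on) in timings.items():
    print(f"{name}: {percent_change(off, on):+.1f}%")
```

The real and user figures match the rounded values in the table; the sys figure works out to a little under +24%, so the posted +25% appears to be an approximation.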

Agent00Ming avatar May 05 '24 13:05 Agent00Ming

I am not the PR author this time :) @In-line well, you heard fufexan :)

I meant more as: clone repo, gh pr checkout 5874, checkout cmake, git rebase In-line:lto.

But the CMake PR is now merged, so a simple rebase should get you up and running.

fufexan avatar May 05 '24 13:05 fufexan

What starship reports in my case:
LTO on: 1m57s
LTO off: 2m44s

fufexan avatar May 05 '24 14:05 fufexan

GCC LTO by itself does not do much. Clang LTO, especially ThinLTO, is much better.

JohnRTitor avatar May 05 '24 14:05 JohnRTitor

@vaxerski Is there a good CPU bottleneck benchmark I can use to compare LTO and non-LTO builds?

Maybe these are not what you are looking for, but can be helpful:

https://www.phoronix.com/review/clang-lto-kernel
https://www.phoronix.com/review/clang-12-opt
https://www.phoronix.com/review/gcc11-rocket-opts

They are pretty outdated though.

JohnRTitor avatar May 05 '24 14:05 JohnRTitor

Clang LTO: Finished at 20:34:56 after 1m3s
GCC LTO: Finished at 20:28:26 after 1m16s

JohnRTitor avatar May 05 '24 15:05 JohnRTitor

Hyprland isn't that big to be bottlenecked by CPU compilation time on modern systems. I don't think compilation time is the metric that has noticeable regression for us.

I meant CPU bottleneck benchmarks for Hyprland, to see how much difference it makes on weak systems with iGPUs, where the bottleneck might be on the CPU side. As LTO is a performance optimization, it should decrease Hyprland's executable size and increase its execution speed.

I was asking for any benchmarks I can run on slow GPU to test improvements that come with LTO.

In-line avatar May 05 '24 16:05 In-line

@JohnRTitor Patches for Meson are ready

In-line avatar May 05 '24 19:05 In-line

I don't know if this is a good idea either, even more so if we don't at least benchmark it and see whether there is a meaningful improvement. Has anyone tried getting Hyprland to lag and comparing with and without LTO? A stress test would be a neat idea if someone would like to work on that, if one doesn't already exist. It could also prove useful for improving performance in general, without compiler flags, if we can profile it.

I have compiled my whole Gentoo system with LTO in the past, and NodeJS was the only thing that caused issues, so it's somewhat stable, I guess, but likely still not a good idea. I imagine you might get bigger gains from -O3 or -march=native, though the latter wouldn't always be practical, of course. Maybe this could be added as a build option for those who want it to be faster and don't mind possible bugs? But we would have to check whether it actually is faster, since sometimes it can make things slower.

nonetrix avatar May 14 '24 23:05 nonetrix

I think all these conversations about some abstract risk in enabling LTO are pointless, as Hyprland is already included in the ALHP project: https://status.alhp.dev/?pkgbase=hyprland

I don't understand what all the "risk" fuss is about to be fair.

In-line avatar May 24 '24 21:05 In-line

So are there any actual requirements for being included in ALHP, other than: "it builds, ship it"? I imagine getting this endorsed here officially will take a bit more than that.

gnusenpai avatar May 25 '24 23:05 gnusenpai

I don't think anything speaks against using LTO on modern Linux.

openSUSE, for example, has been using Link Time Optimization for their entire repos since 2019 and there are no issues whatsoever. Some other distros like Arch Linux and CachyOS also enable LTO for all packages.

Personally, I've been running an LTO-built Hyprland across openSUSE TW and Gentoo for about 4 months at this point and I've never encountered anything strange.

To me personally enabling LTO is a no-brainer as long as it doesn't cause build issues (which it doesn't for Hyprland).

CNR0706 avatar Jun 13 '24 10:06 CNR0706

I'm noticing increased memory consumption (328M vs 190M), but I haven't rebooted, so I don't know how it fares on a clean environment.

fufexan avatar Jun 13 '24 12:06 fufexan

Could you try rebooting and check it when the session is idle?

I am not sure how nixos-rebuild switch performs switch of display managers/DEs without restarting them.

JohnRTitor avatar Jun 13 '24 12:06 JohnRTitor

Could you try rebooting and check it when the session is idle?

Will do later, as I'm working on something right now.

I am not sure how nixos-rebuild switch performs switch of display managers/DEs without restarting them.

It doesn't, as the service is already running. I've simply rebased this PR onto master and built it in the Hyprland repo, then launched the binary from tty.

fufexan avatar Jun 13 '24 12:06 fufexan

image

This is with Clang LTO on.

JohnRTitor avatar Jun 13 '24 13:06 JohnRTitor

I'm noticing increased memory consumption (328M vs 190M), but I haven't rebooted, so I don't know how it fares on a clean environment.

Are you sure that's not https://github.com/hyprwm/Hyprland/issues/6459?

ErrorNoInternet avatar Jun 13 '24 13:06 ErrorNoInternet

I'm noticing increased memory consumption (328M vs 190M), but I haven't rebooted, so I don't know how it fares on a clean environment.

Are you sure that's not #6459?

No, it could be that.

fufexan avatar Jun 13 '24 14:06 fufexan

I decided to run some basic tests on this. The following numbers were taken after a clean reboot, spawning some windows, and resizing them a bit to simulate usage.

Test system is:

Kernel: 6.9.4
Distro: Gentoo
Hyprland: 0.41.1
GCC: 13.2.1
Clang: 17.0.6
libc: glibc 2.38

gcc:
    size: 7971152B
    mem:
        153.824MB
        154.352MB
        150.5MB
        147.105MB
        146.73MB
        ---------
        150.502MB

gcc+lto:
    size: 6504656B (-18.4%)
    mem:
        155.539MB
        147.594MB
        145.941MB
        161.391MB
        145.891MB
        ---------
        151.271MB (+0.51%)

clang:
    size: 7489688B
    mem:
        162.188MB
        154.961MB
        154.352MB
        147.148MB
        150.805MB
        ---------
        153.891MB

clang+lto:
    size: 6711216B (-10.4%)
    mem:
        156.711MB
        149.492MB
        154.074MB
        153.367MB
        156.953MB
        ---------
        154.119MB (+0.15%)

So, based on this, the binaries are a decent chunk smaller and there isn't any obvious memory usage regression. But these are just preliminary tests. At the time of measurement, the system was up for only ~1min. I would be interested in seeing what effect it has on CPU usage (if any), but I think that's a bit trickier to measure.
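
As a cross-check on the averages and percentages above, the following sketch recomputes them from the raw samples. The variable and function names are illustrative only; the numbers are copied from the results as posted.

```python
from statistics import mean

# Memory samples in MB, copied from the results above.
memory = {
    "gcc": [153.824, 154.352, 150.5, 147.105, 146.73],
    "gcc+lto": [155.539, 147.594, 145.941, 161.391, 145.891],
    "clang": [162.188, 154.961, 154.352, 147.148, 150.805],
    "clang+lto": [156.711, 149.492, 154.074, 153.367, 156.953],
}
# Binary sizes in bytes, copied from the same results.
size = {"gcc": 7971152, "gcc+lto": 6504656, "clang": 7489688, "clang+lto": 6711216}


def percent_delta(base: float, new: float) -> float:
    """Relative change from `base` to `new`, in percent."""
    return (new - base) / base * 100


for compiler in ("gcc", "clang"):
    lto = f"{compiler}+lto"
    mem_delta = percent_delta(mean(memory[compiler]), mean(memory[lto]))
    size_delta = percent_delta(size[compiler], size[lto])
    spread = max(memory[lto]) - min(memory[lto])
    print(
        f"{lto}: size {size_delta:+.1f}%, mean memory {mem_delta:+.2f}%, "
        f"sample spread {spread:.1f} MB"
    )
```

The spread between samples within a single configuration (about 15.5 MB for gcc+lto and 7.5 MB for clang+lto) is much larger than the difference between the means (under 1 MB), which is consistent with reading these results as showing no obvious memory regression.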

gnusenpai avatar Jun 14 '24 23:06 gnusenpai

With my setup, Hyprland crashes on startup when built with GCC and LTO enabled, but if I compile with Clang+LTO, it's fine.

JohnRTitor avatar Sep 30 '24 09:09 JohnRTitor