quickwit Consider enabling more aggressive optimizations for the Release profile: Fat LTO and codegen-units = 1

Hi!

I see that the project already uses ThinLTO in the Release profile in the root Cargo.toml file - it was introduced in this commit. However, ThinLTO usually optimizes less aggressively than Fat (aka Full) LTO. I also suggest enabling codegen-units = 1 (CG1). Together, these settings should reduce the binary size further (always a good thing) and may improve application performance as well.

Basically, it can be enabled with the following change:

[profile.release]
codegen-units = 1
lto = "fat" # instead of "thin"

I ran quick local tests (AMD Ryzen 9 5900x, Fedora 42, Rust 1.87, the latest version of this project at the time of writing, built with the command CC=clang CXX=clang++ cargo build --release -p quickwit-cli --features release-feature-set --bin quickwit). The results are below.

ThinLTO (current Release profile): 188 MiB, clean build time: 3m 30s
ThinLTO + CG1: 151 MiB, clean build time: 5m 40s
FatLTO: 154 MiB, clean build time: 11m 16s
FatLTO + CG1: 139 MiB, clean build time: 10m 18s

Since the Release profile is used only for release binaries, this build time increase shouldn't be a problem for the project. I think you can afford a longer build on CI if it gives users a more optimized Quickwit. Peak memory consumption for FatLTO during the build was around 12 GiB. It's a huge number but still acceptable for build farms, IMHO.

I didn't run performance measurements (I'm not sure how to do it properly since I have much less domain knowledge than the Quickwit devs), but I expect FatLTO + CG1 to also be the most performant Quickwit build.

Thank you.

Jun 23 '25 05:06 zamazan4ik

Thanks for the detailed write up.

Have you tried your configs with the GitHub runners the project is using? I suspect it would OOM or be really slow: the current x86 Docker build takes 22 minutes, so if the increase were proportional to your experiment, it would jump to more than an hour.

Jun 23 '25 10:06 rdettai

> Have you tried your configs with the GitHub runners the project is using?

Oh, fair point - I didn't try it on GitHub runners, I only performed local builds. If you are limited to the GitHub runners, I guess FatLTO isn't the best option. However, if you believe that the changes proposed above are still useful for Quickwit (even if you cannot apply them directly on CI), I suggest adding an extra Cargo profile named something like [profile.heavy-optimization] where we can enable FatLTO, CG1, etc. That way, even if it is not enabled on CI, other Quickwit users can still rebuild Quickwit with the heavier optimizations more easily.
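A minimal sketch of what such a profile could look like, assuming it lives in the root Cargo.toml next to the existing Release profile. The name heavy-optimization is only the one suggested above and does not exist in the repository; the inherits key is there because Cargo requires custom profiles to name a base profile.

```toml
# Hypothetical opt-in profile; not part of the Quickwit repository.
[profile.heavy-optimization]
inherits = "release"   # custom profiles must inherit from an existing profile
lto = "fat"            # instead of "thin"
codegen-units = 1
```

It would be selected with cargo build --profile heavy-optimization (plus the same -p, --features and --bin arguments as in the command above), and Cargo would place the artifacts in target/heavy-optimization/ rather than target/release/.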

Jun 28 '25 08:06 zamazan4ik

It could make sense to use more powerful runners, but it's not really on the agenda right now.

I don't see a huge benefit in adding an extra build profile:

- if it is not used by the CI, it is not tested continuously
- very few users build QW themselves, and those who do are cargo power users anyway (like you 😄) who will want to tweak the settings themselves

If you run more experiments with build settings, don't hesitate to share your results here. This issue can itself serve as a resource for people who want to squeeze more performance out of the build!
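For anyone who wants to try the settings from this issue without touching the repository's Cargo.toml, here is a sketch of a local override, assuming a .cargo/config.toml file in the checkout (or in the Cargo home directory). Cargo lets the [profile] table in its configuration override the values set in Cargo.toml.

```toml
# .cargo/config.toml (local file, assumed to stay uncommitted)
# Overrides the [profile.release] settings from the root Cargo.toml.
[profile.release]
lto = "fat"          # instead of "thin"
codegen-units = 1
```

The same overrides can also be passed per invocation through the environment variables CARGO_PROFILE_RELEASE_LTO=fat and CARGO_PROFILE_RELEASE_CODEGEN_UNITS=1, which avoids creating a file at all.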

Jul 08 '25 08:07 rdettai

> [profile.release]
> codegen-units = 1
> lto = "fat" # instead of "thin"

These settings don't always make a performance difference, so we would need to confirm that first.

I would expect a bigger gain from a PGO build.

Jul 08 '25 11:07 PSeitz

> These settings don't always make a performance difference, we would need to confirm that first.

Totally agree! I don't know how to run a proper Quickwit performance benchmark. Does Quickwit have any ready-to-use benchmark infrastructure for quickly measuring the performance impact of the proposed changes? If so, we would be able to demonstrate wins beyond binary size.

> I would expect more by doing a build with PGO.

Sure, but these two things can be done separately. I didn't want to mix everything together in this issue ;)

> It could make sense to use more powerful runners, but it's not really on the agenda right now.

Got it, thanks!

> very few users build QW themselves, those who do are cargo power users anyway (like you 😄), and they will like to tweak the settings themselves

Haha, power users :D However, it would be interesting to hear from other people who rebuild Quickwit themselves which Rustc/Cargo flags they tweak.

Jul 08 '25 12:07 zamazan4ik
