Enable Link-Time Optimization (LTO) and codegen-units = 1

Open zamazan4ik opened this issue 7 months ago • 2 comments

Hi!

I noticed that in the Cargo.toml file Link-Time Optimization (LTO) for the project is not enabled. I suggest switching it on since it will reduce the binary size (always a good thing to have) and will likely improve the application's performance a bit. If you want to read more about LTO and its possible modes, I recommend starting from this Rustc documentation.

I think you can enable LTO only for Release builds so as not to sacrifice the developer experience while working on the project, since LTO adds extra time to compilation. In this case, we can create a dedicated [profile.optimized-dev] profile where LTO is disabled (so the developer experience is not affected). If we enable it at the Cargo profile level for the Release profile, users who install the application with cargo install will get the LTO-optimized version of the app "automatically". E.g., check the cargo-outdated Release profile. You might also be interested in other optimization options like codegen-units = 1, which also brings improvements over the current defaults.

Basically, it can be enabled by adding the following lines to the root Cargo.toml file:

[profile.release]
codegen-units = 1
lto = true
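
As a rough sketch of the [profile.optimized-dev] idea mentioned above (the profile name comes from this comment; the specific values are assumptions for illustration, not the project's actual configuration), a custom profile can inherit from release and keep the faster-building defaults:

```toml
# Hypothetical profile for day-to-day optimized builds.
[profile.optimized-dev]
inherits = "release"   # start from the release settings
lto = false            # Cargo's release default: no cross-crate LTO
codegen-units = 16     # Cargo's release default: more parallel codegen
```

It would be selected with cargo build --profile optimized-dev, leaving plain --release for the fully optimized build.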

For the tests, I removed debug = full from the Release profile since it blows up the binary size to the moon (almost a 1 GiB binary, huh). Before enabling LTO I highly recommend disabling this too ;)
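
A minimal sketch of that change, assuming the release profile currently sets debug to full (the existing contents of the project's Cargo.toml are not shown in this thread):

```toml
[profile.release]
debug = false   # emit no debug info; this is also Cargo's default for release
```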

I have made quick tests (AMD Ryzen 9 5900x, Fedora 42, Rust 1.86, the latest version of the project at the moment, cargo build --release command) - here are the results:

  • Release: 94 MiB, clean build time: 2m 05s
  • Release + codegen-units = 1 + Fat LTO: 62 MiB, clean build time: 7m 19s

I think we can enable such optimizations so the prebuilt binaries will be optimized as much as possible for end users.

Thank you.

zamazan4ik avatar May 14 '25 13:05 zamazan4ik

Agreed, that sounds like a good idea! At the very least I should remove debug info from a release build, that's a great point.

kevinaboos avatar May 14 '25 13:05 kevinaboos

Coincidentally I just found an /r/rust post with some other info related to improving compilation times during development and making builds smaller: https://www.reddit.com/r/rust/comments/1kr7ri4/psa_you_can_disable_debuginfo_to_improve_rust/

I'll also try some of these out to see if it noticeably improves things.

kevinaboos avatar May 20 '25 21:05 kevinaboos

I experimented with this today and am interested in adding the following profiles:

## An optimized profile for development, with full debug info.
[profile.debug-release]
inherits = "dev"
opt-level = 3
#lto = "thin"  ## optional, could add back in

## Enable full optimizations for the release profile,
## because it is used to build distributed app bundles.
[profile.release]
codegen-units = 1
lto = true  ## same as "fat"

The only hesitation I have here is the increase in compile time due to LTO and the reduced compilation parallelism from a smaller codegen-units value. Anyone who typically uses --release for all of their builds will be very surprised when the build time suddenly gets 5x worse.

Here are some figures for clean builds on my M1 Pro MacBook Pro (admittedly a bit old, but still):

  • default debug ("dev") build: 2.5 mins
  • default release build (same as debug-release without LTO): 3.8 mins
  • debug-release profile with thin LTO: 5.6 mins
  • release build with fat LTO (and default codegen-units = 16): 13.2 - 13.9 minutes
  • release build with fat LTO and codegen-units = 8: 11.2 - 12.5 mins
  • release build with fat LTO and codegen-units = 1: 10.5 - 11 mins

I'm surprised that with fat LTO, fewer codegen units results in a faster build time; that's the opposite of what I expected.

Ideally, we would be able to add a new "distribution" profile specifically for building app bundles that get distributed/published to app stores, but unfortunately the build tooling that we use (cargo-packager) does not support custom cargo profiles (beyond just the default "dev" and "release"). Something like this would be nice, though:

[profile.distribution]
inherits = "release"
codegen-units = 1
lto = true

If I can add support for arbitrary cargo profiles to cargo-packager, then this is the approach I will take. Until then, however, I think it's best to keep LTO disabled on default release builds because it is just far too costly. I will still enable it on my local .cargo/config.toml when I build the distributed app bundles (and on CI).
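
A sketch of what such a local override might look like, assuming it lives in .cargo/config.toml and is kept out of version control (Cargo reads [profile.*] tables from its config files, and they take precedence over the same keys in Cargo.toml):

```toml
# .cargo/config.toml (local only; hypothetical contents)
[profile.release]
lto = true          # same as "fat"
codegen-units = 1
```

On CI, the same two settings can instead be supplied through the CARGO_PROFILE_RELEASE_LTO and CARGO_PROFILE_RELEASE_CODEGEN_UNITS environment variables, which avoids keeping a separate config file there.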

@zamazan4ik what do you think, any thoughts on this? I'm curious how significant of an impact codegen-units typically has on the compiled code. LTO also seems expensive but is certainly worthwhile for a distributable build.

kevinaboos avatar Jun 25 '25 01:06 kevinaboos

Sorry for such a late response!

> I'm surprised that with fat LTO, fewer codegen units results in a faster build time; that's the opposite of what I expected.

Yeah, I have made the same observation: in all (or almost all) projects, the build finished quicker with Fat LTO + CG1 than with just Fat LTO. I guess this is because the more aggressive optimizations with CG1 leave fewer things for the LTO phase to do. But this is just a guess from a non-compiler guy :)

> If I can add support for arbitrary cargo profiles to cargo-packager, then this is the approach I will take.

In this case, can we create an issue for supporting custom profiles in the cargo-packager repo? I think this feature would be useful not only for Robrix. And maybe one day it will be implemented :)

> Until then, however, I think it's best to keep LTO disabled on default release builds because it is just far too costly. I will still enable it on my local .cargo/config.toml when I build the distributed app bundles (and on CI).

Sure, that's a great way to enable it!

> @zamazan4ik what do you think, any thoughts on this? I'm curious how significant of an impact codegen-units typically has on the compiled code. LTO also seems expensive but is certainly worthwhile for a distributable build.

I fully agree with your final choice! Regarding the impact of codegen-units: I don't have extensive statistics on the actual results, but I do have some observations. In all cases that I've tested, using fewer codegen units led to smaller binaries. Performance should also improve, but I don't have such benchmarks at hand.

zamazan4ik avatar Jun 28 '25 02:06 zamazan4ik

No problem at all, thanks for the insightful reply! When time permits, I'll dig into cargo-packager more. It's possible that it already supports custom profiles and I just don't know how to use them properly.

kevinaboos avatar Jun 28 '25 05:06 kevinaboos