Consider using Link-Time Optimization (LTO) for the project

Open zamazan4ik opened this issue 9 months ago • 1 comments

Hi!

I noticed that Link-Time Optimization (LTO) is not enabled in the project. I searched the repo for earlier discussion (such as other issues) but found nothing on the topic.

Is there a specific reason for not using LTO in the project? Since Rerun is a large project, LTO can noticeably reduce binary size and lets the compiler perform more aggressive cross-crate optimizations. If FatLTO is too expensive to run even for releases, we can consider ThinLTO, which needs fewer build resources (though it usually optimizes less than FatLTO). Other options, such as codegen-units = 1, also allow more aggressive optimization.

I think you can enable LTO only for Release builds, so the developer experience is not sacrificed: LTO adds time to every compilation. In that case, we can create a dedicated [profile.optimized-dev] profile with LTO disabled, so day-to-day development is unaffected.
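The profile split described above could look roughly like the following Cargo.toml fragment. This is a minimal sketch, not the project's actual configuration: the settings under [profile.optimized-dev] are assumptions about what "LTO disabled" would mean here, and they simply restore Cargo's defaults for release builds. Note that lto = false still allows "thin local" LTO within a single crate; lto = "off" would disable LTO entirely.

```toml
# Cargo.toml (workspace root): illustrative sketch only.

[profile.release]
lto = "fat"          # or "thin" for the cheaper ThinLTO variant
codegen-units = 1    # one codegen unit per crate: slower build, more optimization

# Profile for day-to-day work: release-level optimization without the LTO cost.
# Custom profiles must declare which built-in profile they inherit from.
[profile.optimized-dev]
inherits = "release"
lto = false          # Cargo's default: thin local LTO within each crate only
codegen-units = 16   # Cargo's default for release builds
```

With a fragment like this, cargo build --release would use LTO, while cargo build --profile optimized-dev would keep build times close to the current Release numbers.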

I ran some tests using the Release-mode build command from the Pixi file. Test environment: Fedora 41, AMD Ryzen 9 5900X, and the Rustc version specified in the project settings.

Binary size results:

Rerun CLI:

Release: 105 MiB
Release + codegen-units = 1 + ThinLTO: 95 MiB
Release + codegen-units = 1 + FatLTO: 85 MiB

Rerun WASM:

Release: 39 MiB
Release + codegen-units = 1 + ThinLTO: 33 MiB
Release + codegen-units = 1 + FatLTO: 33 MiB

I expect a similar level of improvement for the project's other Rust-based binaries, such as the Python SDK.

Clean build time results:

Rerun CLI:

Release: 1m 52s
Release + codegen-units = 1 + ThinLTO: 3m 01s
Release + codegen-units = 1 + FatLTO: 6m 34s

Rerun WASM:

Release: 1m 05s
Release + codegen-units = 1 + ThinLTO: 1m 37s
Release + codegen-units = 1 + FatLTO: 1m 37s

Thank you.

Mar 21 '25 04:03 zamazan4ik

Thanks for compiling these numbers!

Creating a new opt-in profile ([profile.optimized-release-build] or similar) would be best imho; we would then enable it only for our release jobs and nightly benchmarking. That way our users at least get the fastest (and smallest) possible binary when using pip install and cargo binstall.

In particular, I do not want to slow down cargo install rerun-cli.
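An opt-in profile along these lines might be sketched as follows. The profile name is the one floated above ("or similar"), and the choice of FatLTO with codegen-units = 1 is an assumption based on the smallest CLI binary in the measurements, not a decided configuration.

```toml
# Cargo.toml (workspace root): illustrative sketch only.
# [profile.release] stays untouched, so default release builds are not slowed down.

[profile.optimized-release-build]
inherits = "release"   # start from the normal release settings
lto = "fat"            # "thin" would trade some size for a faster build
codegen-units = 1      # slower to compile, more room for optimization
```

Release jobs and nightly benchmarks would then opt in explicitly, for example with cargo build --profile optimized-release-build -p rerun-cli, which writes artifacts to target/optimized-release-build/. Since cargo install uses the release profile by default, cargo install rerun-cli would keep its current build time.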

Mar 21 '25 09:03 emilk