Consider using Link-Time Optimization (LTO) for the project
Hi!
I noticed that Link-Time Optimization (LTO) is not enabled in the project. I searched the repo for any earlier discussion (such as other issues) but found nothing on the topic.
Is there a specific reason for not using LTO? Since Rerun is a large project, LTO could noticeably reduce binary sizes and let the compiler perform more aggressive cross-crate optimizations. If FatLTO is too expensive to run even for releases, we could consider ThinLTO, which is cheaper in build resources (though it optimizes less aggressively than FatLTO). Other options, like codegen-units = 1, would also help with more aggressive optimizations.
I think LTO could be enabled only for Release builds so that the developer experience is not sacrificed, since LTO adds to compile time. In that case, we could create a dedicated [profile.optimized-dev] profile with LTO disabled, so day-to-day development is not affected.
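A minimal sketch of how this could look in the workspace Cargo.toml. The concrete values are assumptions for illustration, not the project's current settings; only the optimized-dev profile name comes from the suggestion above:

```toml
# Release builds: enable LTO and a single codegen unit.
[profile.release]
lto = "fat"          # or "thin" for a cheaper, somewhat less effective variant
codegen-units = 1    # fewer units allow more optimization, at the cost of build parallelism

# Hypothetical developer profile: keeps release optimizations but skips the LTO cost.
[profile.optimized-dev]
inherits = "release" # custom profiles must name a profile to inherit from
lto = false          # Cargo's default: no cross-crate LTO
codegen-units = 16   # Cargo's default for release builds
```

Developers would then build with cargo build --profile optimized-dev, while plain --release builds would pick up LTO.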
I ran some tests using the build command from the Pixi file for building binaries in Release mode. Test environment: Fedora 41, AMD Ryzen 9 5900X, and the Rustc version specified in the project settings.
Binary size results:
Rerun CLI:
- Release: 105 MiB
- Release + codegen-units = 1 + ThinLTO: 95 MiB
- Release + codegen-units = 1 + FatLTO: 85 MiB
Rerun WASM:
- Release: 39 MiB
- Release + codegen-units = 1 + ThinLTO: 33 MiB
- Release + codegen-units = 1 + FatLTO: 33 MiB
A similar level of improvement is expected for the project's other Rust-based binaries, such as the Python SDK.
Clean build time results:
Rerun CLI:
- Release: 1m 52s
- Release + codegen-units = 1 + ThinLTO: 3m 01s
- Release + codegen-units = 1 + FatLTO: 6m 34s
Rerun WASM:
- Release: 65s
- Release + codegen-units = 1 + ThinLTO: 97s
- Release + codegen-units = 1 + FatLTO: 97s
Thank you.
Thanks for compiling these numbers!
Creating a new opt-in profile ([profile.optimized-release-build] or similar) would be best imho, and then enabling it only for our release jobs and nightly benchmarking. That way our users at least get the fastest (and smallest) possible binary when using pip install and cargo binstall.
In particular, I do not want to slow down cargo install rerun-cli.
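A rough sketch of such an opt-in profile, assuming the name suggested above and assuming fat LTO with a single codegen unit (both values are illustrative and would need to be decided by the maintainers):

```toml
# Opt-in profile; [profile.release] stays untouched, so a default
# `cargo install rerun-cli` keeps its current build time.
[profile.optimized-release-build]
inherits = "release" # start from the existing release settings
lto = "fat"          # assumption: "thin" is the cheaper alternative
codegen-units = 1
```

Release jobs could select it with cargo build --profile optimized-release-build; anything that does not pass --profile keeps using the existing profiles.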