bitmapflow
bitmapflow copied to clipboard
Enable link-time optimization when building Rust library for production
This decreases the compiled library size and improves performance.
codegen-units
is also set to 1 to further optimize the compiled library.
There's more advice here: https://likebike.com/posts/How_To_Write_Fast_Rust_Code.html
Note: I haven't tried to run the Godot project as a whole with these changes. It's likely that it will work as-is, but I recommend cloning this branch locally to test before merging.
Preview
Linux x86_64 (rustc nightly from 2021-03-26). Both libraries had their debug symbols removed by running strip
on them.
Before (LTO disabled, codegen-units = 16
)
2,672,984 target/release/libbitmapflow_rust.so
After (LTO enabled, codegen-units = 16
)
2,132,240 target/release/libbitmapflow_rust.so
After (LTO enabled, codegen-units = 1
) – this PR
2,066,704 target/release/libbitmapflow_rust.so
Cool, thanks. Before I merge this I have a question. Have you noticed any regressions in compile times? A little increase is fine, but I'd rather not have compile times that increase significantly...
Also, do these flags work on Windows as well?
Have you noticed any regressions in compile times?
I haven't done thorough benchmarking, but I'd expect a release build to take 30 seconds to 1 minute longer. This doesn't impact debug builds where you do most of the development anyway.
Also, do these flags work on Windows as well?
They should, but I haven't tested them there.
Hmm just tested it, compile times increase significantly, going from ~15 sec to ~1 minute on my 4th gen i5 CPU. I'd love to use this for the final binary, but having this enabled during development really hinders my development speed. Do you think there is any way to somehow only enable this flag when publishing the final binary? E.g. I have a publish_windows.py
script in the repo that exports an EXE file with all required DLLs into a zip file, do you think it would be somehow possible to compile bitmapflow-rust using these flags only when you run the script?
Hmm just tested it, compile times increase significantly, going from ~15 sec to ~1 minute on my 4th gen i5 CPU. I'd love to use this for the final binary, but having this enabled during development really hinders my development speed. Do you think there is any way to somehow only enable this flag when publishing the final binary?
Are you sure this affects build times when using cargo build
instead of cargo build --release
? The default is to build in debug mode.
It's possible to set build flags in the build script, but I'd prefer if this was always done for release builds.
In the build script, set the CARGO_PROFILE_RELEASE_LTO
environment variable to fat
and RUSTFLAGS
to -C codegen-units=1
then build with cargo build --release
.
I always develop in release mode, because 1. it gives me a better picture of the real world performance of my program during development (profiling should always be done in release mode), and 2. I don't have to worry about accidentally copying the wrong DLL and shipping debug builds. If I need to debug something I can use the trusty godot_dbg!()
macro :smile:
I think I'll go for the environment variable for now, to avoid slowing down development.
Update: just tested running in debug mode, it doesn't even work on my end :smile: Getting a bazillion linker errors everywhere. Not sure if that's a configuration problem, I'll have to look into it.
@Bauxitedev Could you test the publish_windows.py
script included in this PR? I also modified the Docker file, but it appears to only build for Linux.
I'll check it out soon. You can ignore the dockerfile for now, I used it for testing purposes only, it doesn't serve any purpose anymore.
Just tested, the python build script doesn't seem to work for me. The cargo build
step fails with:
Compiling autocfg v1.0.1
Compiling libc v0.2.88
Compiling proc-macro2 v1.0.24
Compiling unicode-xid v0.2.1
error: couldn't create a temp dir: Permission denied. (os error 5) at path "C:\\Windows\\rustcBRVLvu"
error: aborting due to previous error
error: could not compile `unicode-xid`
It seems to be trying to create a temp directory in the Windows folder, even though it doesn't have permission to do so (the script shouldn't run with admin privileges). Any ideas?
Also it would be nice if the script stopped entirely if this step fails (easy fix, just add check=True
to the argument list of subprocess.run
)