bitmapflow icon indicating copy to clipboard operation
bitmapflow copied to clipboard

Enable link-time optimization when building Rust library for production

Open Calinou opened this issue 3 years ago • 8 comments

This decreases the compiled library size and improves performance.

codegen-units is also set to 1 to further optimize the compiled library.

There's more advice here: https://likebike.com/posts/How_To_Write_Fast_Rust_Code.html

Note: I haven't tried to run the Godot project as a whole with these changes. It's likely that it will work as-is, but I recommend cloning this branch locally to test before merging.

Preview

Linux x86_64 (rustc nightly from 2021-03-26). Both libraries had their debug symbols removed by running strip on them.

Before (LTO disabled, codegen-units = 16)

2,672,984 target/release/libbitmapflow_rust.so

After (LTO enabled, codegen-units = 16)

2,132,240 target/release/libbitmapflow_rust.so

After (LTO enabled, codegen-units = 1) – this PR

2,066,704 target/release/libbitmapflow_rust.so

Calinou avatar Mar 27 '21 01:03 Calinou

Cool, thanks. Before I merge this I have a question. Have you noticed any regressions in compile times? A little increase is fine, but I'd rather not have compile times that increase significantly...

Also, do these flags work on Windows as well?

Bauxitedev avatar Mar 27 '21 11:03 Bauxitedev

Have you noticed any regressions in compile times?

I haven't done thorough benchmarking, but I'd expect a release build to take 30 seconds to 1 minute longer. This doesn't impact debug builds where you do most of the development anyway.

Also, do these flags work on Windows as well?

They should, but I haven't tested them there.

Calinou avatar Mar 27 '21 13:03 Calinou

Hmm just tested it, compile times increase significantly, going from ~15 sec to ~1 minute on my 4th gen i5 CPU. I'd love to use this for the final binary, but having this enabled during development really hinders my development speed. Do you think there is any way to somehow only enable this flag when publishing the final binary? E.g. I have a publish_windows.py script in the repo that exports an EXE file with all required DLLs into a zip file, do you think it would be somehow possible to compile bitmapflow-rust using these flags only when you run the script?

Bauxitedev avatar Mar 28 '21 09:03 Bauxitedev

Hmm just tested it, compile times increase significantly, going from ~15 sec to ~1 minute on my 4th gen i5 CPU. I'd love to use this for the final binary, but having this enabled during development really hinders my development speed. Do you think there is any way to somehow only enable this flag when publishing the final binary?

Are you sure this affects build times when using cargo build instead of cargo build --release? The default is to build in debug mode.

It's possible to set build flags in the build script, but I'd prefer if this was always done for release builds. In the build script, set the CARGO_PROFILE_RELEASE_LTO environment variable to fat and RUSTFLAGS to -C codegen-units=1 then build with cargo build --release.

Calinou avatar Mar 28 '21 12:03 Calinou

I always develop in release mode, because 1. it gives me a better picture of the real world performance of my program during development (profiling should always be done in release mode), and 2. I don't have to worry about accidentally copying the wrong DLL and shipping debug builds. If I need to debug something I can use the trusty godot_dbg!() macro :smile:

I think I'll go for the environment variable for now, to avoid slowing down development.

Update: just tested running in debug mode, it doesn't even work on my end :smile: Getting a bazillion linker errors everywhere. Not sure if that's a configuration problem, I'll have to look into it.

Bauxitedev avatar Mar 29 '21 09:03 Bauxitedev

@Bauxitedev Could you test the publish_windows.py script included in this PR? I also modified the Docker file, but it appears to only build for Linux.

Calinou avatar Mar 29 '21 12:03 Calinou

I'll check it out soon. You can ignore the dockerfile for now, I used it for testing purposes only, it doesn't serve any purpose anymore.

Bauxitedev avatar Mar 29 '21 16:03 Bauxitedev

Just tested, the python build script doesn't seem to work for me. The cargo build step fails with:

   Compiling autocfg v1.0.1
   Compiling libc v0.2.88
   Compiling proc-macro2 v1.0.24
   Compiling unicode-xid v0.2.1
error: couldn't create a temp dir: Permission denied. (os error 5) at path "C:\\Windows\\rustcBRVLvu"

error: aborting due to previous error

error: could not compile `unicode-xid`

It seems to be trying to create a temp directory in the Windows folder, even though it doesn't have permission to do so (the script shouldn't run with admin privileges). Any ideas?

Also it would be nice if the script stopped entirely if this step fails (easy fix, just add check=True to the argument list of subprocess.run)

Bauxitedev avatar Mar 30 '21 09:03 Bauxitedev