cc-rs

Arguably improper -Clto -> -flto flag mapping

Open • dot-asm opened this issue 8 months ago • 14 comments

cc-rs appears to parse RUSTFLAGS and map the -Clto rustc codegen option to clang's -flto=full. Note that on the rustc side -Clto is required to be complemented with -Cembed-bitcode. This [incidentally] means that rustc's preferred object format is one with embedded bitcode. On the clang side, however, -flto makes the compiler emit raw bitcode, which is not an object file. One can argue that it would be more appropriate to match rustc's preferred format. But before one rushes to map -Cembed-bitcode to -fembed-bitcode, one should recognize that -flto overrides -fembed-bitcode, in the sense that clang -flto -fembed-bitcode still generates raw bitcode. So if one aims to match the formats, one would have to map -Cembed-bitcode alone and leave -Clto unmapped. Alternatively, one can wonder whether -Cembed-bitcode will become implied by -Clto, and thus optional, in the future. If that is a possibility, then one can make a case for mapping -Clto to -fembed-bitcode instead of -flto.
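To make the parsing side concrete, here is a minimal sketch of how a build script could pull the LTO-related codegen options out of the flags Cargo hands it. It assumes the flags come from `CARGO_ENCODED_RUSTFLAGS` (set by Cargo for build scripts, with arguments separated by the `\x1f` character) and that a bare `-C lto` means fat LTO. `LtoFlags`, `parse_codegen_flags` and `lto_flags_from_env` are hypothetical names, not cc-rs's actual implementation.

```rust
use std::env;

/// LTO-related rustc codegen options found in a flag list (hypothetical type).
#[derive(Debug, Default, PartialEq)]
pub struct LtoFlags {
    /// `Some("fat")`, `Some("thin")`, ... if `-C lto` is enabled; `None` otherwise.
    pub lto: Option<String>,
    /// `Some(bool)` only if `-C embed-bitcode` was given explicitly.
    pub embed_bitcode: Option<bool>,
    /// Whether `-C linker-plugin-lto` is enabled.
    pub linker_plugin_lto: bool,
}

/// Scans rustc arguments, accepting both `-Cname[=value]` and `-C name[=value]`.
/// In this sketch, later occurrences override earlier ones.
pub fn parse_codegen_flags<'a, I>(args: I) -> LtoFlags
where
    I: IntoIterator<Item = &'a str>,
{
    let mut out = LtoFlags::default();
    let mut iter = args.into_iter();
    while let Some(arg) = iter.next() {
        // Extract the "key[=value]" part of a -C option.
        let opt = if arg == "-C" {
            match iter.next() {
                Some(next) => next,
                None => break,
            }
        } else if let Some(rest) = arg.strip_prefix("-C") {
            rest
        } else {
            continue; // not a codegen option
        };
        let (key, value) = match opt.split_once('=') {
            Some((k, v)) => (k, Some(v)),
            None => (opt, None),
        };
        let disabled = matches!(value, Some("n" | "no" | "off" | "false"));
        match key {
            "lto" => {
                out.lto = match value {
                    _ if disabled => None,
                    None | Some("y" | "yes" | "on" | "true" | "fat") => Some("fat".to_string()),
                    Some(other) => Some(other.to_string()), // e.g. "thin"
                }
            }
            "embed-bitcode" => out.embed_bitcode = Some(!disabled),
            // A value here may be a plugin path, which still enables the option.
            "linker-plugin-lto" => out.linker_plugin_lto = !disabled,
            _ => {}
        }
    }
    out
}

/// Reads the flags Cargo passes to build scripts (empty if the variable is unset).
pub fn lto_flags_from_env() -> LtoFlags {
    let encoded = env::var("CARGO_ENCODED_RUSTFLAGS").unwrap_or_default();
    parse_codegen_flags(encoded.split('\x1f').filter(|s| !s.is_empty()))
}
```

Note that flags Cargo derives from `[profile]` settings are passed to rustc directly rather than through `RUSTFLAGS`, so a parser like this only sees what the user put into `RUSTFLAGS` (or the equivalent config keys).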

dot-asm • Apr 28 '25 16:04

Hmmm... Though -fembed-bitcode appears to be incompatible with -ffunction-sections and -fdata-sections, which cc-rs always adds. -flto, on the other hand, simply overrides them, rendering them meaningless. So if one aims to match rustc's preferred object format, one needs to make -ffunction-sections and -fdata-sections conditional.

dot-asm • Apr 28 '25 16:04

> But before one rushes to map -Cembed-bitcode to -fembed-bitcode,

This gives the impression that the mapping is not implemented. That's the wrong impression. One gets it because of

> -fembed-bitcode appears to be incompatible with -ffunction-sections and -fdata-sections.

The mapping apparently is attempted, together with the flags in question.

dot-asm • Apr 29 '25 09:04

If anything, it is -Clinker-plugin-lto that should be mapped to -flto, because it's the one that makes rustc generate raw bitcode.

dot-asm • Apr 29 '25 16:04

Hitting my head against this one. I expected the cc crate to automatically detect my profile.release.lto value and perform cross-language LTO with my Rust binary, but it didn't.

Setting RUSTFLAGS to something like -C lto -C link-arg=-flto manually works, but requires some setup I would rather avoid. Plus, I believe there is something else I might be missing, as I only managed to shave 100KB off my binary.

Altair-Bueno • May 05 '25 13:05

@Altair-Bueno you need to enable the linker LTO plugin; otherwise rustc would perform LTO on Rust and C code separately, and then link them together.

NobodyXu • May 05 '25 14:05

> rustc would perform LTO on rust and C code separately, and then link them together

There is no such thing as performing LTO separately. The linker is called once, and it doesn't care which compiler the bitcode inputs come from. The bitcode modules naturally have to meet the requirements set by that linker. Well, there is a caveat. On Unices, Rust uses indirect calls on the FFI boundary, and LTO doesn't seem to be able to look through them. This can create the illusion that Rust and C bitcode are optimized separately, but formally speaking they aren't.

dot-asm • May 06 '25 10:05

> Setting RUSTFLAGS ... manually works, but requires some setup I would rather avoid.

Unfortunately it's the only option. cargo does not convey information it collects from its .toml files to the build script.

dot-asm • May 06 '25 10:05

Thanks for the response, @NobodyXu. Yes, I'm aware that -C linker-plugin-lto is needed. However, this argument is automatically added by Cargo when profile.<PROFILE>.lto is set to true (see the relevant Cargo.toml options below). I can verify that it is added to the rustc invocation when I run a build with verbose output (cargo build --release -vv).

I can clearly see the following relevant options being applied to my sys crate:

  • -l static=<LIB> with each C dependency
  • --crate-type lib
  • --emit=dep-info,metadata,link
  • -C opt-level=z
  • -C panic=abort
  • -C linker-plugin-lto
  • -C codegen-units=1
  • -C strip=symbols
  • Other RUSTFLAGS values (see below)

And I can see these options being applied to my C libraries (through CMake using the cmake crate):

  • -DCMAKE_TOOLCHAIN_FILE
  • -DCMAKE_C_FLAGS containing CFLAGS
  • -DCMAKE_CXX_FLAGS containing CXXFLAGS
  • -DCMAKE_ASM_FLAGS containing CFLAGS

So, as @dot-asm noted, I have to manually append -C lto to my RUSTFLAGS so that cc recognizes that LTO is enabled. (Cargo itself passes -C lto only to the last crate being built, in this case the binary.)

CARGO_TARGET_ARMV7_UNKNOWN_LINUX_GNUEABIHF_RUSTFLAGS+=' -C lto'
cargo build --release -vv

With this, my sys crate is finally built with -C lto. cc successfully recognizes it, and the cmake crate appends -flto=full to CMAKE_C_FLAGS, CMAKE_CXX_FLAGS and CMAKE_ASM_FLAGS.

As a cross-check, running llvm-dis-19 over my *.c.o objects fails when -C lto is not set, i.e. they only contain bitcode when it is.
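The same check can be done without LLVM tools by looking at a file's leading magic bytes: raw LLVM bitcode (what clang emits for `-flto -c`) starts with `BC` 0xC0 0xDE, while a regular object on Linux starts with the ELF magic 0x7F `ELF`. A minimal sketch follows; it only recognizes those two formats plus the bitcode wrapper header (magic 0x0B17C0DE, stored little-endian), and `ObjKind`, `classify` and `classify_file` are hypothetical names.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Rough classification of a compiler output file (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum ObjKind {
    /// Raw LLVM bitcode, not an object file.
    RawBitcode,
    /// LLVM bitcode behind the 0x0B17C0DE wrapper header.
    WrappedBitcode,
    /// A regular ELF object (it may still carry bitcode inside a section).
    Elf,
    Unknown,
}

/// Classifies a file by its first four bytes.
pub fn classify(bytes: &[u8]) -> ObjKind {
    match bytes {
        [b'B', b'C', 0xC0, 0xDE, ..] => ObjKind::RawBitcode,
        [0xDE, 0xC0, 0x17, 0x0B, ..] => ObjKind::WrappedBitcode,
        [0x7F, b'E', b'L', b'F', ..] => ObjKind::Elf,
        _ => ObjKind::Unknown,
    }
}

/// Reads the whole file and classifies it.
pub fn classify_file(path: &Path) -> io::Result<ObjKind> {
    Ok(classify(&fs::read(path)?))
}
```

Run over the `*.c.o` files from the build above, this should report `RawBitcode` when `-flto=full` reached clang and `Elf` otherwise.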

However, I feel like I'm missing something because the gains were miserable...


Environment

Rustc & LLVM version

rustc 1.86.0 (05f9846f8 2025-03-31)
binary: rustc
commit-hash: 05f9846f893b09a1be1fc8560e33fc3c815cfecb
commit-date: 2025-03-31
host: x86_64-unknown-linux-gnu
release: 1.86.0
LLVM version: 19.1.7

Relevant environment variables

PKG_CONFIG_PATH=<REDACTED>
CARGO_TARGET_ARMV7_UNKNOWN_LINUX_GNUEABIHF_RUSTFLAGS=-C link-arg=-fuse-ld=lld-19 -C link-arg=--sysroot=<REDACTED> -C target-cpu=cortex-a7
SDKTARGETSYSROOT=<REDACTED>
CPP=clang-cpp-19
CLANG_BUILD_ARGS=--target=armv7-unknown-linux-gnueabihf -mcpu=cortex-a7  -mfloat-abi=hard --sysroot=<REDACTED>
PIP_BREAK_SYSTEM_PACKAGES=1
CXX=clang++-19
CXXFLAGS=--target=armv7-unknown-linux-gnueabihf -mcpu=cortex-a7  -mfloat-abi=hard --sysroot=<REDACTED>
CARGO_TARGET_ARMV7_UNKNOWN_LINUX_GNUEABIHF_LINKER=clang-19
CARGO_BUILD_TARGET=armv7-unknown-linux-gnueabihf
LDFLAGS=-fuse-ld=lld-19 --sysroot=<REDACTED>
HOME=/root
CARGO_HOME=/root/cargo
LD=clang-19
RUSTUP_HOME=/root/rustup
AR=llvm-ar-19
HOST_CXXFLAGS=
HOST_CCFLAGS=
PKG_CONFIG_SYSROOT_DIR=<REDACTED>
OBJCOPY=llvm-objcopy-19
BINDGEN_EXTRA_CLANG_ARGS=--target=armv7-unknown-linux-gnueabihf -mcpu=cortex-a7  -mfloat-abi=hard --sysroot=<REDACTED>
STRIP=llvm-strip-19
OBJDUMP=llvm-objdump-19
CC=clang-19
CFLAGS=--target=armv7-unknown-linux-gnueabihf -mcpu=cortex-a7  -mfloat-abi=hard --sysroot=<REDACTED>
RANLIB=llvm-ranlib-19

Relevant Cargo.toml snippet

[profile.release]
panic = "abort"
lto = true
codegen-units = 1
opt-level = 3

Altair-Bueno • May 06 '25 10:05

Hmmm... As for -fembed-bitcode, consider a.c

int foo();
int main() { return foo(); }

and b.c

int foo() { return 42; }

clang -fembed-bitcode -c -O2 a.c b.c yields object files with embedded bitcode. If I then run clang -flto a.o b.o, main simply returns 42 without making any calls. LTO in action. This is with clang 14, the default on my computer. But if I take these to clang 19, main makes a [tail] call to foo. In other words, clang 19 fails to utilize the embedded bitcode. I've checked, it is there... So is it a recent bug, or is -fembed-bitcode deprecated?

dot-asm • May 06 '25 12:05

I spent a little bit of time creating a small reproduction of this issue using only two crates. I included two bash scripts that set the environment variables.

It just uses clang, LLVM, cargo and Rust, with no cross-compilation fuss.

cc-does-not-add-flto.zip

Altair-Bueno • May 06 '25 12:05

> cc-does-not-add-flto

As already mentioned, cargo does not convey the information you expect, and there is nothing cc-rs can do about it. In other words it's not a cc-rs problem.

dot-asm • May 06 '25 13:05

So, originally, you were saying that the ideal way of performing cross language LTO would be to:

  • CFLAGS/LDFLAGS
    • Always append -fembed-bitcode so that objects contain bitcode
  • RUSTFLAGS
    • -C embed-bitcode is not needed because Rust already embeds bitcode when -C linker-plugin-lto is set
    • Append -C linker-plugin-lto if you have profile.<profile>.lto disabled or unset

That way, both rustc and clang emit raw bitcode (preferred by clang?).

However, you later specified that

> If anything it is -Clinker-plugin-lto that should be mapped to -flto, because it's the one that makes rustc generate the raw bitcode.

This matches the commands used in the official rustc book, which uses -flto + -C linker-plugin-lto. And I suppose that would embed bitcode within the object files (preferred by rustc).

Moreover, I wonder whether cargo could add a way for build scripts to see if LTO is enabled (aka -C linker-plugin-lto), so that cc-rs could invoke clang with -flto.

PS: I'm still unsure on what -C lto actually means, and if it conflicts with -C linker-plugin-lto. The docs aren't clear about that.

Altair-Bueno • May 09 '25 14:05

> So, originally, you were saying that the ideal way of performing cross language LTO would be to

I'm merely pointing out inconsistencies in rustc -> clang automatic flag mappings. As in a) rustc -Clto generates embedded bitcode; vs. b) cc-rs maps the flag to clang -flto, which generates raw bitcode. If it's intentional, which might be the case, then cc-rs people are free to weigh in and close the issue. And if not, they're free to address it.

Now, your problem is kind of orthogonal to the thesis. You expect cc-rs to pick up the lto value from [profile] in Cargo.toml. What I'm positing is that cc-rs doesn't pick up any such value, be it lto or anything else. If you want to propose a change, then this is arguably the wrong forum; you should talk to the cargo people, not cc-rs. [I don't speak for the cc-rs people though. What do I know, maybe they would be eager to relay and track the issue with the cargo people...]

dot-asm • May 09 '25 22:05

> rustc would perform LTO on rust and C code separately, and then link them together

> There is no such thing as performing LTO separately. Linker is called once and it doesn't care which compiler the bitcode inputs come from.

Or maybe there is a thing that looks like separate LTO... Indeed, if I rustc --crate-type=lib -O snippet.rs, with snippet.rs being

pub fn add(a: i32, b: i32) -> i32 { a+b }

the .text segments are always empty. "Always" means "regardless of lto and embed-bitcode flags." This means that rustc has to generate the code as it takes in the libsnippet.rlib. The said code generation pass can be viewed as separate linking, because it puts together multiple Rust modules to pass further down to the system linker.

dot-asm • May 10 '25 12:05

To summarize: a case has been made for the ineffectiveness of the current Rust -Clto to clang -flto option mappings. The only arguably meaningful/working mapping is -Clinker-plugin-lto to -flto, which is not implemented. Fix, or close with a "won't fix" resolution.

dot-asm • Sep 21 '25 17:09

Won't -flto be able to optimize within the C/C++ library, even if it cannot optimize across language boundaries?

Alternatively, it'd be great if it were possible to perform LTO on all objects and still produce an archive: we could run that step just before archiving, and skip it if the linker plugin is enabled.

NobodyXu • Sep 22 '25 09:09

> Won't -flto be able to optimize within the C/C++ library, even if it cannot optimize across language boundaries?

I don't follow. If you question the effectiveness of -flto, then why bother with the corresponding flag mapping at all? But yes, it will/does. Moreover, I've just checked with the latest toolchain, and LTO managed to go across language boundaries. Just in case, that was with -Clinker-plugin-lto and -flto set manually.

> Alternatively, it'd be great if it were possible to perform LTO on all objects and still produce an archive: we could run that step just before archiving, and skip it if the linker plugin is enabled.

Is it "won't fix"?

dot-asm • Sep 22 '25 11:09

> Is it "won't fix"?

I do want to fix this, just want to check to see if we can keep the benefit of LTO inside the C/C++ library.

NobodyXu • Sep 22 '25 11:09

Did some research; clang seems to support this:

llvm-link file1.bc file2.bc file3.bc -o merged.bc
llc merged.bc -filetype=obj -o merged.o

I didn't find any way to achieve that with gcc.

NobodyXu • Sep 22 '25 11:09

So we could just fix this by passing -flto when -Clinker-plugin-lto is set.

And then we could add back C/C++-internal LTO for clang/zig-cc.

NobodyXu • Sep 22 '25 11:09

> Did some research, for clang it seems to support this

> llvm-link file1.bc file2.bc file3.bc -o merged.bc
> llc merged.bc -filetype=obj -o merged.o

I suppose the expectation is that one of the steps would perform LTO. Can you actually confirm that it does? I can't. Taking a.c and b.c from above, the binary code generated for { return foo(); } is a tail call to foo, whereas LTO should have rendered it as returning the constant. LTO didn't happen.

> So we could just fix this, having -flto on -Clinker-plugin-LTO

A reminder: the current mappings are internally inconsistent and arguably even harmful. Indeed, once you trigger the -Clto to -flto mapping, the link stage is by default bound to fail, because clang will emit raw bitcode while the linker won't be instructed to handle it. In other words, one can make a case for omitting the current -Clto mappings.

dot-asm • Sep 22 '25 12:09

Yeah, that makes sense. It seems best for now to

  • ignore -Clto
  • for -Clinker-plugin-lto, use -flto

Better than what we have now, with -Clto potentially breaking the build.
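The two bullets above can be sketched as a small helper. This is a hypothetical illustration rather than cc-rs code: it assumes the rustc flags are available as a slice of arguments (for instance already split from `CARGO_ENCODED_RUSTFLAGS`), recognizes only the `-Cname[=value]` and `-C name[=value]` spellings, and treats the last occurrence of an option as the one that counts. `codegen_enabled` and `clang_lto_flags` are made-up names.

```rust
/// Returns whether the boolean codegen option `name` ends up enabled in `args`.
/// Accepts `-Cname[=value]` and `-C name[=value]`; the last occurrence wins.
fn codegen_enabled(args: &[&str], name: &str) -> bool {
    let mut enabled = false;
    let mut i = 0;
    while i < args.len() {
        // Pull out the "key[=value]" part of a -C option, if this is one.
        let opt = if args[i] == "-C" {
            i += 1;
            args.get(i).copied()
        } else {
            args[i].strip_prefix("-C")
        };
        if let Some(opt) = opt {
            let (key, value) = match opt.split_once('=') {
                Some((k, v)) => (k, Some(v)),
                None => (opt, None),
            };
            if key == name {
                enabled = !matches!(value, Some("n" | "no" | "off" | "false"));
            }
        }
        i += 1;
    }
    enabled
}

/// Hypothetical clang flags for the mapping proposed above:
/// `-C linker-plugin-lto` becomes `-flto` (a suitably configured linker then
/// runs LTO over Rust and C/C++ bitcode), while `-C lto` alone is left unmapped.
pub fn clang_lto_flags(rustflags: &[&str]) -> Vec<&'static str> {
    if codegen_enabled(rustflags, "linker-plugin-lto") {
        vec!["-flto"]
    } else {
        Vec::new()
    }
}
```

Leaving `-C lto` unmapped is the point of the first bullet: clang's raw bitcode would otherwise reach a link step that was never told to handle it.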

It's unfortunate that we cannot use LTO within the C/C++ code like rustc does (I'm really curious how rustc achieves that), but I will settle for this.

I would accept a PR, or, if you want me to make one, I could do it sometime later this week.

NobodyXu • Sep 22 '25 14:09

Just as a side note, if you want to create LTO objects that contain both bitcode and ELF, the option is -ffat-lto-objects, not -fembed-bitcode.

nikic • Sep 23 '25 08:09

> Just as a side note, if you want to create LTO objects that contain both bitcode and ELF, the option is -ffat-lto-objects, not -fembed-bitcode.

It doesn't make https://github.com/rust-lang/cc-rs/issues/1463#issuecomment-2854416498 work with more recent clang. Quite the contrary. If I compile with -fembed-bitcode, the object files at least contain .llvmbc sections, while if I compile with -ffat-lto-objects, there are no .llvmbc sections. The flag doesn't seem to have any effect on .o files: just compile with and without it and compare sizes and checksums.

dot-asm avatar Sep 23 '25 09:09 dot-asm

-ffat-lto-objects produces an .llvm.lto section, not a .llvmbc section (which is for defunct Apple embedded bitcode). And of course you need to combine it with a -flto option, it's not going to do anything by itself.
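To see which of these sections an object actually carries, one can list its ELF section names (`llvm-readelf -S` does the same job). Below is a minimal std-only sketch, assuming a little-endian ELF32 or ELF64 file and ignoring extended section numbering; `section_names` and `has_section` are hypothetical helpers. Going by the comment above, an object built with `-flto -ffat-lto-objects` would be expected to contain `.llvm.lto`, and one built with `-fembed-bitcode` `.llvmbc`.

```rust
use std::convert::TryFrom;

/// Reads an `n`-byte little-endian integer at `base + field`, if in bounds.
fn read_le(data: &[u8], base: u64, field: u64, n: usize) -> Option<u64> {
    let start = usize::try_from(base.checked_add(field)?).ok()?;
    let bytes = data.get(start..start.checked_add(n)?)?;
    Some(bytes.iter().rev().fold(0u64, |acc, &b| (acc << 8) | u64::from(b)))
}

/// Returns the section names of a little-endian ELF32/ELF64 file, or `None`.
pub fn section_names(data: &[u8]) -> Option<Vec<String>> {
    // ELF magic, and EI_DATA == 1 (little-endian).
    if data.get(0..4)? != b"\x7fELF" || *data.get(5)? != 1 {
        return None;
    }
    // EI_CLASS: 1 = ELF32, 2 = ELF64; header field offsets differ between them.
    let is64 = match *data.get(4)? {
        1 => false,
        2 => true,
        _ => return None,
    };
    // e_shoff, e_shentsize, e_shnum, e_shstrndx
    let (shoff, shentsize, shnum, shstrndx) = if is64 {
        (read_le(data, 0, 0x28, 8)?, read_le(data, 0, 0x3A, 2)?, read_le(data, 0, 0x3C, 2)?, read_le(data, 0, 0x3E, 2)?)
    } else {
        (read_le(data, 0, 0x20, 4)?, read_le(data, 0, 0x2E, 2)?, read_le(data, 0, 0x30, 2)?, read_le(data, 0, 0x32, 2)?)
    };
    // (sh_name, sh_offset, sh_size) of section header `i`.
    let header = |i: u64| -> Option<(u64, u64, u64)> {
        let base = shoff.checked_add(i.checked_mul(shentsize)?)?;
        if is64 {
            Some((read_le(data, base, 0, 4)?, read_le(data, base, 0x18, 8)?, read_le(data, base, 0x20, 8)?))
        } else {
            Some((read_le(data, base, 0, 4)?, read_le(data, base, 0x10, 4)?, read_le(data, base, 0x14, 4)?))
        }
    };
    // The section-name string table is itself a section.
    let (_, str_off, str_size) = header(shstrndx)?;
    let start = usize::try_from(str_off).ok()?;
    let end = start.checked_add(usize::try_from(str_size).ok()?)?;
    let strtab = data.get(start..end)?;
    let mut names = Vec::new();
    for i in 0..shnum {
        let (name_off, _, _) = header(i)?;
        let rest = strtab.get(usize::try_from(name_off).ok()?..)?;
        let len = rest.iter().position(|&b| b == 0)?; // NUL-terminated
        names.push(String::from_utf8_lossy(&rest[..len]).into_owned());
    }
    Some(names)
}

/// True if `data` is a parsable ELF file with a section named `name`.
pub fn has_section(data: &[u8], name: &str) -> bool {
    section_names(data).map_or(false, |names| names.iter().any(|n| n == name))
}
```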

(By the way, I have no idea what this issue is actually trying to report. I've tried and failed to understand it multiple times. It directly goes into details about compilation options without ever explaining what the problem this is attempting to solve.)

nikic • Sep 23 '25 09:09

> And of course you need to combine it with a -flto option, it's not going to do anything by itself.

Ah! -flto alone emits raw bitcode, while in combination with -ffat-lto-objects, it produces ELF objects with both machine code and LLVM bitcode. And it does make https://github.com/rust-lang/cc-rs/issues/1463#issuecomment-2854416498 [work] with newer clang. Mystery solved 👍

All right, with this new information in mind, it would be possible to keep the -Clto mappings but complement them with -ffat-lto-objects. However! There is a caveat: the option is not recognized by clang versions prior to 18.

dot-asm • Sep 23 '25 09:09

> it would be possible to keep -Clto mappings

Just in case, it's still more than appropriate to add -Clinker-plugin-lto mapping.

dot-asm • Sep 23 '25 09:09

> it would be possible to keep -Clto mappings

> Just in case, it's still more than appropriate to add -Clinker-plugin-lto mapping.

And this mapping should not add -ffat-lto-objects, because with it LTO fails to pierce the language boundaries.

dot-asm • Sep 23 '25 10:09

To re-summarize.

  • -Clinker-plugin-lto should be mapped to -flto [without -ffat-lto-objects]; it appears to be the only way to have LTO performed across language boundaries
  • The -Clto mapping can be salvaged by adding -ffat-lto-objects, but clang versions prior to 18 ignore it (with a warning), which means that compilations with these versions are bound to fail by default, because the system linker won't be instructed to handle raw bitcode
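If the `-Clto` mapping were salvaged as in the second bullet, the build script would also need the clang major version. A minimal sketch, assuming the version can be read from the first line of `clang --version` output (which looks like `clang version 19.1.7` or `Ubuntu clang version 14.0.0-1ubuntu1`); the function names are hypothetical, and the cut-off of 18 is taken from this thread.

```rust
/// Extracts the major version from `clang --version` output, if recognizable.
pub fn clang_major(version_output: &str) -> Option<u32> {
    const MARKER: &str = "clang version ";
    let line = version_output.lines().next()?;
    let rest = &line[line.find(MARKER)? + MARKER.len()..];
    // Take the leading run of digits, e.g. "19" from "19.1.7".
    rest.split(|c: char| !c.is_ascii_digit()).next()?.parse().ok()
}

/// Hypothetical clang flags for rustc's `-C lto` under the "salvage" option:
/// `-ffat-lto-objects` keeps regular machine code next to the bitcode, but
/// clang older than 18 does not recognize it, so emit nothing in that case.
pub fn flags_for_rustc_lto(clang_major: Option<u32>) -> Vec<&'static str> {
    match clang_major {
        Some(v) if v >= 18 => vec!["-flto", "-ffat-lto-objects"],
        _ => Vec::new(),
    }
}
```

Apple clang reports its own version numbers, which do not line up with upstream LLVM releases, so a real implementation would need to treat it separately. And, as argued next, even with a recent clang the embedded bitcode would go unused unless the linker is also told to perform LTO.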

I for one would still advocate for omitting -Clto mapping even with recent clang versions, because it doesn't translate to instructions to the system linker, which effectively means that the bitcode embedded by clang would be just dead weight.

dot-asm • Sep 23 '25 10:09

> I for one would still advocate for omitting -Clto mapping even with recent clang versions, because it doesn't translate to instructions to the system linker, which effectively means that the bitcode embedded by clang would be just dead weight.

Right. In order for -flto on object compilations to be useful, the linker invocation also needs -flto. I guess cc-rs could emit something like cargo:rustc-link-arg=-flto, but that probably goes against the spirit of -C lto (as opposed to -C linker-plugin-lto).
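For reference, the build-script side of that suggestion might look like the following sketch. `cargo:rustc-link-arg=FLAG` is an existing Cargo build-script directive; whether cc-rs should emit it, and under which condition, is exactly what is being debated, so the `c_objects_are_bitcode` input and the function names are hypothetical.

```rust
/// Hypothetical: directives that forward `-flto` to the final link, so the
/// linker actually consumes the bitcode clang produced.
pub fn link_directives(c_objects_are_bitcode: bool) -> Vec<String> {
    let mut out = Vec::new();
    if c_objects_are_bitcode {
        // Without this, the linker is never told to run LTO over the bitcode:
        // raw bitcode breaks the link, and fat objects carry it as dead weight.
        out.push("cargo:rustc-link-arg=-flto".to_string());
    }
    out
}

/// Prints the directives on stdout, which is how a build script talks to Cargo.
pub fn emit(directives: &[String]) {
    for d in directives {
        println!("{}", d);
    }
}
```

This only works if the configured linker is a compiler driver that understands `-flto` (such as the `clang-19` linker in the environment shown earlier), not a bare `ld`.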

nikic • Sep 23 '25 10:09