cc-rs icon indicating copy to clipboard operation
cc-rs copied to clipboard

Support c/c++ library internal LTO

Open NobodyXu opened this issue 3 months ago • 30 comments

In #1564 we disabled LTO if linker-plugin-lto is not passed in to fix compilation on certain environment.

We could reclaim back the performance, by doing a c/c++ library internal LTO, similar to rustc's crate internal LTO.

To find these tools, we might want 'clang -print-resource-dir', or combined with env var lookup LLVM_INSTALL_DIR

We should also allow build script to override lto mode.

Did some research, for clang it seems to support this

llvm-link file1.bc file2.bc file3.bc -o merged.bc llc merged.bc -filetype=obj -o merged.o I didn't find anyway to achieve that for gcc.

Ok so the opt command from llvm can actually do the optimization, might worth trying to see if it works.

We could do

llvm-link file1.bc file2.bc file3.bc -o merged.bc
opt -O{level} merged.bc
llc merged.bc -filetype=obj -o merged.o

Originally posted by @NobodyXu in #1463

NobodyXu avatar Sep 24 '25 12:09 NobodyXu

Seems that we can use clang -O3 -c merged.bc to do the optimization and codegen, that'd makes things much easier

NobodyXu avatar Sep 25 '25 09:09 NobodyXu

Seems that we can use clang -O3 merged.bc to do the optimization and codegen, that'd makes things much easier

I don't follow. I thought that the goal was to produce an .a that is IPO-ed, or "thin-lto-ed" in Rust terms. clang -O3 will attempt to link a final application.

dot-asm avatar Sep 25 '25 14:09 dot-asm

Thanks I forgot to put -c in it, I need to find some time to experiment it on my PC and see if it can do the job

NobodyXu avatar Sep 25 '25 15:09 NobodyXu

I would advocate in favour of implementing this as a feature and even make it independent of Rust -> clang flag mappings. Indeed, it would more than natural to give crate maintainers a way to perform localized IPO independent of user setting environment variables. Just like it's more natural to configure Rust lto through crate's Cargo.toml.

dot-asm avatar Sep 25 '25 15:09 dot-asm

So tested this out and we just need llvm-link:

clang -O3 -flto -c a.c b.c
llvm-link a.o b.o -o merged.o
clang -O3 -x ir -c merged.o -o final.o

We could use ar_archive_writer instead of llvm-link but it'd be more complicated, plus we will pull in more dependenciesz

NobodyXu avatar Sep 26 '25 07:09 NobodyXu

And if you throw in -flto -ffat-lto-objects(*) to the last command the result would be IPO-ed object module with embedded bitcode which should allow end user to opt for -Clinker-plugin-lto at their discretion. I mean imagine a crate maintainer who wants to apply IPO to their C-based crate for everybody, but leave the door open for crazies out there.

(*) To be conditional on clang>=18.

dot-asm avatar Sep 26 '25 09:09 dot-asm

$ clang --version
Ubuntu clang version 20.1.2 (Oubuntul)
Target: x86_64-pc-Linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm-20/bin

For llvm-link, we can use an env var LLVM_LINK, or fallback to look it up in clang InstalledDir, which will guarantee to be compatible with the clang we are using.

NobodyXu avatar Oct 04 '25 14:10 NobodyXu

I advocate for parsing -print-search-dirs for this kind of things. And to a significantly wider extent than it currently is, as noted in https://github.com/rust-lang/cc-rs/blob/1ca8b2af7a19faa4d090cc5deba3f7a0cd721174/src/lib.rs#L3320-L3325.

dot-asm avatar Oct 05 '25 16:10 dot-asm

I advocate for parsing -print-search-dirs for this kind of things.

On a related note. Elsewhere in src/lib.rs it mentions that one can't use it with clang-cl. This turned to be false, it is possible to query clang-cl with /clang:-print-search-dirs command line option. In other words one can extend this approach even to Windows.

dot-asm avatar Oct 05 '25 17:10 dot-asm

it is possible to query clang-cl with /clang:-print-search-dirs command line option

Hmmm, one can actually query it with just --print-search-dirs, i.e. with extra dash. And the extra dash works even with "normal" clang. As well as gcc(*). So one can unify on it across all targets.

(*) goes back at least to gcc 4.6 and clang 3.8, so "universality" is safe to assume.

dot-asm avatar Oct 05 '25 18:10 dot-asm

it is possible to query clang-cl with /clang:-print-search-dirs command line option

Hmmm, one can actually query it with just --print-search-dirs, i.e. with extra dash. And the extra dash works even with "normal" clang. As well as gcc(*). So one can unify on it across all targets.

Thanks, that's awesome, will open a PR to fix that first, and then a PR for the lto stuff

NobodyXu avatar Oct 06 '25 01:10 NobodyXu

cc @dot-asm opened #1587, would be great if you can have a look please

NobodyXu avatar Oct 06 '25 05:10 NobodyXu

closed this as completed in #1587

This ought to be a mistake. The referred PR is a side note to this issue, and doesn't advance the proposal even a little bit.

dot-asm avatar Oct 07 '25 08:10 dot-asm

Thanks, I just want to reference that particular comment as motivation for the PR and deliberately avoiding fix ..., but GitHub still thought thar it fixes this issue

NobodyXu avatar Oct 07 '25 08:10 NobodyXu

To make IPO easier, I think cc-rs should try to use clang as default and only fallback to gcc/cc if clang doesn't exist.

Not sure if that'd break something so posted here for comment, perhaps we'd want a mechanism for build scripts to select compiler favor preferences? I.e. there might be some projects that just need gcc

NobodyXu avatar Oct 08 '25 12:10 NobodyXu

To make IPO easier, I think cc-rs should try to use clang as default and only fallback to gcc/cc if clang doesn't exist.

I for one would argue that it's unreasonable. However, as already mentioned, the most natural thing to do is to leave the choice to perform per-crate IPO to crate maintainer(*). An optional cc::Build method that enables it. It would be reasonable for such a method to favour clang. But it would be inappropriate for cc-rs to panic if clang is unavailable. The choice between terminating the build script, issuing a warning or simply proceeding should be left to the crate maintainer. As for CC environment variable, I'd argue that it should still have the ultimate priority and override the said preference.

A remark about "fallback to gcc/cc." Keep in mind that there are systems where cc is clang, for example FreeBSD, MacOS X...

dot-asm avatar Oct 09 '25 09:10 dot-asm

But it would be inappropriate for cc-rs to panic if clang is unavailable

We will fallback to gcc/cc if clang is not found, and we will only allow IPO if the compiler used is clang

A remark about "fallback to gcc/cc." Keep in mind that there are systems where cc is clang, for example FreeBSD, MacOS X...

Yes, that's why we have a compiler family detection .c for this

NobodyXu avatar Oct 09 '25 10:10 NobodyXu

But it would be inappropriate for cc-rs to panic if clang is unavailable

We will fallback to gcc/cc if clang is not found, and we will only allow IPO if the compiler used is clang

And the suggestion is to convey this information to the build script. I.e. if crate maintainer asks for IPO and cc-rs can't switch to clang, it should be up to the maintainer how to proceed, ignore it, issue a "clang would benefit you, dear user" warning or panic.

dot-asm avatar Oct 09 '25 10:10 dot-asm

I am thinking of something like this:

#[non_exhaustive{
enum LTOMode {
     Disabled,
    /// if -Clinker-plugin-lto passed, then try to use clang to generate ir bitcode
    ///
    /// If -Clto is passed, try to use clang to do lto but generate normal assembly
    #[default{
    Auto,
}

Build scripts can disable it if LTO breaks their build, and we can add more mode for customisation in the future, while the default will work for most crates

NobodyXu avatar Oct 09 '25 11:10 NobodyXu

Build scripts can disable it if LTO breaks their build

No. Per-crate IPO should be opt-in, not opt-out.

Let me clarify my position. You mentioned -Clto. Since we know that build script doesn't have access to flags cargo puts together based on Cargo.toml content, the only way for the build script to "see" -Clto is through RUSTFLAGS environment variable. Relying on users to set RUSTFLAGS is kind of unsustainable. For starters they don't listen when you tell them. Secondly, and more importantly, since it affect all dependent crates, it's not given that it would be appreciated/appropriate. Instead the proposition is to empower individual crate maintainers. In other words if a crate maintainer deems IPO beneficial for their crate, they should be able to opt-in through the build script alone.

Just in case. -Clinker-plugin-lto is different, because it only affects the "top-most" binary crate, not all the dependencies.

dot-asm avatar Oct 09 '25 11:10 dot-asm

Just in case. -Clinker-plugin-lto is different, because it only affects the "top-most" binary crate, not all the dependencies.

On the second thought this is not true. It does affect all dependencies. What is different is that affects the final linking step.

dot-asm avatar Oct 09 '25 12:10 dot-asm

In other words if a crate maintainer deems IPO beneficial for their crate, they should be able to opt-in through the build script alone.

We can start with that being default, but I don't think it's a good position.

When users use --release and enable lto in profile.release they do expect all crates to be optimized somewhat, including external C/C++

The -Clinker-plugin-lto, is merely a hack due to how rustc handles lto and linking, and imho the linking process should be changed so that rustc can perform cross lang LTO without having to use this flag, in other words, crates' LTO across crates should apply to C/C++ crates as well

https://internals.rust-lang.org/t/idea-easier-cross-lang-lto/23568/6

Edit: updated link

NobodyXu avatar Oct 09 '25 12:10 NobodyXu

When users use --release and enable lto in profile.release they do expect all crates to be optimized somewhat, including external C/C++

As already mentioned, build script doesn't have access to this information. In other words in order to meet these expectations you need to start from cargo.

dot-asm avatar Oct 09 '25 14:10 dot-asm

Hmmm... It appears that I've based some of my suggestions on wrong assumption that lto flag in Cargo.toml is per-crate. I.e. if a dependency crate has lto in its Cargo.toml, IPO would be applied to the corresponding .rlib. This doesn't seem to be true. It's lto flag in the "top-most" Cargo.toml that gets applied to the dependencies. This is likely to make you go "what is he talking about?" Apologies. I wonder if it was always like this? Or maybe rather if there is will to change it to per-crate? Because it's more natural to let the crate maintainers assess how IPO benefits their code, rather than rely on an application they've never heard of to enable it...

dot-asm avatar Oct 09 '25 14:10 dot-asm

IIRC the default cargo behavior is to do LTO intra-crate

And I can see why LTO is global, it's hard to think of applying LTO only to a subset of crates when doing inter-crate LTO

NobodyXu avatar Oct 10 '25 08:10 NobodyXu

IIRC the default cargo behavior is to do LTO intra-crate

??? Default is no lto. You probably mean if one enables LTO, then the default is intra-crate. Well, it has to be because otherwise it wouldn't qualify as link-time. Then there is "thin" and "fat". "Thin" is still intra-crate, but is parallelized. I suppose it picks few branches in the dependency tree and works on several lumps in parallel...

And I can see why LTO is global, it's hard to think of applying LTO only to a subset of crates when doing inter-crate LTO

Why not? Imagine a dumb program that orchestrates multiple ultra-sophisticated crates, each with a few less sophisticated dependencies. It would make more sense to perform IPO on the ultra-sophisticated crates and their dependencies [and std]. And my assumption was that this decision could be made on a crate level, as opposed to the final binary. It's only natural for crate maintainers to have a say about IPO on their crates...

dot-asm avatar Oct 10 '25 12:10 dot-asm

Anyway, the suggestion in https://github.com/rust-lang/cc-rs/issues/1565#issuecomment-3337160136 is effectively per-crate IPO albeit isolated to C. If one doesn't value language-specific per-crate IPO, then it's no point in implementing the suggestion. Then recall that we found no way to cross language boundaries except for -Clinker-plugin-lto. It might change, or might have already, but even then one kind of has to start with cargo to let it inform the build script about lto setting in the top-most Cargo.toml.

dot-asm avatar Oct 10 '25 13:10 dot-asm

Yeah we can have a new fn ipo(&mut self, ipo: bool) function to enable and opt-in to it.

Let's start with this opt-in method then once we implement and get to test it, we can decided if we want more.

NobodyXu avatar Oct 10 '25 14:10 NobodyXu

IIRC the default cargo behavior is to do LTO intra-crate

??? Default is no lto.

As it turns out I'm both right and wrong. Well, I'm wrong according to Rust, but right according to me :-) But on a more serious note, as it turns out the default is what they call "thin local LTO," which is really "per-crate [language-specific] IPO." Formally speaking it shouldn't be called LTO, because it's not performed at link time.

dot-asm avatar Oct 11 '25 09:10 dot-asm

According to Noratrieb, thin LTO indeed contains additional information: function summaries

If the final linking LTO process uses thin LTO then we must also use thin Lto, otherwise if it is fat LTO, then we could use fat LTO (and maybe thin LTO?)

However it seems that rustc doesn't have a flag to control that, so I guess it would depend on linker default, plus additional flags users passed?

NobodyXu avatar Oct 11 '25 14:10 NobodyXu