RFC: Add additional `inline` intents
This proposes adding #[inline(trampoline)] and #[inline(rarely)] to hint the compiler additional information about why you're marking the function for inlining, to help it hopefully make better choices without the overly-strong hammers of always or never.
I like this but am kind of mentally conflicted about the name "trampoline." I can't tell if it's a term that's already in use, or if it's something that was made up to match the metaphor.
Ultimately would be fine with the name and am not going to block this on bikeshedding, but, kind of wanted to express my thoughts in case someone else feels more strongly and isn't sure where others stand.
IIUC after this RFC we would have these 8 inline levels (including the rustc_* ones):
| Attribute | LLVM function attribute | MIR inliner effect |
|---|---|---|
#[rustc_force_inline = "reason"] |
alwaysinline |
compile error if cannot inline |
#[inline(always)] |
alwaysinline |
always inline |
#[inline] |
inlinehint |
-Zinline-mir-hint-threshold=100 |
| - | - | -Zinline-mir-threshold=50 |
#[inline(trampoline)] |
-? | forces caller_is_inline_forwarder? |
#[inline(rarely)] |
-? | an even lower threshold? |
#[rustc_no_mir_inline] |
- | never inline |
#[inline(never)] |
noinline |
never inline |
I like this but am kind of mentally conflicted about the name "trampoline." I can't tell if it's a term that's already in use, or if it's something that was made up to match the metaphor.
See https://en.wikipedia.org/wiki/Trampoline_(computing) -- it's used for lots of things. Probably the closest meaning to this one is the calling convention one, where a trampoline is a function with calling convention A that rearranges the stack/registers then calls another function with calling convention B.
If there's a better name, though, I'd be happy to switch it.
@kennytm There might be another if you count implicitly cross-crate-inline as different from #[inline].
We might be able to replace rustc_no_mir_inline with inline(rarely), though -- block it in mir inlining unless all the arguments are Operand::Const, say. That'd be enough for the two cases on my machine in library right now.
(Also, both rustc_no_mir_inlines are also marked #[inline], so I think they're actually LLVM inlinehint.)
Perhaps inline should work like the diagnostic:: namespace, such that an unrecognized intent is a warning & no-op instead of an error?
I feel like the In Combination section could do with a code example - I'm finding it hard to track exactly which attribute would go on what function. Also NonZero::new seems to come out of nowhere.
I find myself thinking about the mention of analysis paralysis as a downside. I'd really like to understand 1) to what extent you expect to see these used commonly in codebases, 2) to what extent these work better than e.g. our existing heuristics for recognizing trampolines, 3) some kind of examples of these helping with codegen or compilation time or similar.
Or, to put this a different way: I think I would not want to approve this without first seeing the results of some kind of compiler experiment showing what would improve if we marked a bunch of functions with this, and weighing that against the very real cost of having multiple new variations of inline for people to consider.
The current trampoline detection is extremely simple, as it's focused only on avoiding MIR bloat. It's essentially "if this is only two blocks -- the first being a call that returns to the second -- then this is a trampoline and cannot inline anything that makes that no longer true". That means that it it's not detected even in "simple" cases like "well we assert! something first", or where there's a dropped generic parameter, because that means more blocks which might increase the MIR size.
The place I think inline(trampoline) would be most useful is things like
https://github.com/rust-lang/rust/blob/ac91805f3179fc2225c60e8ccf5a1daa09d43f3d/library/core/src/slice/mod.rs#L3861-L3869
where it would be useful to encourage inlining the assert into the caller, even against the "but that might lead to MIR bloat" heuristic, rather than the normal behaviour of first trying (subject to other heuristics still) to inline ptr::swap_nonoverlapping into [T]::swap_with_slice even if that makes it too big to then inline into callers.
Similarly, https://github.com/rust-lang/rust/pull/136718 was experimenting with making slice::Iter::next be something like if T::IS_ZST { self.next_zst() } else { self.next_not_zst() }, which doesn't really work today because both next_zst and next_not_zst are both small enough that we end up inlining them, but then the combined thing is too big to inline, making it hard to optimize out that IS_ZST check. So it'd be nice to put a inline(trampoline) on that to make the ZST check nearly-certain to inline into the caller, which can then optimize it out and make it obviously good to inline the one next implementation it actually cared about. (TBH, could even try putting inline(discouraged) on the next_zst version because iterating over slices of things certain to be ZSTs is rare enough that it's probably not worth inlining that in generic MIR ever, but it would still be good for LLVM to inline it.)
Could we write a bunch of more complex heuristics as we identify more of these cases? Maybe. But it's be nice to just say what I mean directly in a bunch of them.
I think I would not want to approve this without first seeing the results of some kind of compiler experiment
I'd be happy to take the feedback in this thread as a "tentative interest; please make an experiment and we'll see".
(total novice here)
It seems confusing to be offered one possible reason to encourage inlining and no others. Why is trampoline chosen as a reason that is worth distinguishing and not others? Also, I tend to think that the "trampoline cases" should be the more easy cases for the compiler to decide to inline without hints. But if that were an inline hint blessed by the language, I might feel less inclined to trust the compiler by default. More broadly, I wonder if this feature encourages folks to add inline based on unverified intuition.
Also, I kinda worry about the gray area in defining trampoline - How many lines of code leading up to the "trampoline call" is too many? Some inline hints might not neatly map to a single reason, but this feature kinda suggests that they should.
Just a note that this vaguely reminds me of #[allow(..), reason=".."].
Alternative bikeshed colors for trampoline: wrapper, forwarding, delegate, delegating
Alternative bikeshed colors for
trampoline:wrapper,forwarding,delegate,delegating
Another trampoline bikeshed: #[inline(shallow)]?
Also, I tend to think that the "trampoline cases" should be the more easy cases for the compiler to decide to inline without hints.
The trick with the trampoline cases is not inlining callees inside the trampoline that might otherwise have been inlined. Not unlike marking the callee as #[cold], but helpful for things that aren't exactly cold.