ldc
nested lambdas not inlined when outer function uses `pragma(inline, false)`
In the following example, bar is inlined in foo2, foo3 and foo4, but not in foo1. Moving pragma(inline, false); inside instead of having it before the function also allows inlining of bar to work.
pragma(inline, false)
int foo1() {
    alias bar = (int s) => s;
    return bar(3);
}
int foo2() {
    alias bar = (int s) => s;
    return bar(3);
}
pragma(inline, false)
int foo3() {
    int bar(int s) { return s; }
    return bar(3);
}
int foo4() {
    int bar(int s) { return s; }
    return bar(3);
}
Plugged it into godbolt with -O3:
int example.foo1():
.Lfunc_begin0:
mov edi, 3
jmp pure nothrow @nogc @safe int example.foo1().__lambda1(int)@PLT
.Ltmp0:
.Lfunc_end0:
pure nothrow @nogc @safe int example.foo1().__lambda1(int):
.Lfunc_begin1:
mov eax, edi
.Ltmp1:
ret
.Ltmp2:
.Lfunc_end1:
int example.foo2():
.Lfunc_begin2:
mov eax, 3
ret
.Ltmp3:
.Lfunc_end2:
pure nothrow @nogc @safe int example.foo2().__lambda1(int):
.Lfunc_begin3:
mov eax, edi
.Ltmp4:
ret
.Ltmp5:
.Lfunc_end3:
int example.foo3():
.Lfunc_begin4:
mov eax, 3
ret
.Ltmp6:
.Lfunc_end4:
pure nothrow @nogc @safe int example.foo3().bar(int):
.Lfunc_begin5:
mov eax, esi
.Ltmp7:
ret
.Ltmp8:
.Lfunc_end5:
int example.foo4():
.Lfunc_begin6:
mov eax, 3
ret
.Ltmp9:
.Lfunc_end6:
pure nothrow @nogc @safe int example.foo4().bar(int):
.Lfunc_begin7:
mov eax, esi
.Ltmp10:
ret
.Ltmp11:
.Lfunc_end7:
See also https://forum.dlang.org/post/[email protected]
I'd have to check it, but I assume the scope-pragma(inline) is simply forwarded to all nested functions/lambdas (and so a frontend thing).
I find it baffling that foo3.bar is inlined; both foo3 and foo3.bar get the expected noinline attribute. It's not inlined anymore when making its body minimally less trivial (return s + 123;).
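For reference, the "minimally less trivial" variant mentioned above would look roughly like this. The name foo3b is made up for illustration; only the body of bar differs from foo3. As described above, bar is then no longer inlined into the outer function, though this is an observation from this thread rather than a guarantee.

```d
// Sketch of the less trivial variant; foo3b is a hypothetical name.
pragma(inline, false)
int foo3b()
{
    // Same nested function as in foo3, but with `+ 123` in the body.
    int bar(int s) { return s + 123; }
    return bar(3);
}

void main()
{
    assert(foo3b() == 126);
}
```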
Indeed. Whatever the rules should be, they don't seem to be followed, as the results are quite inconsistent.
I would argue that noinline should not be inherited by inner functions. Why would I want that?
As foo3.bar is the simplest bijective mapping it's fairly easy prey for interprocedural analysis - I'm not all that familiar with LLVM but that could be why it does it anyway.
But it doesn't do it for the lambda in foo1, although the IR for them is almost identical when making foo3.bar static (no context pointer); the only difference I've spotted is the (default) external linkage for foo3.bar vs. weakonce_odr for foo1.__lambda1. I guess this is just a little LLVM issue though and not really a problem in practice.
Wrt. inheriting the pragma(inline) scope: well, I guess it's not unlikely one wants to inherit it for pragma(inline, true). Otherwise the nested functions are guaranteed not to be inlined when emitting/inlining the outer function in non-owning object files. For false one probably wants it not to be inherited, but I expect false usage coupled with nested functions to be very rare. Using the statement form pragma(inline, false); works in those cases as you found out. (But I strongly discourage using the statement form for pragma(inline, true);; the function body needs semantic analysis for it to get detected...)
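A minimal sketch of the statement form, for comparison with foo1 from the original report. The name foo5 is made up for illustration. Per the report, with the pragma inside the body the outer function is still marked as not inlineable, while the lambda is no longer covered by the pragma's scope and can be inlined as usual; the exact codegen may vary between LDC versions.

```d
// Statement form: the pragma sits inside the function body, so it
// affects foo5 itself instead of acting as a scope around the lambda.
// foo5 is a hypothetical name, not from the original report.
int foo5()
{
    pragma(inline, false);
    alias bar = (int s) => s;
    return bar(3);
}

void main()
{
    assert(foo5() == 3);
}
```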
The only reason I ever use pragma(inline, true/false) is because I want to control whether a specific function should be inlined. I still don't understand why anyone would expect it to have any transitive properties.
You say "Otherwise the nested functions are guaranteed not to be inlined when emitting/inlining the outer function in non-owning object files". That sounds like also the opposite of what one would expect. Any function - nested or not - being inlined or not should be done when the compiler sees fit unless I have marked it explicitly one way or another.
I am sensing that in order to follow this, I need to understand not what inlining is (which I understand pretty well) but specifically how inlining happens in ldc/llvm, which suggests the abstraction provided by these pragmas is pretty leaky.
Am I completely missing something obvious here?
Any function - nested or not - being inlined or not should be done when the compiler sees fit unless I have marked it explicitly one way or another. [...] I need to understand not what inlining is (which I understand pretty well) but specifically how inlining happens in ldc/llvm
LDC doesn't do cross-module inlining by default, and -enable-cross-module-inlining is problematic - e.g., see the failures in https://github.com/ldc-developers/ldc/pull/3753. That's because it's not trivial, as we have to a) codegen the function manually multiple times into different IR modules (~object files), with a different IR linkage for non-owning CUs (see available_externally in https://llvm.org/docs/LangRef.html#linkage-types), and b) decide, based on some heuristic, when to do that extra work (currently based on number of statements IIRC). The extra codegen comes with potential linker conflicts for nested aggregates, functions and globals; e.g., see https://github.com/ldc-developers/ldc/issues/3548. Such 'secondary' available_externally emissions into non-owning CUs would need to apply the special IR linkage transitively to all symbols defined in the function body, including init symbols/vtables/TypeInfos for nested aggregates etc. C++/clang doesn't have to deal with this extra complexity. ;)
The reason for these extra emissions is that LLVM optimizes an IR module/CU at a time (without LTO of course). All the optimizer sees is the IR in that module, so if the codegen'd D module references an imported one-liner, LDC has to manually codegen that imported function, quite likely involving extra semantic analysis in the frontend, and deal with the available_externally complexity.
IIRC, these available_externally definitions don't show up in IR generated via -output-ll, making it hard to give you a little example. It boils down to the IR definition only being available for inlineability; ~~if it's not inlined,~~ the function is stripped and never makes it to the produced object file.
Edit: Oh and this of course also applies to pragma(inline, true) functions in case they are referenced in other modules codegen'd into another CU. We use it selectively for some functions in druntime/Phobos, mainly wrappers around LLVM intrinsics, to kind-of enforce cross-module inlining selectively for functions where we definitely know it's worth it.
Edit2: pragma(inline, true) functions also get the corresponding LLVM alwaysinline function attribute, making LLVM inline them in practically all cases AFAICT (even with the optimizer disabled). Similarly, pragma(inline, false) translates to noinline.
You say "Otherwise the nested functions are guaranteed not to be inlined when emitting/inlining the outer function in non-owning object files". That sounds like also the opposite of what one would expect.
I don't recall exactly, but I think regular nested functions (as opposed to function literals) are prevented from being defined in secondary CUs while codegen'ing the outer function as available_externally - via skipCodegen(). I doubt that was the original intention, but AFAIK it also works around LDC not applying available_externally transitively, and the resulting potential for linker errors. Similarly, globals defined as part of a 'secondary' emission are skipped (only declared) at the moment.
The specifics of the LLVM stuff are a bit beyond me, but I ran into this again and it's quite frustrating.
Currently I have this situation:
A calls B calls C.
LDC gives me this with -O3: B is inlined into A, C is not inlined.
What I want: B is not inlined, C is inlined into B.
Is there any way to suggest / enforce this?
Can you be a little more specific? You can get this behavior by explicitly mentioning pragma(inline, false) for B, and pragma(inline, true) for C, but I guess it's not working for you?
Wouldn't that lead to a transitive pragma(inline, true) for anything C calls, and also a transitive pragma(inline, false) for anything else (other than C) that B calls?
The pragma only applies to the function definition, not to call sites within that function.
pragma(inline, true) void f()
{
    g(); // inlining of g() is not forced, but based on normal profitability analysis
}
void h()
{
    f(); // inlining of f() is forced
}
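Applied to the A calls B calls C case from earlier in the thread, a sketch could look like the following. The functions a, b and c and their bodies are placeholders, not anyone's actual code. Each pragma only marks the function it is attached to: b gets noinline, c gets alwaysinline, and a is left to the optimizer's normal heuristics.

```d
// Placeholder functions standing in for A, B and C.
pragma(inline, true)
int c(int x) { return x + 1; }   // requested to be inlined into its callers

pragma(inline, false)
int b(int x) { return c(x) * 2; } // not inlined into a; c may be inlined here

int a(int x) { return b(x) + 3; } // no pragma: normal profitability analysis

void main()
{
    assert(a(1) == 7);
}
```

Since none of these functions contain nested functions or lambdas, the scope-inheritance behaviour discussed earlier in this issue should not come into play here.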