rust
rust copied to clipboard
rustc_codegen_ssa: Use `llvm.invariant` intrinsics on arguments that are immutable references; off by default.
Optimization failures around reloads and memcpy optimizations are frequently traceable to LLVM's failure to prove that memory can't be mutated by a call or store. This problem is especially acute in Rust, where large values tend to be memcpy'd more often than in C++. Thankfully, Rust has stronger guarantees on mutability available than C++ does, via the strong immutability of &
references. This should allow LLVM to prove that memory can't be modified by stores and calls in more cases.
We're already using LLVM's readonly
parameter attribute on such calls. However, the semantics of readonly
are akin to const
in C++, in that they only promise to LLVM that the function won't mutate the parameter through that pointer, not that the pointed-to memory is immutable for the entire duration of the function. These weak semantics limit the applicability of readonly
to LLVM's alias analysis. Instead of readonly
, the correct way to express strong immutability guarantees on memory is through the llvm.invariant.start
and llvm.invariant.end
intrinsics. These enable a frontend like rustc
to describe immutability of memory regions in an expressive, flow-sensitive manner.
Unfortunately, LLVM doesn't use the llvm.invariant.start
and llvm.invariant.end
intrinsics for much at the moment. It's only used in one optimization in loop-invariant code motion at this time. Follow-up work will need to be done in LLVM to integrate these intrinsics into alias analysis. Possibly there will need to be some sort of "MemoryInvarianceAnalysis" that uses graph reachability algorithms to analyze the extent of the guarantees provided by these intrinsics to the control flow graph.
Regardless, this front-end work needs to happen as a prerequisite for any LLVM work, so that the improvements to LLVM can be measured and tested. So this commit makes rustc
use llvm.invariant
in a minimal way: on immutable references to "freeze" types (i.e. not transitively containing UnsafeCell) passed directly as parameters to functions. This is off by default, gated behind the non-default -Z emit-invariant-markers=yes
flag.
Obviously, a lot more can be done to use llvm.invariant
more liberally in the future, but this can be added over time, especially once more LLVM optimization passes use that infrastructure. This is simply the bare minimum for now. Once LLVM uses those intrinsics for more optimizations, the effects of more llvm.invariant
use can be measured more precisely.
Some changes occurred in compiler/rustc_codegen_gcc
cc @antoyo
r? @TaKO8Ki
(rust-highfive has picked a reviewer for you, use r? to override)
Another benefit of the conservatism here is that it avoids this PR depending too much on the semantics of unsafe code. Memory pointed to by immutable function parameter references to freezable memory is clearly immutable for the lifetime of the function regardless of what semantics we decide on for unsafe code; if they weren't it would very probably be UB already given that we mark such parameters as readonly
right now.
I'd suggest doing a perf run with this enabled by default.
TBH, I'm not sure this is a viable way forward. I expect that this is going to regress optimization potential in practice, because invariant.start is essentially unused for optimization purposes (the only thing clang uses it for is to convert globals into constants after evaluating global ctors). At the same time, invariant.start will introduce a ref effect and invariant.end a modref effect, and from a quick look it doesn't seem like code is able to look past these effectively. I think it's unlikely that use of invariant.start and invariant.end for optimization in LLVM will increase much in the future, because their control-flow dependence would make this expensive in terms of compile-time.
The more widely used way to represent invariance is invariant.group metadata in conjunction with launder.invariant.group intrinsics. The way to use these in this context would be to use launder.invariant.group to create a new pointer for the argument, and then annotate all uses of it using !invariant.group
. This representation is used for strict vtables in clang and does have non-trivial support in LLVM.
However, I believe that for Rust's use case, the way to get the most mileage out of this is to add an invariant
argument attribute. This will not negatively affect any optimizations and should be easy to handle directly on the AA level (by generalizing pointsToConstantMemory towards classifying memory into globally invariant and locally invariant, where the former results in no effects, while the latter is limited to ref effects). This has been on my backlog for a while, but I never got around to implementing it.
Actually, now that I think about it again: Doesn't the combination of noalias
and readonly
that we usually have already imply invariance within the function? I think AA should be able to handle that without any changes on the rustc side.
Actually, now that I think about it again: Doesn't the combination of noalias and readonly that we usually have already imply invariance within the function? I think AA should be able to handle that without any changes on the rustc side.
No, it doesn't. See: https://godbolt.org/z/Wxn3WWWfa
LLVM doesn't know that bar()
doesn't mutate *x
somehow, despite it being marked readonly
and noalias
in all functions. So it has to issue a needless reload of the value.
However, I believe that for Rust's use case, the way to get the most mileage out of this is to add an invariant argument attribute. This will not negatively affect any optimizations and should be easy to handle directly on the AA level (by generalizing pointsToConstantMemory towards classifying memory into globally invariant and locally invariant, where the former results in no effects, while the latter is limited to ref effects). This has been on my backlog for a while, but I never got around to implementing it.
I'd like to be able to use this on references that aren't directly function parameters. e.g.
struct Foo<'a> {
bar: &'a i32,
}
fn f(foo: &Foo) {
let bar = foo.bar;
g(*bar);
h(bar);
g(*bar);
}
should be optimized to avoid the redundant *bar
load. This seems to require more than just parameter metadata.
LLVM doesn't know that
bar()
doesn't mutate*x
somehow, despite it being markedreadonly
andnoalias
in all functions. So it has to issue a needless reload of the value.
I'm not saying it currently handles this (it doesn't), but that it would be easy to make it handle it, without introducing new IR concepts or changes on the rustc side.
I'm not saying it currently handles this (it doesn't), but that it would be easy to make it handle it, without introducing new IR concepts or changes on the rustc side.
OK, I've added that check to BasicAliasAnalysis::pointsToConstantMemory()
and it seems to work on a simple test case. Running tests overnight.
The more widely used way to represent invariance is invariant.group metadata in conjunction with launder.invariant.group intrinsics. The way to use these in this context would be to use launder.invariant.group to create a new pointer for the argument, and then annotate all uses of it using
!invariant.group
. This representation is used for strict vtables in clang and does have non-trivial support in LLVM.
I can do this, but do we need to launder the pointer first? The LangRef suggests that invariant.group
should work as long as every use of the SSA value loads the same value. Since in these cases the memory is immutable for the entire lifetime of the SSA value that seems safe. (Though that assumes the inliner correctly inserts launder intrinsics when inlining functions.)
I'm not saying it currently handles this (it doesn't), but that it would be easy to make it handle it, without introducing new IR concepts or changes on the rustc side.
OK, I've added that check to
BasicAliasAnalysis::pointsToConstantMemory()
and it seems to work on a simple test case. Running tests overnight.
The necessary change isn't quite that simple, and this will lead to miscompiles in some cases. Maybe it's good enough to get some performance data though.
I can do this, but do we need to launder the pointer first? The LangRef suggests that
invariant.group
should work as long as every use of the SSA value loads the same value. Since in these cases the memory is immutable for the entire lifetime of the SSA value that seems safe. (Though that assumes the inliner correctly inserts launder intrinsics when inlining functions.)
The inliner does not insert launder.invariant.group intrinsics, these need to be added explicitly.
The necessary change isn't quite that simple, and this will lead to miscompiles in some cases. Maybe it's good enough to get some performance data though.
OK, here's a rough proposed change: https://github.com/pcwalton/llvm-project/commit/87a571a3fc17880a28e3c4234d758024cb8a3a9e
I changed pointsToConstantMemory()
to take a fourth parameter, OrInvariant
, like the existing OrLocal
, and if that parameter is true then pointsToConstantMemory()
returns true for memory that is invariant for the life of the SSA value, even if that memory isn't invariant for the life of the program. I then passed true in cases where it looked safe (particularly getModRefInfo()
).
Does this approach seem reasonable? If so I'll continue cleaning it up, adding tests, etc. Having two bool parameters seems a bit ugly, admittedly, but I wanted to follow the existing code.
The new logic looks correct to me. I don't think we'd want to land this with the OrInvariant parameter, but it's a reasonable starting point for review.
The inliner does not insert launder.invariant.group intrinsics, these need to be added explicitly.
Wait, if we launder pointers every time we enter a function, doesn't that mean that redundant loads can't be eliminated across inlined functions? e.g.
fn f(x: &i32) {
use_of_val(*x);
g(x);
use_of_val(*x); // <--
}
fn g(y: &i32) {
use_of_ptr(y);
}
If codegen applies a launder intrinsic to y
on the entry block of g
and g
gets inlined into f
, then x
and y
are considered different, despite being the same pointer and immutable throughout, and the load from the second use_of_val()
(marked with an arrow above) can't be eliminated because LLVM will think use_of_ptr
might have mutated the memory pointed to by x
. Is this right?
David Goldblatt suggested an llvm.constify
intrinsic (obviously we would need a better name) that takes a pointer and returns a new pointer SSA value that aliases the original but is guaranteed to be invariant for that entire SSA value's lifetime. This would be different from llvm.launder.invariant.group
because calling constify
on an already constified pointer would be a no-op. To avoid proliferation of llvm.invariant
intrinsics I wonder if it would make sense to just replace llvm.invariant.start
with this new llvm.constify
. What do you think?
The new logic looks correct to me. I don't think we'd want to land this with the OrInvariant parameter, but it's a reasonable starting point for review.
I've given this some thought, and I think the right way to model this is by making the return value a ModRefInfo rather than bool, which is the upper bound on observable memory effects for this location. This is NoModRef for globally invariant memory and Ref for locally invariant memory. Users can then just &
this value to refine the ModRefInfo result.
To avoid proliferation of
llvm.invariant
intrinsics I wonder if it would make sense to just replacellvm.invariant.start
with this newllvm.constify
. What do you think?
Without commenting on the rest, I don't think llvm.invariant.start can be replaced, because the thing its primarily used for is to declare that the object cannot be modified past a certain point (i.e. invariant.start without invariant.end). In particular this is used to turn globals into constants after global ctor evaluation, in which case having a "constantified" SSA value doesn't really help us -- we need the invariant.start on the global independently of how it will later be used.
I think you are better suited to review this PR. r? @nikic
I will close this PR after the relevant LLVM patches are upstream, unless you'd like me to close it now.
I think it's safe to close this, I doubt we would want to pursue the particular approach implemented here.
I submitted https://reviews.llvm.org/D136659 to implement the suggested LLVM change.