rust icon indicating copy to clipboard operation
rust copied to clipboard

rustc_codegen_ssa: Use `llvm.invariant` intrinsics on arguments that are immutable references; off by default.

Open pcwalton opened this issue 1 year ago • 14 comments

Optimization failures around reloads and memcpy optimizations are frequently traceable to LLVM's failure to prove that memory can't be mutated by a call or store. This problem is especially acute in Rust, where large values tend to be memcpy'd more often than in C++. Thankfully, Rust has stronger guarantees on mutability available than C++ does, via the strong immutability of & references. This should allow LLVM to prove that memory can't be modified by stores and calls in more cases.

We're already using LLVM's readonly parameter attribute on such calls. However, the semantics of readonly are akin to const in C++, in that they only promise to LLVM that the function won't mutate the parameter through that pointer, not that the pointed-to memory is immutable for the entire duration of the function. These weak semantics limit the applicability of readonly to LLVM's alias analysis. Instead of readonly, the correct way to express strong immutability guarantees on memory is through the llvm.invariant.start and llvm.invariant.end intrinsics. These enable a frontend like rustc to describe immutability of memory regions in an expressive, flow-sensitive manner.

Unfortunately, LLVM doesn't use the llvm.invariant.start and llvm.invariant.end intrinsics for much at the moment. It's only used in one optimization in loop-invariant code motion at this time. Follow-up work will need to be done in LLVM to integrate these intrinsics into alias analysis. Possibly there will need to be some sort of "MemoryInvarianceAnalysis" that uses graph reachability algorithms to analyze the extent of the guarantees provided by these intrinsics to the control flow graph.

Regardless, this front-end work needs to happen as a prerequisite for any LLVM work, so that the improvements to LLVM can be measured and tested. So this commit makes rustc use llvm.invariant in a minimal way: on immutable references to "freeze" types (i.e. not transitively containing UnsafeCell) passed directly as parameters to functions. This is off by default, gated behind the non-default -Z emit-invariant-markers=yes flag.

Obviously, a lot more can be done to use llvm.invariant more liberally in the future, but this can be added over time, especially once more LLVM optimization passes use that infrastructure. This is simply the bare minimum for now. Once LLVM uses those intrinsics for more optimizations, the effects of more llvm.invariant use can be measured more precisely.

pcwalton avatar Oct 14 '22 23:10 pcwalton

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo

rustbot avatar Oct 14 '22 23:10 rustbot

r? @TaKO8Ki

(rust-highfive has picked a reviewer for you, use r? to override)

rust-highfive avatar Oct 14 '22 23:10 rust-highfive

Another benefit of the conservatism here is that it avoids this PR depending too much on the semantics of unsafe code. Memory pointed to by immutable function parameter references to freezable memory is clearly immutable for the lifetime of the function regardless of what semantics we decide on for unsafe code; if they weren't it would very probably be UB already given that we mark such parameters as readonly right now.

pcwalton avatar Oct 15 '22 00:10 pcwalton

I'd suggest doing a perf run with this enabled by default.

TBH, I'm not sure this is a viable way forward. I expect that this is going to regress optimization potential in practice, because invariant.start is essentially unused for optimization purposes (the only thing clang uses it for is to convert globals into constants after evaluating global ctors). At the same time, invariant.start will introduce a ref effect and invariant.end a modref effect, and from a quick look it doesn't seem like code is able to look past these effectively. I think it's unlikely that use of invariant.start and invariant.end for optimization in LLVM will increase much in the future, because their control-flow dependence would make this expensive in terms of compile-time.

The more widely used way to represent invariance is invariant.group metadata in conjunction with launder.invariant.group intrinsics. The way to use these in this context would be to use launder.invariant.group to create a new pointer for the argument, and then annotate all uses of it using !invariant.group. This representation is used for strict vtables in clang and does have non-trivial support in LLVM.

However, I believe that for Rust's use case, the way to get the most mileage out of this is to add an invariant argument attribute. This will not negatively affect any optimizations and should be easy to handle directly on the AA level (by generalizing pointsToConstantMemory towards classifying memory into globally invariant and locally invariant, where the former results in no effects, while the latter is limited to ref effects). This has been on my backlog for a while, but I never got around to implementing it.

nikic avatar Oct 15 '22 08:10 nikic

Actually, now that I think about it again: Doesn't the combination of noalias and readonly that we usually have already imply invariance within the function? I think AA should be able to handle that without any changes on the rustc side.

nikic avatar Oct 15 '22 08:10 nikic

Actually, now that I think about it again: Doesn't the combination of noalias and readonly that we usually have already imply invariance within the function? I think AA should be able to handle that without any changes on the rustc side.

No, it doesn't. See: https://godbolt.org/z/Wxn3WWWfa

LLVM doesn't know that bar() doesn't mutate *x somehow, despite it being marked readonly and noalias in all functions. So it has to issue a needless reload of the value.

However, I believe that for Rust's use case, the way to get the most mileage out of this is to add an invariant argument attribute. This will not negatively affect any optimizations and should be easy to handle directly on the AA level (by generalizing pointsToConstantMemory towards classifying memory into globally invariant and locally invariant, where the former results in no effects, while the latter is limited to ref effects). This has been on my backlog for a while, but I never got around to implementing it.

I'd like to be able to use this on references that aren't directly function parameters. e.g.

struct Foo<'a> {
    bar: &'a i32,
}

fn f(foo: &Foo) {
    let bar = foo.bar;
    g(*bar);
    h(bar);
    g(*bar);
}

should be optimized to avoid the redundant *bar load. This seems to require more than just parameter metadata.

pcwalton avatar Oct 15 '22 09:10 pcwalton

LLVM doesn't know that bar() doesn't mutate *x somehow, despite it being marked readonly and noalias in all functions. So it has to issue a needless reload of the value.

I'm not saying it currently handles this (it doesn't), but that it would be easy to make it handle it, without introducing new IR concepts or changes on the rustc side.

nikic avatar Oct 15 '22 09:10 nikic

I'm not saying it currently handles this (it doesn't), but that it would be easy to make it handle it, without introducing new IR concepts or changes on the rustc side.

OK, I've added that check to BasicAliasAnalysis::pointsToConstantMemory() and it seems to work on a simple test case. Running tests overnight.

pcwalton avatar Oct 15 '22 10:10 pcwalton

The more widely used way to represent invariance is invariant.group metadata in conjunction with launder.invariant.group intrinsics. The way to use these in this context would be to use launder.invariant.group to create a new pointer for the argument, and then annotate all uses of it using !invariant.group. This representation is used for strict vtables in clang and does have non-trivial support in LLVM.

I can do this, but do we need to launder the pointer first? The LangRef suggests that invariant.group should work as long as every use of the SSA value loads the same value. Since in these cases the memory is immutable for the entire lifetime of the SSA value that seems safe. (Though that assumes the inliner correctly inserts launder intrinsics when inlining functions.)

pcwalton avatar Oct 15 '22 10:10 pcwalton

I'm not saying it currently handles this (it doesn't), but that it would be easy to make it handle it, without introducing new IR concepts or changes on the rustc side.

OK, I've added that check to BasicAliasAnalysis::pointsToConstantMemory() and it seems to work on a simple test case. Running tests overnight.

The necessary change isn't quite that simple, and this will lead to miscompiles in some cases. Maybe it's good enough to get some performance data though.

I can do this, but do we need to launder the pointer first? The LangRef suggests that invariant.group should work as long as every use of the SSA value loads the same value. Since in these cases the memory is immutable for the entire lifetime of the SSA value that seems safe. (Though that assumes the inliner correctly inserts launder intrinsics when inlining functions.)

The inliner does not insert launder.invariant.group intrinsics, these need to be added explicitly.

nikic avatar Oct 15 '22 11:10 nikic

The necessary change isn't quite that simple, and this will lead to miscompiles in some cases. Maybe it's good enough to get some performance data though.

OK, here's a rough proposed change: https://github.com/pcwalton/llvm-project/commit/87a571a3fc17880a28e3c4234d758024cb8a3a9e

I changed pointsToConstantMemory() to take a fourth parameter, OrInvariant, like the existing OrLocal, and if that parameter is true then pointsToConstantMemory() returns true for memory that is invariant for the life of the SSA value, even if that memory isn't invariant for the life of the program. I then passed true in cases where it looked safe (particularly getModRefInfo()).

Does this approach seem reasonable? If so I'll continue cleaning it up, adding tests, etc. Having two bool parameters seems a bit ugly, admittedly, but I wanted to follow the existing code.

pcwalton avatar Oct 15 '22 15:10 pcwalton

The new logic looks correct to me. I don't think we'd want to land this with the OrInvariant parameter, but it's a reasonable starting point for review.

nikic avatar Oct 15 '22 16:10 nikic

The inliner does not insert launder.invariant.group intrinsics, these need to be added explicitly.

Wait, if we launder pointers every time we enter a function, doesn't that mean that redundant loads can't be eliminated across inlined functions? e.g.

fn f(x: &i32) {
    use_of_val(*x);
    g(x);
    use_of_val(*x); // <--
}

fn g(y: &i32) {
    use_of_ptr(y);
}

If codegen applies a launder intrinsic to y on the entry block of g and g gets inlined into f, then x and y are considered different, despite being the same pointer and immutable throughout, and the load from the second use_of_val() (marked with an arrow above) can't be eliminated because LLVM will think use_of_ptr might have mutated the memory pointed to by x. Is this right?

David Goldblatt suggested an llvm.constify intrinsic (obviously we would need a better name) that takes a pointer and returns a new pointer SSA value that aliases the original but is guaranteed to be invariant for that entire SSA value's lifetime. This would be different from llvm.launder.invariant.group because calling constify on an already constified pointer would be a no-op. To avoid proliferation of llvm.invariant intrinsics I wonder if it would make sense to just replace llvm.invariant.start with this new llvm.constify. What do you think?

pcwalton avatar Oct 15 '22 21:10 pcwalton

The new logic looks correct to me. I don't think we'd want to land this with the OrInvariant parameter, but it's a reasonable starting point for review.

I've given this some thought, and I think the right way to model this is by making the return value a ModRefInfo rather than bool, which is the upper bound on observable memory effects for this location. This is NoModRef for globally invariant memory and Ref for locally invariant memory. Users can then just & this value to refine the ModRefInfo result.

To avoid proliferation of llvm.invariant intrinsics I wonder if it would make sense to just replace llvm.invariant.start with this new llvm.constify. What do you think?

Without commenting on the rest, I don't think llvm.invariant.start can be replaced, because the thing its primarily used for is to declare that the object cannot be modified past a certain point (i.e. invariant.start without invariant.end). In particular this is used to turn globals into constants after global ctor evaluation, in which case having a "constantified" SSA value doesn't really help us -- we need the invariant.start on the global independently of how it will later be used.

nikic avatar Oct 16 '22 20:10 nikic

I think you are better suited to review this PR. r? @nikic

TaKO8Ki avatar Oct 18 '22 08:10 TaKO8Ki

I will close this PR after the relevant LLVM patches are upstream, unless you'd like me to close it now.

pcwalton avatar Oct 18 '22 23:10 pcwalton

I think it's safe to close this, I doubt we would want to pursue the particular approach implemented here.

nikic avatar Oct 19 '22 07:10 nikic

I submitted https://reviews.llvm.org/D136659 to implement the suggested LLVM change.

pcwalton avatar Oct 25 '22 03:10 pcwalton