rust icon indicating copy to clipboard operation
rust copied to clipboard

Runtime code is generated for functions only used in const eval

Open Kimundi opened this issue 1 year ago β€’ 11 comments

Code

I tried this code:

  • https://godbolt.org/z/M33K6afz8
  • https://play.rust-lang.org/?version=beta&mode=release&edition=2018&gist=095cc1d85dfd25b5dd1bc742fbf98432

I expected to see this happen: Both check1 and check2 should compile to functions that just use a compiletime-evaluated struct value

Instead, this happened: check1 disapeared from the assembly, and it looks like all the code needed to runtime-evaluate it got added instead

The weird thing is that the function that regressed is the one that uses a const item by name, rather than the one that just calls a const function inside a non-const function body.

Version it worked on

It most recently worked on: 1.74.1 (stable as of writing this)

Version with regression

from playground:

1.76.0-beta.1 (2023-12-21 0e09125c6c3c2fd70d7d)
1.77.0-nightly (2023-12-21 3d0e6bed600c0175628e)

Kimundi avatar Dec 22 '23 12:12 Kimundi

Note that adding #[no_mangle] to check1 to force it to appear in the binary does cause it to appear as a constant. I think this might be a case of "in isolation, codegen is weird, you have to evaluate it at a whole program level"?

asquared31415 avatar Dec 22 '23 13:12 asquared31415

This looks like another instance of #116505, meaning that depending on some heuristics leaf functions are only codegenned in cgu that need them. Since check1 and check2 are not called, they are not seen. This is good for optimizations but quite unfortunate for inspecting asm on Compiler Explorer.

Adding #[no_mangle] will force them to be available, you can also add -C link-dead-code=yes, or turn of the optimization with -Z cross-crate-inline-threshold=never on nightly.

https://godbolt.org/z/4rfGGrbWx

RossSmyth avatar Dec 22 '23 14:12 RossSmyth

I think there are multiple things happening here. I'll try to explain what I'm seeing.

I expected to see this happen: Both check1 and check2 should compile to functions that just return a compiletime-evaluated struct value

That's not possible, those functions both take runtime parameters. I'm presuming you mean "use a compiletime-evaluated struct value" instead?

check1 disapeared from the assembly

This is the cross-crate-inlining effect; I've been mulling over ideas to improve the situation, but this is not a bug. It's very unfortunate though. https://github.com/compiler-explorer/compiler-explorer/issues/5782

and it looks like all the code needed to runtime-evaluate it got added instead... The weird thing is that the function that regressed is the one that uses a const item by name, rather than the one that just calls a const function inside a non-const function body.

This I think is a bug. I've made a smaller example here: https://godbolt.org/z/GoEa8jc5f and I'm going to look into it...

saethlin avatar Dec 23 '23 02:12 saethlin

The reproducer is this:

pub struct Thing;

impl Thing {
    const fn new(panic: bool) -> Self {
        if panic {
            panic!()
        }
        Thing
    }
}

const T: Thing = Thing::new(false);

pub fn oof() -> Thing {
    T
}

Since automatic cross-crate-inlining landed, the assembly generated for this crate contains Thing::new and a bunch of symbols related to panics. Which looks very goofy. At first I was afraid that I broke something in that PR (https://github.com/rust-lang/rust/pull/116505) related to reachability, but I'm now quite sure I didn't. On an old compiler, if you just add #[inline] to fn oof, you'll get the goofy assembly in godbolt: https://godbolt.org/z/q4E6s4jvs

The change here is that fn oof is cross-crate-inlinable, which means that reachability analysis must consider the contents of its body to be reachable outside this CGU. This is a dark corner of the whole "automatic #[inline]" or "cross-crate-inlinable" approach that I thought I had dodged in the original PR by only inferring cross-crate-inlinability for leaf functions. Turns out I didn't. In this program, fn oof is a leaf function in MIR because it looks like this:

fn oof() -> Thing {
    let mut _0: Thing;

    bb0: {
        _0 = const _;
        return;
    }
}

But to reachability analysis, this function has now made the initializer of that const reachable. Ergo, to reachability analysis this is not a leaf function.

So cross-crate-inlining turns the surface area of this crate from "a symbol called oof that I can call and nothing else" to "a symbol called oof and also its body and everything reachable from it". So the compiler dutifully recurses through the const fn that initializes this const and instantiates everything that it reaches.

So from one perspective, this is sensible. But from the OP's, this is obviously silly. All this extra panic code was made reachable from compile time evaluation, but we stamped out runtime code for it. So I wonder if that's what we should be tracking here; if reachability determines something is reachable but only at compile time.

saethlin avatar Dec 23 '23 04:12 saethlin

I expected to see this happen: Both check1 and check2 should compile to functions that just return a compiletime-evaluated struct value

That's not possible, those functions both take runtime parameters. I'm presuming you mean "use a compiletime-evaluated struct value" instead?

Err yes, that was badly formulated on my part. Fixing it in the description.

check1 disapeared from the assembly

This is the cross-crate-inlining effect; I've been mulling over ideas to improve the situation, but this is not a bug. It's very unfortunate though. compiler-explorer/compiler-explorer#5782

That fully explains it, I was not aware that we actually started cross-crate inlining for non-marked functions now. This bugreport was created based on the assumption that that does not happen for public library functions.

and it looks like all the code needed to runtime-evaluate it got added instead... The weird thing is that the function that regressed is the one that uses a const item by name, rather than the one that just calls a const function inside a non-const function body.

This I think is a bug. I've made a smaller example here: https://godbolt.org/z/GoEa8jc5f and I'm going to look into it...

Thanks! Good to know I discovered something unexpected at least πŸ˜„

Kimundi avatar Dec 23 '23 10:12 Kimundi

So I wonder if that's what we should be tracking here; if reachability determines something is reachable but only at compile time.

That seems like what this ought to boil down to, yeah

Kimundi avatar Dec 23 '23 11:12 Kimundi

So I wonder if that's what we should be tracking here; if reachability determines something is reachable but only at compile time.

Interesting, I guess currently this is entirely per-item and does not consider "how the item is used"? Since we distinguish compile-time-MIR and runtime MIR, it would indeed make sense to treat those two rather separately for reachability.

Cc @oli-obk

RalfJung avatar Jan 11 '24 12:01 RalfJung

I started working on this and made some progress, but then was immediately tripped up by the difference between a const which captures a function pointer and a const which calls a function. That's probably resolvable, but I have enough other things on my plate right now that I'm unassigning myself to encourage anyone else interested to work on this.

saethlin avatar Jan 11 '24 15:01 saethlin

Having recently looked into the collector a bit, I have no idea how this is happening. We're not recursing into the MIR body of a const, so I can't see how the body of T can affect which things get collected.

Maybe some other pass is adding junk to oof's required_const?

RalfJung avatar Mar 11 '24 21:03 RalfJung

There are these calls in reachable to visit_nested_body which are suspicious but I don't know if that affects mono item collection at all:

https://github.com/rust-lang/rust/blob/24f009c5e55d18c12563dd74681ca33b8a349936/compiler/rustc_passes/src/reachable.rs#L197-L202

Does anyone know anyone who understands our mono item collector?^^

RalfJung avatar Mar 11 '24 21:03 RalfJung

Per my comment above, I believe this is required to handle consts that capture function pointers. Those function pointers will be used at runtime, so they're runtime code reachable through a const. That pattern is used by the standard library's thread-local system, which is a common source of linker errors when I work on MirUsedCollector.

saethlin avatar Mar 11 '24 22:03 saethlin

That doesn't sound right. Function pointers are handled by walking the final value of the const, and if we see a function pointer in there we add it to the monomorphization list. (Pointers are special values -- they have provenance -- so we can detect them in the final value of a const pretty easily; this does not require a type-driven traversal.) There's the collect_alloc function for that. So this does not require looking at the MIR of the const either.

RalfJung avatar Mar 12 '24 06:03 RalfJung

Function pointers are handled by walking the final value of the const, and if we see a function pointer in there we add it to the monomorphization list.

since constants are reevaluated in every crate (unlike statics since recently), reachability needs to mark everything in its body as reachable. So I guess we should start to cache at least constants that can be evaluated (non-generic or where being generic doesn't matter), and then just not encode their MIR. Then reachability can skip these bodies

oli-obk avatar Mar 12 '24 06:03 oli-obk

Okay but does reachability even interact with monomorphization? So far I don't see how reachability could be responsible here.

Or do we actually codegen all functions that are "reachable"? In that case we probably have to distinguish "reachable in const code" and "reachable in runtime code".

RalfJung avatar Mar 12 '24 07:03 RalfJung

Or do we actually codegen all functions that are "reachable"? In that case we probably have to distinguish "reachable in const code" and "reachable in runtime code".

Yes. These are the "seed" set of items to be processed in monomorphization

oli-obk avatar Mar 12 '24 07:03 oli-obk

Can you point to the code for that? I looked for where he reachable_set query is called and can't see anything inside the monomorphization crate.

RalfJung avatar Mar 12 '24 07:03 RalfJung

I was mistaken. While I recently dug into this, I mixed up two things.

It is used separately (and I think we hope that monomorphization and reachability are in sync enough that we never end up with missing MIR or missing codegen items). It is also used for splitting the mono items into codegen units, not for actually coming up with the initial set.

oli-obk avatar Mar 12 '24 08:03 oli-obk

Okay, so if I understand correctly we still don't have an explanation for why these even get added to the monomorphization set. More research is needed...

RalfJung avatar Mar 12 '24 08:03 RalfJung

The reproducer is this:

FWIW this doesn't reproduce on latest nightly any more, the #[inline] has to be added again (making current nightly behave like old versions): https://godbolt.org/z/4vxsT4cT3

RalfJung avatar Mar 13 '24 14:03 RalfJung

Turns out reachability is involved, I think. Just very indirectly.

The key difference in the log of the good behavior (without #[inline]) vs the bad behavior (with #[inline]) is

β”‚ β”œβ”€β”rustc_monomorphize::collector::push_if_root def_id=DefId(0:6 ~ collect[a406]::{impl#0}::new)
β”‚ β”‚ β”œβ”€  0ms DEBUG rustc_monomorphize::collector found root
β”‚ β”‚ β”œβ”€β”rustc_monomorphize::collector::create_fn_mono_item instance=Instance { def: Item(DefId(0:6 ~ collect[a406]::{impl#0}::new)), args: [] }, source=no-location (#0)
β”‚ β”‚ β”‚ β”œβ”€  0ms DEBUG rustc_monomorphize::collector return=Spanned { node: Fn(Instance { def: Item(DefId(0:6 ~ collect[a406]::{impl#0}::new)), args: [] }), span: no-location (#0) }
β”‚ β”‚ β”œβ”€β”˜
β”‚ β”œβ”€β”˜

IOW, this function suddenly considers Thing::new to be a root:

https://github.com/rust-lang/rust/blob/d3514a036dc65c1d31ee5b1b4bd58b9be1edf5af/compiler/rustc_monomorphize/src/collector.rs#L1293-L1307

So is_reachable_non_generic switches from false to true. That boils down to a membership test on the set computed here, which is created by starting with the reachable_set computed here and then removing some stuff. reachable_set in turn recurses into inline functions here and into the MIR body of constants here.

So... I guess the question is, is reachable_set supposed to return only things that are runtime-reachable (in which case it is currently returning too much), or is it supposed to return everything that reachable in any fashion (in which case the collector using this as its criterion is wrong since there we only want to collect runtime-relevant things).

RalfJung avatar Mar 13 '24 15:03 RalfJung

So from one perspective, this is sensible. But from the OP's, this is obviously silly. All this extra panic code was made reachable from compile time evaluation, but we stamped out runtime code for it. So I wonder if that's what we should be tracking here; if reachability determines something is reachable but only at compile time.

So, looking at the comments I wrote in https://github.com/rust-lang/rust/pull/122769...

I think what happens is that since T is reachable (which is correct), it may be the case that T returns a function pointer to any function it mentions, recursively. So just in case T returns a function pointer to Thing::new, we mark Thing::new reachable.

Now, we could probably skip this for top-level const items, since we know we can always evaluate them to a value, and then we can just check the final value for any function pointers and only codegen those. (And the collector already has logic to find these function pointers.)

But for associated consts, this is a lot harder. I think we'd have to keep track (inside reachable computation) of whether the current item is const-reachable or full-reachable, and then if we are const-reachable that does not in general get added to the regular reachable set, until we find a function item mentioned in a way that it does not immediately get called -- those are then added as regular-reachable again. Then there are still examples where we'd generate runtime code for no reason but it'd be a lot less common. Fundamentally we can't fully avoid false positives here as we can't be sure which functions may be referenced via function pointers that actually leak into the final value of the const (and thus potentially into runtime code)

RalfJung avatar May 08 '24 18:05 RalfJung

we'd have to keep track (inside reachable computation) of whether the current item is const-reachable or full-reachable, and then if we are const-reachable that does not in general get added to the regular reachable set, until we find a function item mentioned in a way that it does not immediately get called -- those are then added as regular-reachable again.

Ah! I think this might explain why my attempt to implement const-reachable didn't work.

saethlin avatar May 08 '24 18:05 saethlin

Do you still have that branch somewhere? What I imagine would be completely internal to reachable.rs. At least according to the comments there, we encode ctfe MIR for everything anyway, so it's only runtime reachability we have to worry about.

But, surprisingly, a const item can make things runtime reachable.

RalfJung avatar May 08 '24 18:05 RalfJung

When a runtime function calls a private const fn, we don't have to consider that function to be const-reachable. So the naive approach outlined above could overestimate how much is const-reachable. The two reachable sets do interact though... I think it's a fixed point of a computation that goes like

  • All public functions and statics are runtime-reachable.
  • All public const fn and const are const-reachable.
  • If a runtime-reachable function that needs cross-crate MIR mentions some function or static, that other item is also runtime-reachable.
  • If a runtime-reachable function that needs cross-crate MIR mentions a const, that item is const-reachable.
  • If a const-reachable item mentions an item, that item is const-reachable.
  • If a const-reachable item mentions a function/static in a way that the function item itself (or a function pointer computed from it) could escape the current item, then that function/static is runtime-reachable.

RalfJung avatar May 08 '24 19:05 RalfJung

Do you still have that branch somewhere?

Yes, though I expect there isn't anything in it worth salvaging: https://github.com/rust-lang/rust/compare/master...saethlin:rust:const-reachability

saethlin avatar May 09 '24 02:05 saethlin

Okay so that computed two sets as result from the query. I don't know if that's necessary.

Regarding the analysis outlined above, I assume it is pretty rare that a private function is const for no reason, so it's probably okay to simplify this a bit:

  • All public items are runtime-reachable.
  • If a runtime-reachable function that needs cross-crate MIR mentions some item, that other item is also runtime-reachable.
  • If a runtime-reachable item is a const or const fn, it is const-reachable.
  • If a const-reachable item mentions an item, that item is const-reachable.
  • If a const-reachable item mentions a function/static in a way that the function item itself (or a function pointer computed from it) could escape the current item, then that function/static is runtime-reachable.

RalfJung avatar May 09 '24 07:05 RalfJung

On nightly, godbolt doesn't show any assembly any more with the original reproducer, thanks to https://github.com/rust-lang/rust/pull/122505.

However, this variant still generates assembly:

pub struct Thing;

impl Thing {
    const fn new(panic: bool) -> Self {
        if panic {
            panic!()
        }
        Thing
    }

    const C: Thing = Thing::new(false);
}

#[inline]
pub fn oof() -> Thing {
    Thing::C
}

RalfJung avatar Jun 11 '24 06:06 RalfJung