nll-rfc
nll-rfc copied to clipboard
[lifetimes] lifetime-liveness of drops of uninitialized values
Moved from a comment on #40 and modified.
@nikomatsakis had discovered the following interaction:
fn foo() {
let x = Box::new(RefCell::new(42));
let ref_x: &'α RefCell<u32> = &x;
let inner_ref: Ref<'α, u32> = RefCell::borrow(ref_x);
// ^ for all you guys following me, `Ref` maintains a borrow to its parent
// `RefCell` that it clears during its very-not-blind destructor.
std::mem::drop(inner_ref);
// (†)
std::mem::drop(x); // free the box
maybe_unwind();
// (‡)
}
This code is legal only of 'α
is not required to be live at (†)
. There are no explicit uses of 'α
after inner_ref
is dropped, so it might seem like 'α
wouldn't be live at †
, but there are 2 MIR drops for inner_ref
, that can both use ref_x
and keep it live:
- at end of scope at point
‡
- at the unwind path, which can be reached from
maybe_unwind
Dynamically speaking, inner_ref
is in an uninitialized state after each of the drops, so both drops are no-ops and this program is sound.
However, if we go according to liveness requirements according to the current dropck rules, both of these implicit MIR drops will force 'α
to be live at (†)
and cause the conflict.
If we care about it, a simple dataflow analysis could discover that the drop at ‡
is always dead.
The drop at the unwind path looks like it might be a hard nut to crack - there's only 1 unwind path for the function, ending at the single resume
terminator. However, I think there's a trick we can use - if a value is uninitialized, then I'm quite sure it can never be live. We could backwards-propagate α live if VARS are initialized
dataflow facts, and combine them with forwards-propagated VARS is maybe-initialized
facts.
(To be clear, in the example that @arielb1 added, the drop
is something more like std::mem::drop(inner_ref)
, and not the MIR drop operation. Or else I am confused.)
The drop at the unwind path looks like it might be a hard nut to crack - there's only 1 unwind path for the function, ending at the single resume terminator. However, I think there's a trick we can use - if a value is uninitialized, then I'm quite sure it can never be live. We could backwards-propagate α live if VARS are initialized dataflow facts, and combine them with forwards-propagated VARS is maybe-initialized facts.
This is roughly the same analyses that we do for drop elaboration, right? I was wondering if it would make sense to think about separating out the initialization analysis and drop elaboration from the rest of borrow check. In other words, we would first check that everything is initialized when it is used (where a move counts as a de-initialization). We would then elaborate drops. We could then run borrow-check on this elaborated form, with the analysis assuming that all values are initialized when used (even if it's not always obvious). Would this suffice? I guess I have to experiment and check out what MIR results.
(To be clear, in the example that @arielb1 added, the drop is something more like std::mem::drop(inner_ref), and not the MIR drop operation. Or else I am confused.)
Sure, because you can't write a MIR drop.
This is roughly the same analyses that we do for drop elaboration, right? I was wondering if it would make sense to think about separating out the initialization analysis and drop elaboration from the rest of borrow check. In other words, we would first check that everything is initialized when it is used (where a move counts as a de-initialization).
No. The MIR generated is (taking a parameter to make the MIR simpler):
fn foo(x: Box<RefCell<u32>>) {
bb1:
let ref_x: &'α RefCell<u32> = &x;
let inner_ref: Ref<'α, u32> = RefCell::borrow(ref_x) -> bb2 [unwind=bbU1]
bb2:
// This wasn't present in the original, but I expected it - turns
// out to be somewhat of a complication
maybe_unwind() -> bb3 [unwind=bb2U]
bb3:
// ^ for all you guys following me, `Ref` maintains a borrow to its parent
// `RefCell` that it clears during its very-not-blind destructor.
std::mem::drop(inner_ref) -> bb4 [unwind=bbU2]
bb4:
// (†)
std::mem::drop(x) -> bb5 [unwind=bbU2] // free the box
bb5:
maybe_unwind() -> bb6 [unwind=bbU2]
bb6:
// (‡)
DROP inner_ref // this drop is deleted by drop elaboration
DROP x // this drop is deleted by drop elaboration
ret = () // don't nitpick
return
bbU2:
// this drop can be called from both bb2 & bb4,
// and therefore can be both live and dead, and
// therefore we create a drop flag.
//
// Yes, this is an inefficiency, but *I think* LLVM
// can get rid of it pretty quickly in SROA+SimplifyCFG.
DROP inner_ref -> bbU1
bbU1:
// same, but less important for us.
DROP x -> bbU0
bbU0:
resume
}
@arielb1 ok, so the problem is that in bbU2, we have incoming branches from both bb2 (where the DROP would occur) and bb4 (where it would not)? In other words, if we were to do a more "thorough" drop elaboration, we'd be ok here (not saying that is the impl strategy we should take necessarily, I can imagine it would lead to explosions in the graph or something like that).
@nikomatsakis
Sure.
@arielb1 so I was talking about this issue with @pnkfelix and it got me thinking. It seems to me that if we do a simple is-initialized analysis, we will have the same problem that the "elaborate drops" pass does: the unwinding path both before and after invoking mem::drop(x)
is the same, and hence we get into a "maybe initialized" state.
I was thinking about how we might fix this. The most obvious thing is to generate more unwind paths, but that's going to lead to worse compile time.
I think what you were saying is that, when we merge in bits from a successor, if we see that the value may be dropped, and we also know that it is definitely not initialized, we can clear those bits? If so, that makes sense. (I guess this means that the "may drop" bits from bbU2 would be ignored by bb4, but not bb2).
Seems like something we could readily prototype.
@nikomatsakis
I had some odd dataflow-ish solution in mind, which I could try to write in more detail. I believe it means we can do this sanely in a sane amount of time.
@arielb1 ok -- I was imagining we might do a "must be moved" analysis first, and then use the results of that to inform the liveness (i.e., if a variable must be moved at some point X, we don't care if it is live or if it may be dropped).