rust icon indicating copy to clipboard operation
rust copied to clipboard

Panic on invalid usages of `MaybeUninit::uninit().assume_init()`

Open Nilstrieb opened this issue 1 year ago • 23 comments

This MIR transform inserts the same validity checks from mem::{uninitialized,zeroed} to MaybeUninit::{uninit,zeroed}().assume_init().

We have been panicking in mem::uninit on invalid values for quite some time now, and it has helped to get people off the unsound API and towards using MaybeUninit<T>.

While correct usage of MaybeUninit<T> is clearly documented, some people still use it incorrectly and simply replaced their wrong mem::uninit usage with MaybeUninit::uninit().assume_init(). This is not any more correct than the old version, and we should still emit panics in these cases. As this can't be done in the library only, we need this MIR pass to insert the calls.

For now, it only detects direct usages of MaybeUninit::uninit().assume_init() but it could be extended in the future to do more advanced dataflow analysis.

Nilstrieb avatar Aug 11 '22 20:08 Nilstrieb

r? @michaelwoerister

(rust-highfive has picked a reviewer for you, use r? to override)

rust-highfive avatar Aug 11 '22 20:08 rust-highfive

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

rustbot avatar Aug 11 '22 20:08 rustbot

The job mingw-check failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
configure: rust.debug-assertions := True
configure: rust.overflow-checks := True
configure: llvm.assertions      := True
configure: dist.missing-tools   := True
configure: build.configure-args := ['--enable-sccache', '--disable-manage-submodu ...
configure: writing `config.toml` in current directory
configure: 
configure: run `python /checkout/x.py --help`
Attempting with retry: make prepare
---
* highest error code: E0790
Found 506 error codes
Found 0 error(s) in error codes
Done!
tidy error: /checkout/compiler/rustc_mir_transform/src/lib.rs:246: XXX is deprecated; use FIXME
tidy error: /checkout/compiler/rustc_mir_transform/src/check_maybe_uninit.rs:141: XXX is deprecated; use FIXME
some tidy checks failed
Build completed unsuccessfully in 0:00:10

rust-log-analyzer avatar Aug 11 '22 20:08 rust-log-analyzer

Also gonna leave a linkback to https://github.com/rust-lang/rust/issues/66151 which this PR is relevant to... Kinda!

5225225 avatar Aug 11 '22 20:08 5225225

Heads up, with zstd-seekable 0.1.7, RUSTFLAGS="-Zstrict-init-checks" cargo test just hangs for me (in test, not in build). Not sure if you can reproduce. No idea what it's doing.

5225225 avatar Aug 11 '22 22:08 5225225

r? rust-lang/mir

michaelwoerister avatar Aug 12 '22 08:08 michaelwoerister

Heads up, with zstd-seekable 0.1.7, RUSTFLAGS="-Zstrict-init-checks" cargo test just hangs for me (in test, not in build). Not sure if you can reproduce. No idea what it's doing.

Did some investigation with @Nilstrieb and it seems to be their problem. We're inserting panics in a place they're not panic safe. They deadlock. Happens on stable if you insert a panic manually.

5225225 avatar Aug 12 '22 18:08 5225225

This will take a little time as I don't have that much time right now, so @rustbot author

Nilstrieb avatar Aug 22 '22 17:08 Nilstrieb

Honestly I am not sure this is the right move. With a panic like this, the chances are pretty high that they will hit people that can do absolutely nothing about them -- because the panic is caused by a dependency. And having to implement this via a MIR pass sounds like the kind of hack that will be quite annoying to maintain.

I think I'd be more in favor of using a lint for this. And note that the invalid_values lint already detects MaybeUninit::uninit().assume_init(). Also IIRC @5225225 had some great ideas for how to make assume_init panic on invalid bit patterns, which can be enabled by a -Z flag.

RalfJung avatar Aug 27 '22 15:08 RalfJung

One advantage the panics have over a lint is the lint needs to work pre-monomorphisation, so we can't lint for making an uninit T, because that might be fine.

And the lint warns about dead code, which is arguably a benefit (don't need to actually test the code, you get immediate feedback).

The panics run where we know the type layout concretely, so we can accurately panic even in cases of zeroed enums where the variant with discriminant zero doesn't allow being left zeroed. That would be very hard, if not impossible, to use a lint for.

That's not to say our lints couldn't be improved, but I think the panics serve a useful purpose by saying you did do a UB. Both are useful in different circumstances, which is why I think we should have both.

5225225 avatar Aug 27 '22 16:08 5225225

I've addressed all the comments, but #101061 will force me to change some of the intrinsics used here if it's merged.

This brings me to a question: Should we use strict-init-checks by default here? We guarantee nothing about assume_init, so panicking on everything but unions/zsts should be allowed. Especially when we might enable noundef on integers, this could stop LLVM UB.

Nilstrieb avatar Sep 07 '22 20:09 Nilstrieb

The job x86_64-gnu-llvm-13 failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
Some tests failed in compiletest suite=mir-opt mode=mir-opt host=x86_64-unknown-linux-gnu target=x86_64-unknown-linux-gnu
..........
failures:

---- [mir-opt] src/test/mir-opt/check-maybe-uninit.rs stdout ----
2 + // MIR for `main` after CheckMaybeUninit
4   | User Type Annotations
4   | User Type Annotations
-   | 0: user_ty: Canonical { max_universe: U0, variables: [CanonicalVarInfo { kind: Ty(General(U0)) }], value: TypeOf(DefId(2:2022 ~ core[4f75]::mem::maybe_uninit::{impl#2}::uninit), UserSubsts { substs: [^0], user_self_ty: Some(UserSelfTy { impl_def_id: DefId(2:2019 ~ core[4f75]::mem::maybe_uninit::{impl#2}), self_ty: std::mem::MaybeUninit<u8> }) }) }, span: $DIR/check-maybe-uninit.rs:6:17: 6:42, inferred_ty: fn() -> std::mem::MaybeUninit<u8> {std::mem::MaybeUninit::<u8>::uninit}
-   | 1: user_ty: Canonical { max_universe: U0, variables: [CanonicalVarInfo { kind: Ty(General(U0)) }], value: TypeOf(DefId(2:2022 ~ core[4f75]::mem::maybe_uninit::{impl#2}::uninit), UserSubsts { substs: [^0], user_self_ty: Some(UserSelfTy { impl_def_id: DefId(2:2019 ~ core[4f75]::mem::maybe_uninit::{impl#2}), self_ty: std::mem::MaybeUninit<std::string::String> }) }) }, span: $DIR/check-maybe-uninit.rs:7:17: 7:46, inferred_ty: fn() -> std::mem::MaybeUninit<std::string::String> {std::mem::MaybeUninit::<std::string::String>::uninit}
+   | 0: user_ty: Canonical { max_universe: U0, variables: [CanonicalVarInfo { kind: Ty(General(U0)) }], value: TypeOf(DefId(2:2022 ~ core[8554]::mem::maybe_uninit::{impl#2}::uninit), UserSubsts { substs: [^0], user_self_ty: Some(UserSelfTy { impl_def_id: DefId(2:2019 ~ core[8554]::mem::maybe_uninit::{impl#2}), self_ty: std::mem::MaybeUninit<u8> }) }) }, span: $DIR/check-maybe-uninit.rs:6:17: 6:42, inferred_ty: fn() -> std::mem::MaybeUninit<u8> {std::mem::MaybeUninit::<u8>::uninit}
+   | 1: user_ty: Canonical { max_universe: U0, variables: [CanonicalVarInfo { kind: Ty(General(U0)) }], value: TypeOf(DefId(2:2022 ~ core[8554]::mem::maybe_uninit::{impl#2}::uninit), UserSubsts { substs: [^0], user_self_ty: Some(UserSelfTy { impl_def_id: DefId(2:2019 ~ core[8554]::mem::maybe_uninit::{impl#2}), self_ty: std::mem::MaybeUninit<std::string::String> }) }) }, span: $DIR/check-maybe-uninit.rs:7:17: 7:46, inferred_ty: fn() -> std::mem::MaybeUninit<std::string::String> {std::mem::MaybeUninit::<std::string::String>::uninit}
8   fn main() -> () {
8   fn main() -> () {
9       let mut _0: ();                      // return place in scope 0 at $DIR/check-maybe-uninit.rs:+0:11: +0:11

thread '[mir-opt] src/test/mir-opt/check-maybe-uninit.rs' panicked at 'Actual MIR output differs from expected MIR output /checkout/src/test/mir-opt/check_maybe_uninit.main.CheckMaybeUninit.diff', src/tools/compiletest/src/runtest.rs:3516:25


failures:
    [mir-opt] src/test/mir-opt/check-maybe-uninit.rs

rust-log-analyzer avatar Sep 08 '22 02:09 rust-log-analyzer

With a panic like this, the chances are pretty high that they will hit people that can do absolutely nothing about them -- because the panic is caused by a dependency.

If you replace "panic" in your statement with "UB", is the meaning much different? The dependency is behaving badly, which is hard for its users, but the panic at least makes it deterministic.

cuviper avatar Sep 12 '22 18:09 cuviper

Also, it's worth thinking about if we're trying to prevent just LLVM UB here, or rust UB? I agree that being minimal in our checks for mem::uninitialized is useful because that function is what Old Code does, which is why we added the 0x01 filling, but can/should we be stricter and go "we will panic on rust UB" for MaybeUninit, since we sorted out our messaging (mostly) for what uninit means?

(We did say "uhhhh uninit ints may be fine, they may not, be careful regardless", which has since been changed to "they're UB.")

If it's rust UB, then strict-init-checks is the correct thing to do here.

If it's LLVM UB, then we need to still panic on uninit bool, but not behind arrays. We wouldn't panic on uninit u8 until we start adding noundef to ints (which it would be a shame if we couldn't do).

5225225 avatar Sep 12 '22 18:09 5225225

If you replace "panic" in your statement with "UB", is the meaning much different?

In theory, no. In practice, yes -- in particular when it comes to uninit, a lot of code still does what it is intended to do even if it has UB.

We should do all we can to fix that UB, but this PR does not help a lot with that -- fixing UB can only happen by the author of the crate that contains the bad code.

This PR also proposes a very expensive way to detect such UB (expensive in maintenance cost) -- a dedicated MIR pass to detect certain ways of using standard library APIs and rewrite them into a different from. This breaks several abstraction barriers. MIR passes should not be coupled to the standard library, they should worry about the Rust Abstract machine semantics and nothing else. I worry about the long-term maintainability of such "random rewrites", and currently I don't think we should do something like that.

But in the end I won't have to maintain this code, so this decision is up to @rust-lang/wg-mir-opt. I just hope I won't be on the receiving end of the bugreports we will get when we have accumulated a dozen of these rewrites and someone tries to debug something and gets extremely confused because the generated MIR just makes no sense.

RalfJung avatar Sep 13 '22 15:09 RalfJung

I do find the "we insert calls" bit to be rather surprising, indeed.

cuviper avatar Sep 13 '22 15:09 cuviper

Though I do think the general concept of "change the code during compilation to detect more UB" is something we should support. Other things we could do along the same lines, that would have the same downside of making the resulting MIR not match the input code, include "check type validity on pointer reads" or "check pointer alignment on read".

What I'm thinking of is having a set of sanitizers that are on by default, with the default set depending on profile, and letting you disable them if you wanted.

So something like RUSTFLAGS="-Cmir-sanitizer=uninit_assume_init,pointer_align" cargo build (or a similar thing in your Cargo.toml [profile.dev]. If it's just A Done Thing that "yeah, your code will be modified sometimes to insert UB checks, use this flag to disable that/only enable some of them", I don't think it would be too bad?

Also, could use a fuel mechanism like optimizations do (maybe the same fuel? these aren't required transformations, they're just not optimizations) in order to let you bisect bad transformations.

Basically, I think these shouldn't be ad-hoc changes that are impossible to control, but I think we should have the infrastructure to do changes like this, since they're useful.

5225225 avatar Sep 13 '22 17:09 5225225

Yeah, this pass could be seen as a very simple MaybeUninit sanitizer. But I agree that this is quite some complexity without precedent, and I am not sure whether it's worth it either. I think it's a good idea, which is why I wrote it in the first place, but in the end the choice is up to wg-mir-opt, and maybe going with more general MIR sanitizers like 522 mentioned is a better idea.

Nilstrieb avatar Sep 13 '22 17:09 Nilstrieb

If we're talking about sanitizers, we could also change MaybeUninit to have an additional field behind #[cfg(debug_assertions)] that is used to track initialization and panics if you haven't used any of the &mut self methods before asserting intializedness.

That has zero impact on release mode, and doesn't require much maintenance

oli-obk avatar Sep 14 '22 15:09 oli-obk

This would break, we guarantee it to be transparent over T. If I read a T in memory as a MaybeUninit<T>, which is allowed as it's transparent, it would do an out of bounds read of the init flag.

Nilstrieb avatar Sep 14 '22 15:09 Nilstrieb

Yeah, MaybeUninit has several slice methods that need matching layout.

cuviper avatar Sep 14 '22 15:09 cuviper

Though I do think the general concept of "change the code during compilation to detect more UB" is something we should support. Other things we could do along the same lines, that would have the same downside of making the resulting MIR not match the input code, include "check type validity on pointer reads" or "check pointer alignment on read".

Those are very different. We can compile each raw pointer read to a checked operation. That is a correct compilation scheme, and perfectly compositional -- the behavior of a large expression is entirely determined by its subexpressions.

What I am objecting to is having the behavior of MaybeUninit::uninit().assume_init() be something other than first calling MaybeUninit::uninit() and then calling assume_init() on the result. That is non-compositional, and surprising & confusing.

RalfJung avatar Sep 18 '22 16:09 RalfJung

If we're talking about sanitizers, we could also change MaybeUninit to have an additional field behind #[cfg(debug_assertions)] that is used to track initialization and panics if you haven't used any of the &mut self methods before asserting intializedness.

That has zero impact on release mode, and doesn't require much maintenance

The C sanitizers use "shadow memory" for this, where each stack/heap allocation has additional metadata at a predictable address that can track initialization status etc. It would be interesting to have that infrastructure for MIR somehow.

tavianator avatar Sep 18 '22 18:09 tavianator

This PR is also being discussed on Zulip now. Quoting some of the things I said there:

"I'd much rather see effort spent towards having a version of assume_init that checks the invariants it can check, independent of what happened to the MaybeUninit before this method was called." "[This] PR only covers the most extreme least-likely-to-surprise-people case so I dont think it makes enough of a dent in UB prevention to justify its cost. It would be different if it would be able to detect common subtle cases of UB, but it doesn't."

RalfJung avatar Sep 20 '22 09:09 RalfJung

:umbrella: The latest upstream changes (presumably #102051) made this pull request unmergeable. Please resolve the merge conflicts.

bors avatar Sep 26 '22 17:09 bors

I'm waiting for whether we want to go forward with this before I'm cleaning this up further

Nilstrieb avatar Sep 27 '22 18:09 Nilstrieb

ping from triage: @Nilstrieb

I'm waiting for whether we want to go forward with this before I'm cleaning this up further

This was over a month ago. Can you please post your status on this PR?

FYI: when a PR is ready for review, send a message containing @rustbot ready to switch to S-waiting-on-review so the PR is in the reviewer's backlog.

JohnCSimon avatar Nov 06 '22 02:11 JohnCSimon

This needs a decision by someone, but I don't know who? I guess it'd be the lang team for whether we want such a panic, and the compiler team for whether it is worth the maintenance burden?

RalfJung avatar Nov 06 '22 10:11 RalfJung

Dear @rust-lang/compiler @rust-lang/lang,

this PR suggests to make MaybeUninit::uninit().assume_init() panic when it is called on a type that does not permit being uninitialized, similar to what we already do for mem::uninitialized(). The way this is implemented is via a MIR transform that detects this pattern (MaybeUninit::assume_init called on the return value of MaybeUninit::uninit), check the type, and rewrite it into a panic if applicable.

I am bringing this up for nomination because this is the first proposal for a MIR transform whose goal it is to improve run-time UB detection. So far, we have MIR transforms for optimizations, and we have assertions in the standard library for run-time UB detection. So I am asking:

  • Compiler team, are you okay with having such MIR transforms around whose purpose is not optimization but UB detection?
  • Lang team, do you think using a MIR transform for UB detection is a sensible thing to do?

I personally am concerned about the surprise that might be cause by these rewrites. Basically this means that the behavior of MaybeUninit::uninit().assume_init() is no longer determined by first calling uninit and then assume_init, there is now a non-compositional rewrite being applied that will sometimes change program behavior. (Naturally, the transform cannot catch all cases of MaybeUninit::assume_init being called on the return value of MaybeUninit::uninit.) I imagine if we land this, it would not be the last MIR transform of that kind.

On the other hand, it does help detect some UB, after all.

RalfJung avatar Nov 06 '22 11:11 RalfJung