rust
rust copied to clipboard
Panic on invalid usages of `MaybeUninit::uninit().assume_init()`
This MIR transform inserts the same validity checks from mem::{uninitialized,zeroed}
to MaybeUninit::{uninit,zeroed}().assume_init()
.
We have been panicking in mem::uninit
on invalid values for quite some time now, and it has helped to get people off the unsound API and towards using MaybeUninit<T>
.
While correct usage of MaybeUninit<T>
is clearly documented, some people still use it incorrectly and simply replaced their wrong mem::uninit
usage with MaybeUninit::uninit().assume_init()
. This is not any more correct than the old version, and we should still emit panics in these cases. As this can't be done in the library only, we need this MIR pass to insert the calls.
For now, it only detects direct usages of MaybeUninit::uninit().assume_init()
but it could be extended in the future to do more advanced dataflow analysis.
r? @michaelwoerister
(rust-highfive has picked a reviewer for you, use r? to override)
Hey! It looks like you've submitted a new PR for the library teams!
If this PR contains changes to any rust-lang/rust
public library APIs then please comment with @rustbot label +T-libs-api -T-libs
to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.
Examples of T-libs-api
changes:
- Stabilizing library features
- Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
- Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
- Changing public documentation in ways that create new stability guarantees
- Changing observable runtime behavior of library APIs
Some changes occurred to MIR optimizations
cc @rust-lang/wg-mir-opt
The job mingw-check
failed! Check out the build log: (web) (plain)
Click to see the possible cause of the failure (guessed by this bot)
configure: rust.debug-assertions := True
configure: rust.overflow-checks := True
configure: llvm.assertions := True
configure: dist.missing-tools := True
configure: build.configure-args := ['--enable-sccache', '--disable-manage-submodu ...
configure: writing `config.toml` in current directory
configure:
configure: run `python /checkout/x.py --help`
Attempting with retry: make prepare
---
* highest error code: E0790
Found 506 error codes
Found 0 error(s) in error codes
Done!
tidy error: /checkout/compiler/rustc_mir_transform/src/lib.rs:246: XXX is deprecated; use FIXME
tidy error: /checkout/compiler/rustc_mir_transform/src/check_maybe_uninit.rs:141: XXX is deprecated; use FIXME
some tidy checks failed
Build completed unsuccessfully in 0:00:10
Also gonna leave a linkback to https://github.com/rust-lang/rust/issues/66151 which this PR is relevant to... Kinda!
Heads up, with zstd-seekable 0.1.7
, RUSTFLAGS="-Zstrict-init-checks" cargo test
just hangs for me (in test, not in build). Not sure if you can reproduce. No idea what it's doing.
r? rust-lang/mir
Heads up, with
zstd-seekable 0.1.7
,RUSTFLAGS="-Zstrict-init-checks" cargo test
just hangs for me (in test, not in build). Not sure if you can reproduce. No idea what it's doing.
Did some investigation with @Nilstrieb and it seems to be their problem. We're inserting panics in a place they're not panic safe. They deadlock. Happens on stable if you insert a panic manually.
This will take a little time as I don't have that much time right now, so @rustbot author
Honestly I am not sure this is the right move. With a panic like this, the chances are pretty high that they will hit people that can do absolutely nothing about them -- because the panic is caused by a dependency. And having to implement this via a MIR pass sounds like the kind of hack that will be quite annoying to maintain.
I think I'd be more in favor of using a lint for this. And note that the invalid_values
lint already detects MaybeUninit::uninit().assume_init()
. Also IIRC @5225225 had some great ideas for how to make assume_init
panic on invalid bit patterns, which can be enabled by a -Z
flag.
One advantage the panics have over a lint is the lint needs to work pre-monomorphisation, so we can't lint for making an uninit T
, because that might be fine.
And the lint warns about dead code, which is arguably a benefit (don't need to actually test the code, you get immediate feedback).
The panics run where we know the type layout concretely, so we can accurately panic even in cases of zeroed enums where the variant with discriminant zero doesn't allow being left zeroed. That would be very hard, if not impossible, to use a lint for.
That's not to say our lints couldn't be improved, but I think the panics serve a useful purpose by saying you did do a UB. Both are useful in different circumstances, which is why I think we should have both.
I've addressed all the comments, but #101061 will force me to change some of the intrinsics used here if it's merged.
This brings me to a question: Should we use strict-init-checks
by default here? We guarantee nothing about assume_init
, so panicking on everything but unions/zsts should be allowed. Especially when we might enable noundef
on integers, this could stop LLVM UB.
The job x86_64-gnu-llvm-13
failed! Check out the build log: (web) (plain)
Click to see the possible cause of the failure (guessed by this bot)
Some tests failed in compiletest suite=mir-opt mode=mir-opt host=x86_64-unknown-linux-gnu target=x86_64-unknown-linux-gnu
..........
failures:
---- [mir-opt] src/test/mir-opt/check-maybe-uninit.rs stdout ----
2 + // MIR for `main` after CheckMaybeUninit
4 | User Type Annotations
4 | User Type Annotations
- | 0: user_ty: Canonical { max_universe: U0, variables: [CanonicalVarInfo { kind: Ty(General(U0)) }], value: TypeOf(DefId(2:2022 ~ core[4f75]::mem::maybe_uninit::{impl#2}::uninit), UserSubsts { substs: [^0], user_self_ty: Some(UserSelfTy { impl_def_id: DefId(2:2019 ~ core[4f75]::mem::maybe_uninit::{impl#2}), self_ty: std::mem::MaybeUninit<u8> }) }) }, span: $DIR/check-maybe-uninit.rs:6:17: 6:42, inferred_ty: fn() -> std::mem::MaybeUninit<u8> {std::mem::MaybeUninit::<u8>::uninit}
- | 1: user_ty: Canonical { max_universe: U0, variables: [CanonicalVarInfo { kind: Ty(General(U0)) }], value: TypeOf(DefId(2:2022 ~ core[4f75]::mem::maybe_uninit::{impl#2}::uninit), UserSubsts { substs: [^0], user_self_ty: Some(UserSelfTy { impl_def_id: DefId(2:2019 ~ core[4f75]::mem::maybe_uninit::{impl#2}), self_ty: std::mem::MaybeUninit<std::string::String> }) }) }, span: $DIR/check-maybe-uninit.rs:7:17: 7:46, inferred_ty: fn() -> std::mem::MaybeUninit<std::string::String> {std::mem::MaybeUninit::<std::string::String>::uninit}
+ | 0: user_ty: Canonical { max_universe: U0, variables: [CanonicalVarInfo { kind: Ty(General(U0)) }], value: TypeOf(DefId(2:2022 ~ core[8554]::mem::maybe_uninit::{impl#2}::uninit), UserSubsts { substs: [^0], user_self_ty: Some(UserSelfTy { impl_def_id: DefId(2:2019 ~ core[8554]::mem::maybe_uninit::{impl#2}), self_ty: std::mem::MaybeUninit<u8> }) }) }, span: $DIR/check-maybe-uninit.rs:6:17: 6:42, inferred_ty: fn() -> std::mem::MaybeUninit<u8> {std::mem::MaybeUninit::<u8>::uninit}
+ | 1: user_ty: Canonical { max_universe: U0, variables: [CanonicalVarInfo { kind: Ty(General(U0)) }], value: TypeOf(DefId(2:2022 ~ core[8554]::mem::maybe_uninit::{impl#2}::uninit), UserSubsts { substs: [^0], user_self_ty: Some(UserSelfTy { impl_def_id: DefId(2:2019 ~ core[8554]::mem::maybe_uninit::{impl#2}), self_ty: std::mem::MaybeUninit<std::string::String> }) }) }, span: $DIR/check-maybe-uninit.rs:7:17: 7:46, inferred_ty: fn() -> std::mem::MaybeUninit<std::string::String> {std::mem::MaybeUninit::<std::string::String>::uninit}
8 fn main() -> () {
8 fn main() -> () {
9 let mut _0: (); // return place in scope 0 at $DIR/check-maybe-uninit.rs:+0:11: +0:11
thread '[mir-opt] src/test/mir-opt/check-maybe-uninit.rs' panicked at 'Actual MIR output differs from expected MIR output /checkout/src/test/mir-opt/check_maybe_uninit.main.CheckMaybeUninit.diff', src/tools/compiletest/src/runtest.rs:3516:25
failures:
[mir-opt] src/test/mir-opt/check-maybe-uninit.rs
With a panic like this, the chances are pretty high that they will hit people that can do absolutely nothing about them -- because the panic is caused by a dependency.
If you replace "panic" in your statement with "UB", is the meaning much different? The dependency is behaving badly, which is hard for its users, but the panic at least makes it deterministic.
Also, it's worth thinking about if we're trying to prevent just LLVM UB here, or rust UB? I agree that being minimal in our checks for mem::uninitialized
is useful because that function is what Old Code does, which is why we added the 0x01 filling, but can/should we be stricter and go "we will panic on rust UB" for MaybeUninit, since we sorted out our messaging (mostly) for what uninit means?
(We did say "uhhhh uninit ints may be fine, they may not, be careful regardless", which has since been changed to "they're UB.")
If it's rust UB, then strict-init-checks is the correct thing to do here.
If it's LLVM UB, then we need to still panic on uninit bool
, but not behind arrays. We wouldn't panic on uninit u8
until we start adding noundef
to ints (which it would be a shame if we couldn't do).
If you replace "panic" in your statement with "UB", is the meaning much different?
In theory, no. In practice, yes -- in particular when it comes to uninit, a lot of code still does what it is intended to do even if it has UB.
We should do all we can to fix that UB, but this PR does not help a lot with that -- fixing UB can only happen by the author of the crate that contains the bad code.
This PR also proposes a very expensive way to detect such UB (expensive in maintenance cost) -- a dedicated MIR pass to detect certain ways of using standard library APIs and rewrite them into a different from. This breaks several abstraction barriers. MIR passes should not be coupled to the standard library, they should worry about the Rust Abstract machine semantics and nothing else. I worry about the long-term maintainability of such "random rewrites", and currently I don't think we should do something like that.
But in the end I won't have to maintain this code, so this decision is up to @rust-lang/wg-mir-opt. I just hope I won't be on the receiving end of the bugreports we will get when we have accumulated a dozen of these rewrites and someone tries to debug something and gets extremely confused because the generated MIR just makes no sense.
I do find the "we insert calls" bit to be rather surprising, indeed.
Though I do think the general concept of "change the code during compilation to detect more UB" is something we should support. Other things we could do along the same lines, that would have the same downside of making the resulting MIR not match the input code, include "check type validity on pointer reads" or "check pointer alignment on read".
What I'm thinking of is having a set of sanitizers that are on by default, with the default set depending on profile, and letting you disable them if you wanted.
So something like RUSTFLAGS="-Cmir-sanitizer=uninit_assume_init,pointer_align" cargo build
(or a similar thing in your Cargo.toml [profile.dev]
. If it's just A Done Thing that "yeah, your code will be modified sometimes to insert UB checks, use this flag to disable that/only enable some of them", I don't think it would be too bad?
Also, could use a fuel mechanism like optimizations do (maybe the same fuel? these aren't required transformations, they're just not optimizations) in order to let you bisect bad transformations.
Basically, I think these shouldn't be ad-hoc changes that are impossible to control, but I think we should have the infrastructure to do changes like this, since they're useful.
Yeah, this pass could be seen as a very simple MaybeUninit sanitizer. But I agree that this is quite some complexity without precedent, and I am not sure whether it's worth it either. I think it's a good idea, which is why I wrote it in the first place, but in the end the choice is up to wg-mir-opt, and maybe going with more general MIR sanitizers like 522 mentioned is a better idea.
If we're talking about sanitizers, we could also change MaybeUninit
to have an additional field behind #[cfg(debug_assertions)]
that is used to track initialization and panics if you haven't used any of the &mut self
methods before asserting intializedness.
That has zero impact on release mode, and doesn't require much maintenance
This would break, we guarantee it to be transparent over T
. If I read a T
in memory as a MaybeUninit<T>
, which is allowed as it's transparent, it would do an out of bounds read of the init flag.
Yeah, MaybeUninit
has several slice methods that need matching layout.
Though I do think the general concept of "change the code during compilation to detect more UB" is something we should support. Other things we could do along the same lines, that would have the same downside of making the resulting MIR not match the input code, include "check type validity on pointer reads" or "check pointer alignment on read".
Those are very different. We can compile each raw pointer read to a checked operation. That is a correct compilation scheme, and perfectly compositional -- the behavior of a large expression is entirely determined by its subexpressions.
What I am objecting to is having the behavior of MaybeUninit::uninit().assume_init()
be something other than first calling MaybeUninit::uninit()
and then calling assume_init()
on the result. That is non-compositional, and surprising & confusing.
If we're talking about sanitizers, we could also change
MaybeUninit
to have an additional field behind#[cfg(debug_assertions)]
that is used to track initialization and panics if you haven't used any of the&mut self
methods before asserting intializedness.That has zero impact on release mode, and doesn't require much maintenance
The C sanitizers use "shadow memory" for this, where each stack/heap allocation has additional metadata at a predictable address that can track initialization status etc. It would be interesting to have that infrastructure for MIR somehow.
This PR is also being discussed on Zulip now. Quoting some of the things I said there:
"I'd much rather see effort spent towards having a version of assume_init that checks the invariants it can check, independent of what happened to the MaybeUninit before this method was called." "[This] PR only covers the most extreme least-likely-to-surprise-people case so I dont think it makes enough of a dent in UB prevention to justify its cost. It would be different if it would be able to detect common subtle cases of UB, but it doesn't."
:umbrella: The latest upstream changes (presumably #102051) made this pull request unmergeable. Please resolve the merge conflicts.
I'm waiting for whether we want to go forward with this before I'm cleaning this up further
ping from triage: @Nilstrieb
I'm waiting for whether we want to go forward with this before I'm cleaning this up further
This was over a month ago. Can you please post your status on this PR?
FYI: when a PR is ready for review, send a message containing
@rustbot ready
to switch to S-waiting-on-review
so the PR is in the reviewer's backlog.
This needs a decision by someone, but I don't know who? I guess it'd be the lang team for whether we want such a panic, and the compiler team for whether it is worth the maintenance burden?
Dear @rust-lang/compiler @rust-lang/lang,
this PR suggests to make MaybeUninit::uninit().assume_init()
panic when it is called on a type that does not permit being uninitialized, similar to what we already do for mem::uninitialized()
. The way this is implemented is via a MIR transform that detects this pattern (MaybeUninit::assume_init
called on the return value of MaybeUninit::uninit
), check the type, and rewrite it into a panic if applicable.
I am bringing this up for nomination because this is the first proposal for a MIR transform whose goal it is to improve run-time UB detection. So far, we have MIR transforms for optimizations, and we have assertions in the standard library for run-time UB detection. So I am asking:
- Compiler team, are you okay with having such MIR transforms around whose purpose is not optimization but UB detection?
- Lang team, do you think using a MIR transform for UB detection is a sensible thing to do?
I personally am concerned about the surprise that might be cause by these rewrites. Basically this means that the behavior of MaybeUninit::uninit().assume_init()
is no longer determined by first calling uninit
and then assume_init
, there is now a non-compositional rewrite being applied that will sometimes change program behavior. (Naturally, the transform cannot catch all cases of MaybeUninit::assume_init
being called on the return value of MaybeUninit::uninit
.) I imagine if we land this, it would not be the last MIR transform of that kind.
On the other hand, it does help detect some UB, after all.