C without UB is translated to Rust with UB
struct Foo {
int len;
};
void dec(Foo* f) {
f->len--;
}
void bar() {
Foo f = {5};
struct Foo *fp = &f;
dec(fp);
f.len = 6;
dec(fp);
}
translates to
#[derive(Copy, Clone)]
#[repr(C)]
pub struct Foo {
pub len: libc::c_int,
}
#[no_mangle]
pub unsafe extern "C" fn dec(mut f: *mut Foo) { (*f).len -= 1; }
#[no_mangle]
pub unsafe extern "C" fn bar() {
let mut f: Foo = { let mut init = Foo{len: 5 as libc::c_int,}; init };
let mut fp: *mut Foo = &mut f;
dec(fp);
f.len = 6 as libc::c_int;
dec(fp);
}
which when run under miri gives:
error: Undefined Behavior: no item granting read access to tag <untagged> found in borrow stack.
--> src/main.rs:10:49
|
10 | pub unsafe extern "C" fn dec(mut f: *mut Foo) { (*f).len -= 1; }
| ^^^^^^^^^^^^^ no item granting read access to tag <untagged> found in borrow stack.
|
= help: this indicates a potential bug in the program: it performed an invalid operation, but the rules it violated are still experimental
= help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
= note: inside `dec` at src/main.rs:10:49
note: inside `bar` at src/main.rs:17:5
--> src/main.rs:17:5
|
17 | dec(fp);
| ^^^^^^^
note: inside `main` at src/main.rs:21:14
--> src/main.rs:21:14
|
21 | unsafe { bar() }
| ^^^^^
= note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
= note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:137:18
= note: inside closure at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:66:18
= note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13
= note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:381:40
= note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:345:19
= note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:382:14
= note: inside `std::rt::lang_start_internal` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:51:25
= note: inside `std::rt::lang_start::<()>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:65:5
This happens because the mutable borrow of f is invalidated when f.len is mutated.
This happens because the mutable borrow of f is invalidated when f.len is mutated.
Sounds like the fix will be non-trivial, if possible at all, since we try to perfectly preserve the semantics and structure (somewhat) of the original C program. Re-borrowing every value after every possible mutation is going to require a lot of additional code and might introduce new bugs.
Isn't the solution to just do let mut fp = &raw mut f; once raw references (or whatever they're called) are stabilized/nightly-available? This is a known issue with downcasting refs to ptrs in general
Isn't the solution to just do
let mut fp = &raw mut f;once raw references (or whatever they're called) are stabilized/nightly-available? This is a known issue with downcasting refs to ptrs in general
I thought so too but that doesn't seem to fix it: https://play.rust-lang.org/?version=nightly&mode=release&edition=2018&gist=20e0c25fd0bce187fb2ec03df11ff945
Personally, I'd bet MIRI just doesn't understand raw references yet. This is the exact problem they were intended to fix, as I understand it.
Personally, I'd bet MIRI just doesn't understand raw references yet. This is the exact problem they were intended to fix, as I understand it.
@RalfJung what do you think?
This is the exact problem they were intended to fix, as I understand it.
The RFC gives other motivations: taking pointers to unaligned structure fields (which you can't do via an intermediate reference) and to fields in structures at invalid addresses (e.g. at 0x0 to implement offsetof). On the other hand, it might solve the problem in this issue because of this part:
Lowering to MIR should not insert an implicit reborrow of
<place>in&raw mut <place>; that reborrow would assert validity and thus defeat the entire point. The borrow checker should do the usual checks on the place used in&raw, but can just ignore the result of this operation and the newly created "reference" can have any lifetime. When translating MIR to LLVM, nothing special has to happen as references and raw pointers have the same LLVM type anyway; the new operation behaves likeRef. When interpreting MIR in the Miri engine, the engine will know not to enforce any invariants on the raw pointer created by&raw.
Miri does understand raw pointers. However, Rust has aliasing rules that C does not have, and those are becoming a problem here.
Specifically, when you have a local variable f and write f.len = ..., then (like if you had used a mutable reference) this asserts to the compiler that f is the unique way to access this memory, so any outstanding other pointers (safe references or unsafe raw pointers) now become invalid. This is crucial to enable the compiler to perform optimizations based on these aliasing rules.
Isn't the solution to just do let mut fp = &raw mut f; once raw references (or whatever they're called) are stabilized/nightly-available? This is a known issue with downcasting refs to ptrs in general
Yes that sounds like the most consistent approach to me -- do that and then never use f again. If everything goes through raw pointers, there are no alias assumptions.
I thought so too but that doesn't seem to fix it:
That is still mixing raw with non-raw accesses. Here's the "raw-only" version.
std::ptr::addr_of_mut will be in Rust 1.51 which should allow fixing this.
Looks like std::ptr::addr_of and addr_of_mut got stabilized, so we should use them.
Hi @RalfJung! We're noticing some unexpected behavior with miri with regards to addr_of_mut! and &mut x as *mut _ casts, and we wanted to make sure we're understanding things correctly and/or if there are any bugs in miri, so we have a few questions/examples to clarify.
We're using this as scaffolding code:
use std::ptr::addr_of_mut;
static mut X: i32 = 1;
unsafe fn dec(x: *mut i32) {
*x -= 1;
}
And all the examples below are wrapped in unsafe {}.
- Why does
&mut xinvalidate mutable pointers toxbut&mut x as *mut _doesn't?
We noticed that this passes miri: playground
let mut x: i32 = 1;
let p = &mut x as *mut i32;
&mut x as *mut i32;
dec(p);
but not this: playground
let mut x: i32 = 1;
let p = &mut x as *mut i32;
let _a = &mut x;
dec(p);
In the &mut x as *mut _ case, why does creating the intermediary &mut x not invalidate mutable pointers to x? Or rather, in
playground
let a = &mut x; // Invalidates `p`
let _b = a as *mut i32; // Seemingly revalidates it
why does the &mut x invalidate p and then as *mut i32 revalidate it? We would've thought addr_of_mut! would be needed here to avoid creating the intermediary &mut x.
- Why does
addr_of_mut!not work in some cases where&mut x as *mut _does?
We noticed that while addr_of_mut! works fine for local variables, like this:
playground
let mut x: i32 = 1;
let p = &mut x as *mut i32;
*addr_of_mut!(x) = 2;
dec(p);
it doesn't work for static variables, like this: playground
let p = &mut X as *mut i32;
*addr_of_mut!(X) = 2;
dec(p);
Also, for static variables, it works as long as the addr_of_mut! does not come after a &mut x as *mut _. That is,
playground
let p = &mut X as *mut i32;
*addr_of_mut!(X) = 2;
dec(p);
doesn't work, but these 3 other orderings do: playground
let p = addr_of_mut!(X);
*addr_of_mut!(X) = 2;
dec(p);
let p = addr_of_mut!(X);
*(&mut X as *mut i32) = 2;
dec(p);
let p = &mut X as *mut i32;
*(&mut X as *mut i32) = 2;
dec(p);
Is this the correct behavior, and if it is, could you explain what exactly is happening? Or is &mut x as *mut _ working in more cases than it should? Or is addr_of_mut! working in fewer cases than it should? And why are static variables behaving differently from local variables?
Thanks for all the help!
Many of your questions, I think, are related to the fact that Miri, by default, treats raw pointers as "untagged". This is necessary to accept a bunch of real-world code out there that does nasty things with integer-pointer casts, but can lead to very surprising behavior. We are actively working on fixing this, but we are not there yet.
So, what happens when you run all of your examples with -Zmiri-tag-raw-pointers? Unfortunately, this is not possible on the playground, so it is more annoying to explore. Your first example then fails, as you (probably?) expected:
let mut x: i32 = 1;
let p = &mut x as *mut i32;
&mut x as *mut i32;
dec(p);
shows
error: Undefined Behavior: attempting a read access using <3072> at alloc1653[0x0], but that tag does not exist in the borrow stack for this location
--> example.rs:6:5
|
6 | *x -= 1;
| ^^^^^^^
| |
| attempting a read access using <3072> at alloc1653[0x0], but that tag does not exist in the borrow stack for this location
| this error occurs as part of an access at alloc1653[0x0..0x4]
|
= help: this indicates a potential bug in the program: it performed an invalid operation, but the rules it violated are still experimental
= help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <3072> was created by a retag at offsets [0x0..0x4]
--> example.rs:12:9
|
12 | let p = &mut x as *mut i32;
| ^^^^^^
help: <3072> was later invalidated at offsets [0x0..0x4]
--> example.rs:13:1
|
13 | &mut x as *mut i32;
| ^^^^^^
= note: inside `dec` at example.rs:6:5
If there is still surprising behavior that you see with this flag, please let me know. :)
Thanks so much for the help! When I run all the examples with -Zmiri-tag-raw-pointers, the only one that still passes is the one that uses addr_of_mut! everywhere:
let p = addr_of_mut!(X);
*addr_of_mut!(X) = 2;
dec(p);
which is what I was expecting.
That's great to hear. :) Once https://github.com/rust-lang/miri/issues/2133 is finished, it'd be great if you could test them all again using the default flags; the hope is they should still all behave as you expect then.
Okay, once https://github.com/rust-lang/miri/issues/2133 lands, I'll re-test all the examples.
The current plan is to:
-
Fix all to-pointer casts that go through references to instead use
addr_of!andaddr_of_mut!. That is,&mut x as *mut Tbecomesaddr_of_mut!(x)&x as *const Tbecomesaddr_of!(x)
Doing it this way, specifically for all such casts and for
*mutand*constpointers, has numerous advantages:- It is correct in all cases because it does not create intermediate references.
- It more closely matches C semantics, which don't contain any of the semantics of Rust references, so we should avoid creating them.
- It is shorter than the
&borrow andascast, which is better for readability. - It has a more descriptive name, so it's clear what is happening compared to the
&borrow andascast. - It has higher precedence than an
ascast, so extra parentheses aren't needed, which hurts readability.
-
Then we fix the direct access to owned values (locals and statics) that have had their address taken. Instead, we immediately shadow the owned value with a
*mutpointer to it (gotten throughaddr_of_mut!). Then all reads and writes go through that*mutpointer instead of the original owned value.