c2rust C without UB is translated to Rust with UB

struct Foo {
    int len;
};

void dec(Foo* f) {
    f->len--;
}

void bar() {
    Foo f = {5};
    struct Foo *fp = &f;
    dec(fp);
    f.len = 6;
    dec(fp);
}

translates to

#[derive(Copy, Clone)]
#[repr(C)]
pub struct Foo {
    pub len: libc::c_int,
}
#[no_mangle]
pub unsafe extern "C" fn dec(mut f: *mut Foo) { (*f).len -= 1; }
#[no_mangle]
pub unsafe extern "C" fn bar() {
    let mut f: Foo = { let mut init = Foo{len: 5 as libc::c_int,}; init };
    let mut fp: *mut Foo = &mut f;
    dec(fp);
    f.len = 6 as libc::c_int;
    dec(fp);
}

which when run under miri gives:

error: Undefined Behavior: no item granting read access to tag <untagged> found in borrow stack.
  --> src/main.rs:10:49
   |
10 | pub unsafe extern "C" fn dec(mut f: *mut Foo) { (*f).len -= 1; }
   |                                                 ^^^^^^^^^^^^^ no item granting read access to tag <untagged> found in borrow stack.
   |
   = help: this indicates a potential bug in the program: it performed an invalid operation, but the rules it violated are still experimental
   = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
           
   = note: inside `dec` at src/main.rs:10:49
note: inside `bar` at src/main.rs:17:5
  --> src/main.rs:17:5
   |
17 |     dec(fp);
   |     ^^^^^^^
note: inside `main` at src/main.rs:21:14
  --> src/main.rs:21:14
   |
21 |     unsafe { bar() }
   |              ^^^^^
   = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
   = note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:137:18
   = note: inside closure at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:66:18
   = note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13
   = note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:381:40
   = note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:345:19
   = note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:382:14
   = note: inside `std::rt::lang_start_internal` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:51:25
   = note: inside `std::rt::lang_start::<()>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:65:5

This happens because the mutable borrow of f is invalidated when f.len is mutated.

Sep 18 '20 02:09 jrmuizel

This happens because the mutable borrow of f is invalidated when f.len is mutated.

Sounds like the fix will be non-trivial, if possible at all, since we try to perfectly preserve the semantics and structure (somewhat) of the original C program. Re-borrowing every value after every possible mutation is going to require a lot of additional code and might introduce new bugs.

Sep 18 '20 02:09 ahomescu

Isn't the solution to just do let mut fp = &raw mut f; once raw references (or whatever they're called) are stabilized/nightly-available? This is a known issue with downcasting refs to ptrs in general

Sep 18 '20 03:09 TheDan64

Isn't the solution to just do let mut fp = &raw mut f; once raw references (or whatever they're called) are stabilized/nightly-available? This is a known issue with downcasting refs to ptrs in general

I thought so too but that doesn't seem to fix it: https://play.rust-lang.org/?version=nightly&mode=release&edition=2018&gist=20e0c25fd0bce187fb2ec03df11ff945

Sep 18 '20 03:09 jrmuizel

Personally, I'd bet MIRI just doesn't understand raw references yet. This is the exact problem they were intended to fix, as I understand it.

Sep 18 '20 03:09 TheDan64

Personally, I'd bet MIRI just doesn't understand raw references yet. This is the exact problem they were intended to fix, as I understand it.

@RalfJung what do you think?

Sep 18 '20 03:09 jrmuizel

This is the exact problem they were intended to fix, as I understand it.

The RFC gives other motivations: taking pointers to unaligned structure fields (which you can't do via an intermediate reference) and to fields in structures at invalid addresses (e.g. at 0x0 to implement offsetof). On the other hand, it might solve the problem in this issue because of this part:

Lowering to MIR should not insert an implicit reborrow of <place> in &raw mut <place>; that reborrow would assert validity and thus defeat the entire point. The borrow checker should do the usual checks on the place used in &raw, but can just ignore the result of this operation and the newly created "reference" can have any lifetime. When translating MIR to LLVM, nothing special has to happen as references and raw pointers have the same LLVM type anyway; the new operation behaves like Ref. When interpreting MIR in the Miri engine, the engine will know not to enforce any invariants on the raw pointer created by &raw.

Sep 18 '20 07:09 ahomescu

Miri does understand raw pointers. However, Rust has aliasing rules that C does not have, and those are becoming a problem here.

Specifically, when you have a local variable f and write f.len = ..., then (like if you had used a mutable reference) this asserts to the compiler that f is the unique way to access this memory, so any outstanding other pointers (safe references or unsafe raw pointers) now become invalid. This is crucial to enable the compiler to perform optimizations based on these aliasing rules.

Isn't the solution to just do let mut fp = &raw mut f; once raw references (or whatever they're called) are stabilized/nightly-available? This is a known issue with downcasting refs to ptrs in general

Yes that sounds like the most consistent approach to me -- do that and then never use f again. If everything goes through raw pointers, there are no alias assumptions.

I thought so too but that doesn't seem to fix it:

That is still mixing raw with non-raw accesses. Here's the "raw-only" version.

Sep 18 '20 13:09 RalfJung

std::ptr::addr_of_mut will be in Rust 1.51 which should allow fixing this.

Mar 22 '21 19:03 jrmuizel

Looks like std::ptr::addr_of and addr_of_mut got stabilized, so we should use them.

May 09 '22 21:05 ahomescu

Hi @RalfJung! We're noticing some unexpected behavior with miri with regards to addr_of_mut! and &mut x as *mut _ casts, and we wanted to make sure we're understanding things correctly and/or if there are any bugs in miri, so we have a few questions/examples to clarify.

We're using this as scaffolding code:

use std::ptr::addr_of_mut;

static mut X: i32 = 1;

unsafe fn dec(x: *mut i32) { 
    *x -= 1;
}

And all the examples below are wrapped in unsafe {}.

Why does &mut x invalidate mutable pointers to x but &mut x as *mut _ doesn't?

We noticed that this passes miri: playground

let mut x: i32 = 1;
let p = &mut x as *mut i32;
&mut x as *mut i32;
dec(p);

but not this: playground

let mut x: i32 = 1;
let p = &mut x as *mut i32;
let _a = &mut x;
dec(p);

In the &mut x as *mut _ case, why does creating the intermediary &mut x not invalidate mutable pointers to x? Or rather, in playground

let a = &mut x;         // Invalidates `p`
let _b = a as *mut i32; // Seemingly revalidates it

why does the &mut x invalidate p and then as *mut i32 revalidate it? We would've thought addr_of_mut! would be needed here to avoid creating the intermediary &mut x.

Why does addr_of_mut! not work in some cases where &mut x as *mut _ does?

We noticed that while addr_of_mut! works fine for local variables, like this: playground

let mut x: i32 = 1;
let p = &mut x as *mut i32;
*addr_of_mut!(x) = 2;
dec(p);

it doesn't work for static variables, like this: playground

let p = &mut X as *mut i32;
*addr_of_mut!(X) = 2;
dec(p);

Also, for static variables, it works as long as the addr_of_mut! does not come after a &mut x as *mut _. That is, playground

let p = &mut X as *mut i32;
*addr_of_mut!(X) = 2;
dec(p);

doesn't work, but these 3 other orderings do: playground

let p = addr_of_mut!(X);
*addr_of_mut!(X) = 2;
dec(p);

playground

let p = addr_of_mut!(X);
*(&mut X as *mut i32) = 2;
dec(p);

playground

let p = &mut X as *mut i32;
*(&mut X as *mut i32) = 2;
dec(p);

Is this the correct behavior, and if it is, could you explain what exactly is happening? Or is &mut x as *mut _ working in more cases than it should? Or is addr_of_mut! working in fewer cases than it should? And why are static variables behaving differently from local variables?

Thanks for all the help!

Jun 03 '22 16:06 kkysen

Many of your questions, I think, are related to the fact that Miri, by default, treats raw pointers as "untagged". This is necessary to accept a bunch of real-world code out there that does nasty things with integer-pointer casts, but can lead to very surprising behavior. We are actively working on fixing this, but we are not there yet.

So, what happens when you run all of your examples with -Zmiri-tag-raw-pointers? Unfortunately, this is not possible on the playground, so it is more annoying to explore. Your first example then fails, as you (probably?) expected:

let mut x: i32 = 1;
let p = &mut x as *mut i32;
&mut x as *mut i32;
dec(p);

shows

error: Undefined Behavior: attempting a read access using <3072> at alloc1653[0x0], but that tag does not exist in the borrow stack for this location
  --> example.rs:6:5
   |
6  |     *x -= 1;
   |     ^^^^^^^
   |     |
   |     attempting a read access using <3072> at alloc1653[0x0], but that tag does not exist in the borrow stack for this location
   |     this error occurs as part of an access at alloc1653[0x0..0x4]
   |
   = help: this indicates a potential bug in the program: it performed an invalid operation, but the rules it violated are still experimental
   = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <3072> was created by a retag at offsets [0x0..0x4]
  --> example.rs:12:9
   |
12 | let p = &mut x as *mut i32;
   |         ^^^^^^
help: <3072> was later invalidated at offsets [0x0..0x4]
  --> example.rs:13:1
   |
13 | &mut x as *mut i32;
   | ^^^^^^
   = note: inside `dec` at example.rs:6:5

If there is still surprising behavior that you see with this flag, please let me know. :)

Jun 03 '22 18:06 RalfJung

Thanks so much for the help! When I run all the examples with -Zmiri-tag-raw-pointers, the only one that still passes is the one that uses addr_of_mut! everywhere:

playground

let p = addr_of_mut!(X);
*addr_of_mut!(X) = 2;
dec(p);

which is what I was expecting.

Jun 07 '22 16:06 kkysen

That's great to hear. :) Once https://github.com/rust-lang/miri/issues/2133 is finished, it'd be great if you could test them all again using the default flags; the hope is they should still all behave as you expect then.

Jun 07 '22 17:06 RalfJung

Okay, once https://github.com/rust-lang/miri/issues/2133 lands, I'll re-test all the examples.

Jun 07 '22 19:06 kkysen

The current plan is to:

Fix all to-pointer casts that go through references to instead use addr_of! and addr_of_mut!. That is,
- &mut x as *mut T becomes addr_of_mut!(x)
- &x as *const T becomes addr_of!(x)
Doing it this way, specifically for all such casts and for *mut and *const pointers, has numerous advantages:
- It is correct in all cases because it does not create intermediate references.
- It more closely matches C semantics, which don't contain any of the semantics of Rust references, so we should avoid creating them.
- It is shorter than the & borrow and as cast, which is better for readability.
- It has a more descriptive name, so it's clear what is happening compared to the & borrow and as cast.
- It has higher precedence than an as cast, so extra parentheses aren't needed, which hurts readability.
Then we fix the direct access to owned values (locals and statics) that have had their address taken. Instead, we immediately shadow the owned value with a *mut pointer to it (gotten through addr_of_mut!). Then all reads and writes go through that *mut pointer instead of the original owned value.

Jun 07 '22 19:06 kkysen