`c2rust` introduces UB in `&a[i]` by adding a `*` deref and `&` ref when there should only be an `.offset`
Another issue I found when running miri on the generated code.
#include <stdlib.h>
struct Foo {
int *data_ptr;
int data[];
};
void f() {
struct Foo *foo = malloc(sizeof(struct Foo));
foo->data_ptr = &foo->data[0];
}
int main() {
f();
}
translates to
extern "C" {
#[no_mangle]
fn malloc(_: libc::c_ulong) -> *mut libc::c_void;
}
#[derive(Copy, Clone)]
#[repr(C)]
pub struct Foo {
pub data_ptr: *mut libc::c_int,
pub data: [libc::c_int; 0],
}
#[no_mangle]
pub unsafe extern "C" fn f() {
let mut foo: *mut Foo =
malloc(::std::mem::size_of::<Foo>() as libc::c_ulong) as *mut Foo;
(*foo).data_ptr =
&mut *(*foo).data.as_mut_ptr().offset(0 as libc::c_int as isize) as
*mut libc::c_int;
}
unsafe fn main_0() -> libc::c_int { f(); return 0; }
#[main]
pub fn main() { unsafe { ::std::process::exit(main_0() as i32) } }
error: Undefined Behavior: memory access failed: pointer must be in-bounds at offset 12, but is outside bounds of alloc2017 which has size 8
--> src/main.rs:20:9
|
20 | &mut *(*foo).data.as_mut_ptr().offset(0 as libc::c_int as isize) as
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ memory access failed: pointer must be in-bounds at offset 12, but is outside bounds of alloc2017 which has size 8
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
The &mut *() dereferences the data member which allowed in C but not in Rust. Removing/folding the &mut * fixes the problem.
The variable-length data field is also mistranslated. It should be [libc::c_int] in Rust, not a zero-sized array, [libc::c_int; 0]. Moreover, we can't derive Copy and Clone for dynamically sized types in Rust, and we can't use size_of to get the size of the struct since it's dynamically sized. In C, sizeof assumes the dynamically sized fields are zero-sized, so we should use that behavior.
This bug is due to c2rust transpile mistranslating &a[i]. This should be purely an a.offset(i) without any * derefs (which cause UB on the out-of-bounds deref) or &/&mut refs. That is, c2rust transpile should know the [] and & cancel out, as that's how they're treated in C.
This is a little more complicated than that. Two major C compilers (Clang & MSVC) consider this undefined behaviour as &a[i] is equivalent to &*(a + i) and therefore dereferences before it gets the address. Only GCC considers it equivalent to (a + i). The problem is neither compiler catches this, especially by default. The only tool I've seen that catches it for malloced memory is the MVSC static analyzer: https://godbolt.org/z/fEo6877Es .
Having said that, Clang will give a warning for the same problem if the array in question has known bounds. It'll even trip the undefined behavior sanitizer: https://godbolt.org/z/bv9dGK4vx .
C++ standards' wording on this is more clear, and would make this definitely undefined behavior. The real solution is to use (a + i) to get an address to one past the end of an array in C. It's more portable.