Transpiling a copy of an unspecified value introduces UB

Open purplesyringa opened this issue 9 months ago • 0 comments

To the best of my knowledge, the following code, if it compiles, does not have UB:

#include <stdint.h>
#include <stdlib.h>

int main() {
    int32_t *p = malloc(sizeof(int32_t));
    *p;
}

Reasoning (I'm using https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf as the reference standard):

  • [7.22.3.4] malloc allocates an object whose value is indeterminate
  • [6.5] When using an object without a declared type for anything other than memcpy/memmove/typed copy, the effective type is the lvalue type
  • [3.19.2] An indeterminate value is either an unspecified value or a trap representation
  • [6.2.6.1] A trap representation is an object representation that doesn't represent a value of the object type
  • [6.2.6.2] An integer object representation is split into value bits and padding bits, where only the latter can make a representation a trap representation
  • [7.20.1.1] int32_t designates a signed integer type with width 32 and no padding bits
  • As a consequence, int32_t has no trap representations, and *p is an unspecified value
  • [3.4.4] Unspecified behavior (not UB!) includes, among other things, the use of an unspecified value
  • [J.2] specifically mentions that reading a trap representation through a non-char lvalue is UB, not reading an indeterminate value in general

c2rust transpiles this to:

#![allow(dead_code, mutable_transmutes, non_camel_case_types, non_snake_case, non_upper_case_globals, unused_assignments, unused_mut)]
extern "C" {
    fn malloc(_: libc::c_ulong) -> *mut libc::c_void;
}
pub type __int32_t = libc::c_int;
pub type int32_t = __int32_t;
unsafe fn main_0() -> libc::c_int {
    let mut p: *mut int32_t = malloc(::core::mem::size_of::<int32_t>() as libc::c_ulong)
        as *mut int32_t;
    *p;
    return 0;
}
pub fn main() {
    unsafe { ::std::process::exit(main_0() as i32) }
}

which does have UB, because in Rust, reading an uninitialized value is undefined behavior (Miri, but it's also kinda common sense).
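For contrast, here is a minimal sketch of a read that Rust does allow on uninitialized memory: a copy at type `MaybeUninit<i32>`, which has no validity requirement on its bytes. The helper name `read_maybe_uninit` and the demo function are hypothetical, and the sketch uses `std::alloc` in place of `malloc` only to stay self-contained. It is not what c2rust emits.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::mem::MaybeUninit;

/// Hypothetical helper (not part of c2rust): copies the bytes behind `p`
/// without asserting that they form an initialized `T`.
///
/// Safety: `p` must be non-null, aligned, and valid for reads of `T`.
unsafe fn read_maybe_uninit<T>(p: *const T) -> MaybeUninit<T> {
    p.cast::<MaybeUninit<T>>().read()
}

/// Mirrors the C example: allocate an `int32_t`, read it while it is still
/// uninitialized, then write and read it again.
fn read_after_alloc_demo() -> bool {
    let layout = Layout::new::<i32>();
    unsafe {
        let p = alloc(layout) as *mut i32;
        if p.is_null() {
            return false;
        }
        // Analogue of the C statement `*p;`: the value is copied but never
        // inspected, so no initialized `i32` is ever claimed to exist.
        let _unspecified: MaybeUninit<i32> = read_maybe_uninit(p);

        // Once the memory is written, `assume_init` is sound.
        p.write(7);
        let v = read_maybe_uninit(p).assume_init();
        dealloc(p as *mut u8, layout);
        v == 7
    }
}
```

Calling `read_after_alloc_demo()` exercises both reads. The point of the sketch is only that the first read is defined at type `MaybeUninit<i32>`, whereas the same read at type `i32` is not.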

A narrower version of this problem is this memcpy implementation, which is perfectly legal in C:

#include <stddef.h>

void my_memcpy(unsigned char* dst, unsigned char* src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}

but not when transpiled with c2rust:

#![allow(dead_code, mutable_transmutes, non_camel_case_types, non_snake_case, non_upper_case_globals, unused_assignments, unused_mut)]
pub type size_t = libc::c_ulong;
#[no_mangle]
pub unsafe extern "C" fn my_memcpy(
    mut dst: *mut libc::c_uchar,
    mut src: *mut libc::c_uchar,
    mut n: size_t,
) {
    let mut i: size_t = 0 as libc::c_int as size_t;
    while i < n {
        *dst.offset(i as isize) = *src.offset(i as isize);
        i = i.wrapping_add(1);
        i;
    }
}

I've demonstrated the problem with int32_t first to show that this is a wide problem, not specific to character types.

I'm not sure how to solve this. Wrapping everything in MaybeUninit would work, but that's quite unwieldy.
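To give a rough idea of what the `MaybeUninit` approach could look like for the memcpy case, here is a hand-written sketch, assuming the byte pointers are retyped to `MaybeUninit<u8>`. The names `my_memcpy_uninit` and `copy_partially_initialized` are hypothetical, and this is not output from c2rust or a proposed patch.

```rust
use std::mem::MaybeUninit;

/// Hypothetical variant of `my_memcpy` whose element type is
/// `MaybeUninit<u8>`, so copying uninitialized bytes is defined.
///
/// Safety: `src` must be valid for `n` reads, `dst` for `n` writes.
pub unsafe fn my_memcpy_uninit(
    dst: *mut MaybeUninit<u8>,
    src: *const MaybeUninit<u8>,
    n: usize,
) {
    let mut i = 0;
    while i < n {
        // `MaybeUninit<u8>` is `Copy`, so this moves bytes around without
        // ever claiming that they hold an initialized `u8`.
        *dst.add(i) = *src.add(i);
        i += 1;
    }
}

/// Copies a buffer in which only the first two bytes are initialized and
/// returns those two bytes from the destination.
fn copy_partially_initialized() -> [u8; 2] {
    let mut src = [MaybeUninit::<u8>::uninit(); 4];
    src[0] = MaybeUninit::new(1);
    src[1] = MaybeUninit::new(2);
    let mut dst = [MaybeUninit::<u8>::new(0); 4];
    unsafe {
        my_memcpy_uninit(dst.as_mut_ptr(), src.as_ptr(), 4);
        // Only bytes 0 and 1 are known to be initialized after the copy.
        [dst[0].assume_init(), dst[1].assume_init()]
    }
}
```

The unwieldy part shows up at the boundary: every caller has to decide where `assume_init` becomes sound, which is information the C source does not carry.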

purplesyringa, Mar 22 '25 02:03