rfcs icon indicating copy to clipboard operation
rfcs copied to clipboard

Add forbidden function casts RFC

Open WaffleLapkin opened this issue 8 months ago • 17 comments

This RFC proposes to forbid function -> integer as casts, since they have surprising aspects and are error prone.

Rendered

WaffleLapkin avatar Nov 12 '23 20:11 WaffleLapkin

what happens when code pointers are bigger than data pointers? do you have to fall back on transmute to get the address, since casting to a data pointer will lose information?

an example is the 8086 Medium Memory Model, which has 16-bit data pointers and 32-bit code pointers.

programmerjake avatar Nov 12 '23 21:11 programmerjake

usize is guaranteed to be the same size as a data pointer, so seemingly if cast to a data pointer would truncate, cast to usize would too. Does Rust support any such platform?

https://doc.rust-lang.org/reference/type-layout.html#pointers-and-references-layout

GoldsteinE avatar Nov 13 '23 05:11 GoldsteinE

usize is guaranteed to be the same size as a data pointer, so seemingly if cast to a data pointer would truncate, cast to usize would too. Does Rust support any such platform?

I couldn't find any specific instances, but I know AVR also can have different code/data pointer sizes, and MSP430 too, just all the configurations I could find don't do that.

With rustc's GCC backend, there are a whole pile of weird and wonderful cpu architectures that rust could feasibly support in the future.

I know others have discussed different code/data pointer sizes before, in reference to dlsym-style dynamic linking and how POSIX therefore requires code/data pointers to be the same size and how not everything supports that.

programmerjake avatar Nov 13 '23 07:11 programmerjake

Either way, if cast to *const () is a problem, a cast to usize also is (because they’re guaranteed to be the same size). A cast to a wider type could work, but “cast of function pointer to usize can truncate” is arguably an even bigger footgun.

Casting function pointer to an integer doesn’t seem to be a common operation either: you can’t do pointer arithmetics on it, so I think the only reason you would want this is some FFI / inline asm stuff? I think if you’re on an architecture where function pointers are larger than data pointers and you want their numeric values, you very much know what you’re doing and could write a transmute, and if you’re writing cross-platform code you would more often than not write as usize instead of as fnptrsize and get bit by truncation.

GoldsteinE avatar Nov 13 '23 07:11 GoldsteinE

Side note: having something like sptr::OpaqueFnPtr in the standard library would solve the problem, because its .addr() could return an appropriately-sized integer on these architectures, causing a compilation error instead of runtime truncation.

GoldsteinE avatar Nov 13 '23 07:11 GoldsteinE

I am ambivalent on this proposal as-is without something like addr on function items, as having to chain casts between a data pointer seems like we're making the actual usage worse, and may pose issues on other architectures. Having addr by contrast, is not only shorter, but allows makes it more clear what is being done.

carbotaniuman avatar Nov 15 '23 00:11 carbotaniuman

may pose issues on other architectures

Could you elaborate on this, please? As I've written before, if cast to a data pointer truncates, cast to usize also does, so this doesn't seem to cause any new issues.

GoldsteinE avatar Nov 15 '23 05:11 GoldsteinE

As I've written before, if cast to a data pointer truncates, cast to usize also does, so this doesn't seem to cause any new issues.

if a function pointer doesn't fit in usize but does fit in some larger integer type, (e.g. u32 when usize is 16-bit), then disallowing function pointer to integer casts removes the obvious way to get a function's address by as casting, since casting to a data pointer and then to u32 looses information since a data pointer can only hold 16-bits when usize is 16-bit.

// assume 32-bit code pointers and 16-bit data pointers:
pub fn works(fptr: fn()) -> u32 {
    fptr as u32
}

pub fn broken(fptr: fn()) -> u32 {
    fptr as *const () as u32
}

programmerjake avatar Nov 15 '23 06:11 programmerjake

Personally I would like to see us do something here. But I feel uneasy about recommending that function pointers be converted to data pointers, both for the reasons of size discrepancies that @programmerjake raised and because more generally, code and data could exist in different address spaces entirely.

tmandry avatar Nov 15 '23 19:11 tmandry

I think we should add core::ffi::uintfptr and core::ffi::intfptr types (like C's uintptr_t and intptr_t, except for function pointers) and add a addr method to all function pointer types. Then I'd be fine removing function-pointer to integer casts in a new edition.

e.g.:

impl<A, R, const ABI: &'static str> extern ABI fn(..A) -> R { // all function pointer types (needs more impls for unsafe fn, etc.
    pub fn addr(self) -> core::ffi::uintfptr {
        unsafe { mem::transmute(self) }
    }
}

programmerjake avatar Nov 15 '23 20:11 programmerjake

Side note: having something like sptr::OpaqueFnPtr in the standard library would solve the problem, because its .addr() could return an appropriately-sized integer on these architectures, causing a compilation error instead of runtime truncation.

I like to use fn(!) for opaque fn ptrs because it isn't callable without a transmute and is generally less footgun than just unsafe fn()

Sky9x avatar Nov 17 '23 19:11 Sky9x

wasm is a well-known architecture where function pointers are in a different address space from data pointers (they're table indices) despite being the same size.

riking avatar Nov 20 '23 20:11 riking

wasm is a well-known architecture where function pointers are in a different address space from data pointers (they're table indices) despite being the same size.

Is it relevant? Cast through a data pointer should still work I think?

GoldsteinE avatar Nov 20 '23 20:11 GoldsteinE

Either way, if cast to *const () is a problem, a cast to usize also is (because they’re guaranteed to be the same size)

CHERI is coming so I'm in favour of stopping conflating data pointers with function pointers with integers that can count a number of things in memory.

Making people use transmute seems reasonable, because it will complain if two things are not the same size, but do I like Fn::addr() -> core::ffi::uintfptr as well.

Also I spent nearly a decade programming the XAP2 processor (it's in almost every CSR Bluetooth chip) and it has entirely separate code and data address spaces. So, I'm sensitised to these kinds of issues because they bit me hard in C all the time.

jonathanpallant avatar Nov 21 '23 17:11 jonathanpallant

Making people use transmute seems reasonable

Note that this requires fnptr to integer to fnptr transmutes to be valid, which is not obvious, cf. https://github.com/rust-lang/unsafe-code-guidelines/issues/286, https://github.com/rust-lang/unsafe-code-guidelines/issues/340

CHERI is coming so I'm in favour of stopping conflating data pointers with function pointers with integers that can count a number of things in memory.

Unfortunately, I don’t think we can stop conflating uintptr_t with size_t with ptrdiff_t without breaking stability guarantees, which I think means that Rust on CHERI must have 128-bit array indices and generally everything pointer-related must be 128 bits.

GoldsteinE avatar Nov 21 '23 18:11 GoldsteinE

The concern about targets with data and function pointers being different is interesting. Note that I don't think difference in address spaces is significant — whatever the "blessed" way will be, we can just make sure we account for address spaces properly (going through *const () should not be a problem here, since we can just ptr::invalid).

For platforms with size_of::<fn()>() > size_of::<*const ()>() I have a few questions:

  1. Are there platforms like this that may be supported by Rust in the future (or are already supported)?
  2. Can we support this without making breaking changes?

Expanding on the second question, it seems like the assumption that size_of::<usize>() == size_of::<*const ()>() == size_of::<fn()>() is so ingrained in Rust, that we simply can't break it.

usize/ptr is guaranteed in usize docs (among with all the APIs) (emph. mine):

The pointer-sized unsigned integer type.

The size of this primitive is how many bytes it takes to reference any location in memory. For example, on a 32 bit target, this is 4 bytes and on a 64 bit target, this is 8 bytes.

Assumptions for fn() are not that public, but still we have a few examples in the docs

// (cut for brievety)
let bar_ptr: fn(i32) = not_bar_ptr; // force coercion to function pointer
assert_eq!(mem::size_of_val(&bar_ptr), mem::size_of::<usize>());
let fnptr: fn(i32) -> i32 = |x| x+2;
let fnptr_addr = fnptr as usize;
let fnptr = fnptr_addr as *const ();
let fnptr: fn(i32) -> i32 = unsafe { std::mem::transmute(fnptr) };
assert_eq!(fnptr(40), 42);

And also unstable APIs (FnPtr::addr) and also std code, for example fmt traits impls: https://github.com/rust-lang/rust/blob/3acb261e214cd13ae54346af30eae5807501ec37/library/core/src/ptr/mod.rs#L1989 (this goes FnPtr -> *const () -> usize).

I'm not sure we have a way out of all of this, given that unsafe code is relatively likely to depend on stuff like this (my opinion).

WaffleLapkin avatar Nov 26 '23 13:11 WaffleLapkin

@rustbot labels -A-edition-2024 +A-maybe-next-edition

We discussed the timeline of edition items in the lang meeting today, and this didn't make the list, so it seems unlikely this will happen for Rust 2024. We'll keep this in mind for future editions. Thanks to everyone working on this and providing helpful feedback here.

traviscross avatar Apr 10 '24 19:04 traviscross