rfcs RFC: Trait for `!Sized` thin pointers

Nov 29 '23 04:11 jmillikin

I think we should add:

unsafe impl<T: Sized> DynSized for T { // : Sized just to be explicit
    fn size_of_val(&self) -> usize {
        core::mem::size_of::<T>()
    }
}

Nov 29 '23 06:11 programmerjake

What should we do with std::mem::size_of_val_raw() (currently unstable, so we can change it)? We cannot use DynSized from its signature, since it takes a reference. And we cannot change it to take a raw pointer, since some types (flexible array members, or CStr) depend on accessing the contents to calculate the size. Should we have three type groups, "Sized", "?Sized but size can be calculated without accessing memory (i.e. can be calculated via a raw pointer)" and "?Sized and needs to access the memory to calculate the size"?

Another thing to consider is alignment. This RFC touches only the size aspect, but some DSTs need custom alignment too.

Nov 29 '23 07:11 ChayimFriedman2

What should we do with std::mem::size_of_val_raw() (currently unstable, so we can change it)? We cannot use DynSized from its signature, since it takes a reference. And we cannot change it to take a raw pointer, since some types (flexible array members, or CStr) depend on accessing the contents to calculate the size. Should we have three type groups, "Sized", "?Sized but size can be calculated without accessing memory (i.e. can be calculated via a raw pointer)" and "?Sized and needs to access the memory to calculate the size"?

I think it depends on what it's supposed to do. IMO the two good options are:

Have it return Option<usize>, and specify the flavors of pointer for which it will/won't return a value.
- If you want size_of_val_raw::<T: ?Sized>(NonNull::dangling().as_ptr()) to be allowed, then the only possible return value for !Sized thin pointers is None.
Separate the thin and fat pointers into separate functions, and put a <T: Thin + Sized> bound on the thin one.
- All the existing caveats around exotic fat pointers still apply for the fat-pointer version, but at least this RFC won't make them worse.

The problem with that function as currently specified is it assumes a pointer to a value contains enough information to determine the layout of that value, even if the pointer is dangling into nowhere. That's generally not something that most low-level languages guarantee because it's expensive to have every pointer to a dynamically-sized value haul around its own size.
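To make the distinction concrete, here is a minimal sketch on stable Rust contrasting a size that comes from pointer metadata with a size that requires reading memory. The function names `slice_size_in_bytes` and `c_string_size_in_bytes` are made up for illustration; only `std::mem::size_of_val` and `CStr::from_ptr` are real standard-library APIs.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Size of a slice: answered from the fat pointer's length metadata alone,
/// without touching the pointed-to bytes.
pub fn slice_size_in_bytes(s: &[u16]) -> usize {
    std::mem::size_of_val(s)
}

/// Size of a NUL-terminated string behind a thin pointer: the memory has to
/// be scanned for the terminator, so a dangling pointer cannot be handled.
///
/// # Safety
/// `ptr` must point to a valid NUL-terminated string that stays alive and
/// unmodified for the duration of the call.
pub unsafe fn c_string_size_in_bytes(ptr: *const c_char) -> usize {
    // SAFETY: forwarded to the caller's contract documented above.
    let s = unsafe { CStr::from_ptr(ptr) };
    s.to_bytes_with_nul().len()
}
```

The second function is the case a raw-pointer `size_of_val_raw` cannot serve for thin pointers: there is nothing to return unless the pointee is readable.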

Another thing to consider is alignment. This RFC touches only the size aspect, but some DSTs need custom alignment too.

!Sized thin pointers to types with static alignment can use existing mechanisms, such as #[repr(align(N))]. They're always the last field of a struct, so all the existing alignment rules should apply as-is.

DSTs with dynamic/unknown alignment are out of scope for this RFC, and are probably better suited for the extern types feature given how many other parts of the language they touch.
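As an illustration of static alignment on a dynamically-sized type, here is a small sketch using today's fat-pointer DSTs. The generic `Request<T>` shape and the `layout_of` helper are assumptions made so the value can be built by unsizing coercion on stable Rust; they are not part of the RFC text.

```rust
use std::mem::{align_of_val, size_of_val};

/// A DST-capable struct with a statically chosen alignment.
/// The unsized field must come last.
#[repr(C, align(16))]
pub struct Request<T: ?Sized> {
    pub len: u32,
    pub data: T,
}

/// Returns (size, alignment) of a dynamically-sized request.
/// The size depends on the slice length; the alignment does not.
pub fn layout_of(r: &Request<[u8]>) -> (usize, usize) {
    (size_of_val(r), align_of_val(r))
}
```

With a 4-byte payload the value occupies one 16-byte unit, and with a 20-byte payload it rounds up to 32, while the alignment stays at 16 in both cases.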

Nov 29 '23 09:11 jmillikin

I'd like to name drop #3396 which is steadily turning from an extern types v2 RFC to a MetaSized trait RFC, so it addresses very similar issues to this proposal.

My main complaint with that RFC (and extern types in general) is that it conflates "dynamic size" and "dynamic alignment". A dynamically-sized value may have a static alignment, which is what allows it to be used in struct fields. A type with dynamic (or unknown) alignment is much less capable, being basically an opaque handle.

The "extern types" idea is also strongly focused on naming types defined elsewhere, but my use case is much simpler -- I want Rust's existing !Sized semantics without the fat-pointer overhead. It's an optimization, like allowing Option<&i32> to fit into a single register.
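For reference, the overhead in question can be observed directly on stable Rust. This is a small sketch; `pointer_widths` is a made-up helper that reports each reference type's size in machine words.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Width of several reference types, measured in machine words.
pub fn pointer_widths() -> [usize; 5] {
    let word = size_of::<usize>();
    [
        size_of::<&i32>() / word,         // thin reference to a Sized type
        size_of::<Option<&i32>>() / word, // niche optimization: still one word
        size_of::<&[u8]>() / word,        // fat: data pointer + length
        size_of::<&str>() / word,         // fat: data pointer + length
        size_of::<&dyn Debug>() / word,   // fat: data pointer + vtable pointer
    ]
}
```

Every `!Sized` reference here costs two words, which is the overhead the proposal wants to avoid when the length is already stored in the value.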

Dec 01 '23 04:12 jmillikin

I'm going to have limited Git access for a while, but wanted to write down some quick thoughts based on how this might fit into the DynSized / extern type cinematic universe.

First, the fundamental semantics of !Sized thin pointers is very similar to a union with lots of array members:

struct Request {
    len: u32,
    data: [u8],
}
impl DynSized for Request { ... }

// IS EQUIVALENT TO

struct Request {
    len: u32,
    data: RequestData,
}
union RequestData {
    len_n0: [u8; 0],
    len_n1: [u8; 1],
    len_n2: [u8; 2],
    // ...
    len_max: [u8; u32::MAX as usize],
}

This implies that the data values should be gated behind unsafe of some sort -- all members of the data array are potentially uninitialized and/or beyond the bounds of the Request allocated object. The only safe operation on RequestData is to get a *const u8 by interpreting it as a 0-sized array -- the pointer is well-aligned but potentially dangling.

impl RequestData {
    fn as_ptr(&self) -> *const u8 {
        // SAFETY: 0 is always a valid array length
        // (borrow the field in place rather than copying it out of the union)
        (unsafe { &self.len_n0 }).as_ptr()
    }
    fn as_mut_ptr(&mut self) -> *mut u8 {
        // SAFETY: 0 is always a valid array length
        (unsafe { &mut self.len_n0 }).as_mut_ptr()
    }
}
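The "interpret it as a 0-sized array" operation can be approximated on stable Rust with a zero-length array field. This is only a sketch: the `new_header`, `data_ptr`, and `data_offset` names are hypothetical, and a bare header like this owns no payload bytes, so the returned pointer is one-past-the-end and must not be dereferenced.

```rust
#[repr(C)]
pub struct Request {
    pub len: u32,
    // Zero-length marker for where the payload would begin.
    data: [u8; 0],
}

impl Request {
    /// Builds a bare header with no payload storage behind it.
    pub fn new_header(len: u32) -> Self {
        Request { len, data: [] }
    }

    /// Safe to obtain: well-aligned, but possibly dangling past the header.
    pub fn data_ptr(&self) -> *const u8 {
        self.data.as_ptr()
    }

    /// Byte offset of the payload start relative to the header start.
    pub fn data_offset(&self) -> usize {
        self.data_ptr() as usize - (self as *const Request as usize)
    }
}
```

Getting the pointer is safe, and anything that reads through it has to be unsafe, which matches the gating described above.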

If the RequestData type is made generic, then it starts to look like a lang item with special semantics.

The DynSized trait doesn't need to be the trigger for thin-pointer optimization, the presence of this magic marker type in the struct fields could do it (similar to today's DST-via-array-field):

struct Request {
    len: u32,
    // the presence of this type, which must be the final field, makes `Request` into
    // a `!Sized` type with thin pointers.
    data: core::marker::UnsizedArray<u8>,
}
impl<T> core::marker::UnsizedArray<T> {
    fn as_ptr(&self) -> *const T {
        // SAFETY: 0 is always a valid array length
        (unsafe { &self.len_n0 }).as_ptr()
    }
    fn as_mut_ptr(&mut self) -> *mut T {
        // SAFETY: 0 is always a valid array length
        (unsafe { &mut self.len_n0 }).as_mut_ptr()
    }
}

It should not be possible to call size_of_val() on an UnsizedArray, because it doesn't have a size (not even a dynamic one). This implies core::mem::size_of_val() needs a new implicit trait bound -- implied by even <T: ?Sized>. Which also means Mutex<T: ?Sized>, Box<T: ?Sized>, etc can't directly store an UnsizedArray or any type containing one.

Let's call it ?FixedLayout for now, because it's implemented by all types that have values with a fixed allocated object layout -- in other words, their values' size and alignment can't change. Every type that currently exists in stable Rust implements FixedLayout (extern types don't, but they're unstable).

// core::marker ?
unsafe trait FixedLayout {
    fn align_of_val_raw(ptr: *const Self) -> usize;
    fn size_of_val_raw(ptr: *const Self) -> usize;
}

unsafe impl<T> FixedLayout for T { ... }     // Sized types have fixed layouts
unsafe impl<T> FixedLayout for [T] { ... }  // so do slices
unsafe impl FixedLayout for str { ... }  // and strings
unsafe impl FixedLayout for dyn * { ... }  // and trait vtables

// compiler magic:
// all types that are fat-pointer DSTs due to a DST field have fixed layout,
// for example `struct Foo([u8])` gets an automatic `FixedLayout` impl

// UnsizedArray does NOT impl FixedLayout

This leaves us to think about core::mem::align_of_val() and core::mem::size_of_val(), which have <T: ?Sized> bounds and therefore can't accept !Sized thin-pointer types unless their signature changes.

Option 1 is to give them a ?FixedLayout bound and do some more compiler magic to wire up a trait DynSized, but this seems like a good opportunity to draw a line in the sand about which type categories are guaranteed to be usable with those functions. After all, extern types are also !FixedLayout, and it'd be nice if we could make size_of_val(&some_extern_value) not compile.

So let's go with option 2, a trait for types that have a dynamic layout. No compiler magic, it's just library code for people who want to write a T: ?Sized + ?FixedLayout + DynLayout function deep in the guts of wasm-bindgen or whatever:

// trait bounds imply `T: FixedLayout`, therefore forbid UnsizedArray
// or any type containing one. Also forbids extern types.
fn align_of_val<T: ?Sized>(val: &T) -> usize;
fn size_of_val<T: ?Sized>(val: &T) -> usize;

// Current text of the RFC calls this `DynSized`.
//
// Types with dynamic size (or dynamic alignment, god help them) can impl this,
// but no special compiler magic happens.
pub trait DynLayout {
    fn align_of_val(&self) -> usize;
    fn size_of_val(&self) -> usize;
}
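Since option 2 is plain library code, it can be sketched on stable Rust today. The `Header` type, its `with_len` constructor, and the `describe` helper below are hypothetical, and the choice to report "header size plus `len`" without rounding up to the alignment is an assumption of this sketch, not something the thread settles.

```rust
/// Library-only trait from the post above; no compiler magic involved.
pub trait DynLayout {
    fn align_of_val(&self) -> usize;
    fn size_of_val(&self) -> usize;
}

/// Length-prefixed header; the payload would follow it in memory.
#[repr(C)]
pub struct Header {
    pub len: u32,
    data: [u8; 0],
}

impl Header {
    pub fn with_len(len: u32) -> Self {
        Header { len, data: [] }
    }
}

impl DynLayout for Header {
    fn align_of_val(&self) -> usize {
        std::mem::align_of::<Header>()
    }
    fn size_of_val(&self) -> usize {
        // Assumed convention: static prefix plus the recorded payload length.
        std::mem::size_of::<Header>() + self.len as usize
    }
}

/// Generic code opts in with an explicit `DynLayout` bound.
pub fn describe<T: ?Sized + DynLayout>(val: &T) -> (usize, usize) {
    (val.size_of_val(), val.align_of_val())
}
```

Note that `size_of_val` here only reads the header field, so it is safe to call even when no payload bytes are actually present.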

Okay it's about midnight so thanks for coming to my ted talk I guess. Hopefully some of this is coherent.

Dec 01 '23 14:12 jmillikin

Is this meant to somehow interact with MetaSized from extern types v2?

Dec 02 '23 04:12 tgross35

Is this meant to somehow interact with MetaSized from extern types v2?

This is intended to be a subset of extern types and custom DST metadata. Both of those proposals handle !Sized thin pointers as a special case of a much larger change, which is impacting their timelines -- for example extern type needs to support types without a statically-known alignment, which is difficult.

My hope is that !Sized thin pointers is well-scoped enough that it could be implemented and stabilized on its own, without blocking long-term progress toward fully opaque extern types (for C/WASM) or custom pointer metadata (for the fancy dyn vtable stuff).

Dec 02 '23 04:12 jmillikin

Recording another idea to investigate if the above design doesn't pan out:

If it's important that the size of a value always be knowable just by having a reference to that value, then another option is to do just-in-time conversions from a thin &() to a fat &T. Placing the dynamic length query close to the reference construction should hopefully let the optimizer eliminate it in cases where it's unnecessary to the current function.

unsafe trait DynSized {
    unsafe fn size_of_val(data_address: core::ptr::NonNull<()>) -> usize;
}

struct ThinRef<'a, T: ?Sized> {
    data_address: &'a (),
    _t: core::marker::PhantomData<&'a T>,
}

The user would write code in terms of references, and the compiler would convert between &T and ThinRef<'a, T> at function call boundaries:

#[repr(C, align(8), thin_dst)]
struct Request {
    len: u32,
    data: [u8],
}

unsafe impl DynSized for Request {
    unsafe fn size_of_val(data_address: core::ptr::NonNull<()>) -> usize {
        data_address.as_ptr().cast::<u32>().read() as usize
    }
}

fn intrinsics::dyn_sized_thin_ref_to_ref<'a, T: ?Sized + DynSized>(ThinRef<'a, T>) -> &'a T;
fn intrinsics::dyn_sized_ref_to_thin_ref<'a, T: ?Sized + DynSized>(&'a T) -> ThinRef<'a, T>;

// user writes this
fn handle_request(r: &Request) {
    if r.is_empty() { return; }
    do_something_with_request(r);
}

// converted to this
fn handle_request<'1>(r: ThinRef<'1, Request>) {
    let r: &'1 Request = intrinsics::dyn_sized_thin_ref_to_ref(r);
    if r.is_empty() { return; }
    do_something_with_request(intrinsics::dyn_sized_ref_to_thin_ref(r));
}

A somewhat rough approximation in Playground indicates that this approach produces the expected behavior, i.e. only one register is used for passing around a ThinRef<Request>, and functions that receive it only inspect the length when the dynamically-sized portion of a &Request is inspected.
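A hand-written version of the same idea can be sketched on stable Rust. This is not the linked Playground code: `ThinMsg` is a hypothetical type, and the wire format (one length byte followed by that many payload bytes) is an assumption chosen to keep alignment out of the picture.

```rust
use std::marker::PhantomData;
use std::ptr::NonNull;

/// One-word handle to a length-prefixed message: [len: u8][payload; len].
#[derive(Clone, Copy)]
pub struct ThinMsg<'a> {
    ptr: NonNull<u8>,
    _borrow: PhantomData<&'a [u8]>,
}

impl<'a> ThinMsg<'a> {
    /// Checks once that the buffer really holds the advertised payload.
    pub fn new(buf: &'a [u8]) -> Option<Self> {
        let len = usize::from(*buf.first()?);
        if buf.len() < 1 + len {
            return None;
        }
        // `as_ptr` keeps provenance over the whole buffer.
        let ptr = NonNull::new(buf.as_ptr() as *mut u8)?;
        Some(ThinMsg { ptr, _borrow: PhantomData })
    }

    /// "Fattens" the handle on demand by re-reading the embedded length.
    pub fn payload(self) -> &'a [u8] {
        // SAFETY: `new` verified that `1 + len` bytes are in bounds, and the
        // `'a` borrow keeps the buffer alive and immutable.
        unsafe {
            let len = usize::from(*self.ptr.as_ptr());
            std::slice::from_raw_parts(self.ptr.as_ptr().add(1), len)
        }
    }
}
```

The handle and even `Option<ThinMsg>` fit in a single word, and the length is read only when `payload` is called. The cost is the wrapper type itself, which the compiler-driven conversion described above would hide.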

Dec 02 '23 13:12 jmillikin

Plausibly purely for my own reference, in the terminology of the extern types v2 PR:
FixedLayout == metadata sized + metadata aligned (the same as the MetaSized trait from extern types v2)
DynLayout == dynamically sized (+ at least metadata aligned, this proposal is specifically avoiding adding anything less in the aligned category)
It's also proposing ?Sized + ?FixedLayout + ?DynLayout as the minimum trait bound meaning all types are at least dynamically sized + metadata aligned.

The extra trait has the effect of allowing people to write dynamically sized types, but the higher minimum bound means you can't write entirely opaque types, like extern types. This isn't quite enough to accurately represent CStr though (can you do align_of::<CStr>(), and if you can, what are its bounds?). Note that this does mean the proposal hits this issue:

pub trait Trait {
    fn foo(self: Box<Self>) where Self: FixedLayout;
    fn bar(self: Arc<Self>) where Self: FixedLayout;
}

You have to have the where clause because it's no longer true that all types satisfy the bounds on Box or Arc (T: ?Sized + ?FixedLayout) which would be a breaking change without a separate mitigation.

Dec 02 '23 21:12 Skepfyr

Linking this because I think it is closely related and shows a bunch of special cases that could be relevant https://github.com/rust-lang/unsafe-code-guidelines/issues/256

Dec 08 '23 04:12 AaronKutch

There's a subtle but severe danger to requiring a read to determine size: shared mutability. I can use size_of_val on &Mutex<CStr>. I can do so while the mutex is locked and I'm holding &mut CStr. If size_of_val needs to read the value, now I have a read of my value aliasing my writes through &mut, i.e. this is unsound at a minimum, and likely quick UB.

So either every type that exposes shared mutability of a generic ?Sized type needs to forbid DynSized types, or shared mutability needs to suppress the "thin pointer optimization." And, oops, UnsafeCell isn't the only shared mutability primitive, *mut T and *const T are shared mutability points as well, so the very thing you're trying to make thin mustn't elide the pointer metadata lest it be used to encapsulate some shared mutability.

At least you thought to make it require unsafe impl

[That] a pointer to a value contains enough information to determine the layout of that value [is] generally not something that most low-level languages guarantee because it's expensive to have every pointer to a dynamically-sized value haul around its own size.

On a PDP11, yeah, passing around single pointers to sentinel terminated lists could often be cheaper than also passing around the length, if you rarely needed the length except to iterate through it. It's also how simple string manipulation can end up being accidentally O(n²). On any somewhat modern machine, registers and stack aren't free, but they're quite cheap, and even heap memory is fairly cheap when in cache. Additionally, while you're "spending" a word more per pointer, you're also "saving" a word per object, so if you're caring this much about memory usage optimization and do some sort of pointer compression scheme (i.e. a u32 indexed arena of pointers), you might even manage to save memory pressure compared to inline lengths.

(C++ doesn't avoid fat pointers for some performance reason; it has simple pointers only because of C compatibility. For a long time, the "correct" and idiomatic way to pass strings around in C++ was const std::string& (i.e. &String), and it took a long time to get std::string_view (i.e. &str).)

Rust also uses the length significantly more often than C or C++ would, because array indexing is checked. The optimizer does a pretty decent job at minimizing the cost of that specifically because the slice length is already available on the stack. If the length is instead stored behind the pointer, it's an additional logical memory access requirement for each indexing operation, and the optimizer has a much more interesting time proving that this check has already been shown to hold, so using thin pointers can result in more index checking, more panicking arms, and worse code optimization.

Additionally, you don't even need language support or to lie with &Header references in order to implement thin pointers as a library item (disclaimer: my crate[^2]). All you need to do is use Thin<Ptr<Dst>> where Thin stores ptr::NonNull<()> instead of Ptr<Dst>. Now it's a case of choosing your abstractions, just like every other pointer kind in Rust.

[^2]: It's due for a bit of a makeover, since I understand the problem space a lot better now than I did when I originally wrote the crate, and am just generally better at designing good Rust APIs, but it's been languishing hoping for feature(ptr_metadata) to become available, and I haven't been writing the kind of code that would use it. But I'm about to be again, so it's likely I'll be fixing up erasable and slice-dst to a more modern implementation standard.

Dec 17 '23 04:12 CAD97

There's a subtle but severe danger to requiring a read to determine size: shared mutability. I can use size_of_val on &Mutex<CStr>. I can do so while the mutex is locked and I'm holding &mut CStr. If size_of_val needs to read the value, now I have a read of my value aliasing my writes through &mut, i.e. this is unsound at a minimum, and likely quick UB.

@CAD97 Could you elaborate on how this might be unsound?

Specifically, how would size_of_val work on &Mutex<T> for T: !Sized? Only reasonable way I can think of to implement that is size_of_val(&*mutex.lock()), which is safe as it just panics if there's an existing lock.

Dec 23 '23 09:12 dead-claudia

Any progress on this item? This would go a long way toward convincing existing C/C++ low-latency apps to consider Rust.

Apr 01 '24 15:04 tojocky

~~I'd like to raise a bit of a concern here. It interacts way too much with the pointer metadata RFC, and should really be merged into it IMHO. https://github.com/rust-lang/rust/issues/123353~~

~~This RFC proposal fails to encapsulate alignment (which is dynamic for dyn Trait and undefined for extern types), and there's some type safety concerns around sizing.~~

Edit: ignore all of that. Filed that issue very prematurely.

Apr 02 '24 05:04 dead-claudia

Coming back with a fresh mind:

DSTs with dynamic/unknown alignment are out of scope for this RFC, and are probably better suited for the extern types feature given how many other parts of the language they touch.

@jmillikin This feature is pretty useless without alignment info. You can't generically box a value without knowing its runtime alignment, for one.

However, alignment is just a field access for dyn Trait and it's statically calculable for everything else except extern types (where the alignment could be externally determined).

Apr 02 '24 12:04 dead-claudia

@CAD97 (re: https://github.com/rust-lang/rust/issues/123353) As mentioned before, mutexes can simply return mutex.lock().unwrap_or_else(|e| e.into_inner()).size_of_val(). I see no good reason why they can't "just" lock. UnsafeCell<T> is a good callout, though, and I've folded it into the above review accordingly.

Apr 02 '24 12:04 dead-claudia

I believe this is covered elsewhere, but in the current proposal DynSized is a trait for types that do not have a statically-known size. This implies:

A statically-sized type is not DynSized.
A dynamically-sized type still has a static alignment. It's not a total free-for-all like external types.
I do not believe there is any connection to custom pointer metadata.

Regarding the overall proposal, I've been trying to figure out how to get the benefits of unsized thin pointers without breaking MIRI or other semantics based on the assumption of a &T having a defined size for all types. At present I'm starting to come around to the idea of matching C's semantics, where size_of for unsized thin pointers would return a size of the statically-known portion of the type (excluding the DST part).

In any case the goal is to iterate toward a simpler design that solves the concrete problem of DST pointers being too big. I'm not interested in the enormous work implied by custom metadata or fully-opaque external types. Also, introducing unsafe locking semantics into Mutex is not a viable approach.

Apr 02 '24 12:04 jmillikin

I believe this is covered elsewhere, but in the current proposal DynSized is a trait for types that do not have a statically-known size. This implies:

A statically-sized type is not DynSized.

A dynamically-sized type still has a static alignment. It's not a total free-for-all like external types.

I do not believe there is any connection to custom pointer metadata.

Note the section I posted that comment in. 😉

It's intended to be a possible alternative model, where DynSized would instead represent the set of types that has a size known at runtime. In this alternative model, it generalizes Sized, where the size is known both statically and at runtime.

Regarding the overall proposal, I've been trying to figure out how to get the benefits of unsized thin pointers without breaking MIRI or other semantics based on the assumption of a &T having a defined size for all types.

Unfortunately, extern types will break that anyways. They're already in nightly (albeit in a somewhat broken state), so you can't really avoid it.

At present I'm starting to come around to the idea of matching C's semantics, where size_of for unsized thin pointers would return a size of the statically-known portion of the type (excluding the DST part).

This would be helpful, but should be its own separate function IMHO. Maybe something like std::mem::size_of_static::<T>(). Both are useful, just in different contexts.
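The two notions of size can be compared on stable Rust with today's fat-pointer DSTs. This is a sketch under assumptions: `size_of_static` does not exist in the standard library, so the `static_size` helper below stands in for it by instantiating the tail as a zero-length array, and the generic `Request<T>` layout is made up for the example.

```rust
use std::mem::{size_of, size_of_val};

#[repr(C)]
pub struct Request<T: ?Sized> {
    pub len: u32,
    pub flags: u16,
    pub data: T,
}

/// Size of the statically-known prefix (including trailing padding), which
/// is what C's `sizeof` reports for a struct with a flexible array member.
pub fn static_size() -> usize {
    size_of::<Request<[u8; 0]>>()
}

/// Size of a concrete value, including its dynamically-sized tail.
pub fn dynamic_size(r: &Request<[u8]>) -> usize {
    size_of_val(r)
}
```

With a 3-byte payload the static size is 8 and the dynamic size is 12: the payload starts at offset 6, ends at 9, and the total rounds up to the 4-byte alignment.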

In any case the goal is to iterate toward a simpler design that solves the concrete problem of DST pointers being too big. I'm not interested in the enormous work implied by custom metadata or fully-opaque external types.

See above regarding extern types.

Custom metadata can provide a way to compute size at runtime, but all you'd be doing is consuming it in a few specific places with my alternative. If you rely on external types, you can focus on only the cases where there isn't runtime size/alignment information, and so your proposal (and work) ends up a lot smaller.

You could even define that alternative trait and the not-auto-generated bits today:

pub unsafe trait DynSized {
    fn size_of_val(&self) -> usize;
}

// Your proposal wouldn't have this, but my alternative would
unsafe impl<T: Sized> DynSized for T {
    fn size_of_val(&self) -> usize {
        std::mem::size_of::<T>()
    }
}

// For `dyn Trait`
unsafe impl<T: ?Sized + std::ptr::Pointee<Metadata=std::ptr::DynMetadata<U>>, U: ?Sized> DynSized for T {
    fn size_of_val(&self) -> usize {
        std::ptr::metadata(self).size_of()
    }
}

unsafe impl<T> DynSized for [T] {
    fn size_of_val(&self) -> usize {
        self.len() * std::mem::size_of::<T>()
    }
}

unsafe impl DynSized for str {
    fn size_of_val(&self) -> usize {
        self.len()
    }
}
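A reduced version of this trait compiles on stable Rust and can be checked against `std::mem::size_of_val`. Two assumptions in this sketch: the `dyn Trait` impl is left out because it needs the unstable `ptr_metadata` feature, and the blanket `T: Sized` impl is replaced by a single concrete impl for `u64`, since the compiler may treat multiple blanket impls as overlapping. `agrees_with_std` is a made-up helper.

```rust
pub unsafe trait DynSized {
    fn size_of_val(&self) -> usize;
}

unsafe impl<T> DynSized for [T] {
    fn size_of_val(&self) -> usize {
        self.len() * std::mem::size_of::<T>()
    }
}

unsafe impl DynSized for str {
    fn size_of_val(&self) -> usize {
        self.len()
    }
}

// Stand-in for the blanket `T: Sized` impl (see the note above).
unsafe impl DynSized for u64 {
    fn size_of_val(&self) -> usize {
        std::mem::size_of::<u64>()
    }
}

/// Cross-checks the trait against the built-in size computation.
pub fn agrees_with_std<T: ?Sized + DynSized>(val: &T) -> bool {
    val.size_of_val() == std::mem::size_of_val(val)
}
```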

Also, introducing unsafe locking semantics into Mutex is not a viable approach.

The trait itself needs to be unsafe in both of our versions (it implies memory validity from ptr to ptr.add((*ptr).size_of_val())) regardless, but I don't see how else memory safety should be of concern here? Trying to figure out what you mean by "unsafe" here.

Apr 02 '24 13:04 dead-claudia

So if I understand correctly, the motivation is that a fat pointer is unnecessary if the size is already encoded within the object, so if we define a way of extracting that encoded size then we need only have a thin pointer. I guess that's a fairly common FFI situation, but it still seems a bit narrow.

First thing that comes to mind is that it only applies when the size is encoded in a way that's reachable from &self. In a lot of cases the size might be elsewhere (eg a separate pointer/size list or similar) so this wouldn't work for those cases.

Secondly, DynSized is an unsafe trait. The RFC doesn't really go into what specific safety properties the implementer is required to uphold. For example, is it returning the actual allocation size of the object (capacity, for allocation) or the valid size of the object (length, for bounds checks). Are there limits of what the implementation of size_of_val is allowed to do? Certainly in the common case the expectation is that it's extracting a size from a field or scanning for a \0. But is it allowed to do anything? IO? Take locks? Start threads? Access static or thread-local state? Can it be fallible? Is it required to be deterministic?

But I think the big problems are those that arise from having to access the representation in order to get the size: when it's valid to access the representation, and whether the returned size is constant? With a standard DST, when you form the fat pointer, you're effectively making a commitment to the size, and that size is stored independently from the representation of the object itself, so neither problem arises.

It seems to me that you could solve this in a bespoke way using the ptr_metadata API - you define your own thin pointer types for your internal use (eg if pointer density is important), and have a method that can use ptr::from_raw_parts to materialize a fat pointer when needed to interact with the rest of the Rust world. It's possibly a little more burdensome than having it magically happen with transparent invocations of the DynSized trait, but it doesn't seem overwhelming.

I just landed changes to bindgen to enable the use of DST to represent Flexible Array Members (https://github.com/rust-lang/rust-bindgen/pull/2772) which generates an API along similar lines for the types it generates (though mostly to convert between the "fixed" and "dynamic" variants of each structure).

Apr 02 '24 23:04 jsgf

So if I understand correctly, the motivation is that a fat pointer is unnecessary if the size is already encoded within the object, so if we define a way of extracting that encoded size then we need only have a thin pointer. I guess that's a fairly common FFI situation, but it still seems a bit narrow.

That's a correct summary, but it's not about FFI. For FFI that wants to pass around handles to an externally-allocated resource, *mut () works fine.

Doubling the size of pointers means that function parameters spill to the stack much more frequently, so using thin pointers where possible can provide a significant performance uplift. I've measured something like 10-20% improvement in some of my own codebases from using wrapper types to emulate thin pointers. There's a lot of performance being left on the table compared to C/C++ for zero-copy use cases like packet processing.

First thing that comes to mind is that it only applies when the size is encoded in a way that's reachable from &self. In a lot of cases the size might be elsewhere (eg a separate pointer/size list or similar) so this wouldn't work for those cases.

Yep, that's true. If a value's size can't be determined from the value itself, then unsized thin pointers aren't practical.

Secondly, DynSized is an unsafe trait. The RFC doesn't really go into what specific safety properties the implementer is required to uphold [...]

I think it's a bit early in the design process to discuss such minutiae, when there isn't even consensus on unsized thin pointers being desirable at all.

It seems to me that you could solve this in a bespoke way using the ptr_metadata API - you define your own thin pointer types for your internal use (eg if pointer density is important), and have a method that can use ptr::from_raw_parts to materialize a fat pointer when needed to interact with the rest of the Rust world. It's possibly a little more burdensome than having it magically happen with transparent invocations of the DynSized trait, but it doesn't seem overwhelming.

That's already possible in today's Rust, using struct Thin<'a, T> { inner: &'a () } and similar idioms. The problem is that such wrappers pollute the public API, so instead of struct SomeStructure you also end up with SomeStructurePtr, SomeStructureRefMut, etc. It's quite unpleasant to work with that kind of API, and definitely can't be published for general use.

I just landed changes to bindgen to enable the use of DST to represent Flexible Array Members (rust-lang/rust-bindgen#2772) which generates an API along similar lines for the types it generates (though mostly to convert between the "fixed" and "dynamic" variants of each structure).

Pulling in bindgen and all the rest of its FFI semantics just to get unsized thin pointers doesn't seem great, and also it doesn't work in normal Rust due to the use of unstable nightly features.

Apr 03 '24 00:04 jmillikin

This feature is pretty useless without alignment info. You can't generically box a value without knowing its runtime alignment, for one.

actually, I have a proposal that would allow Box<MyTypeWithUnknownAlign, MyTypeDropper>: https://github.com/rust-lang/rfcs/pull/3470#issuecomment-1674249638 with sample usage https://github.com/rust-lang/rfcs/pull/3470#issuecomment-1674265515

Apr 03 '24 01:04 programmerjake

This feature is pretty useless without alignment info. You can't generically box a value without knowing its runtime alignment, for one.

actually, I have a proposal that would allow Box<MyTypeWithUnknownAlign, MyTypeDropper>: https://github.com/rust-lang/rfcs/pull/3470#issuecomment-1674249638 with sample usage https://github.com/rust-lang/rfcs/pull/3470#issuecomment-1674265515

@programmerjake Could you explain how specifically a type would end up having an unknown alignment in that? I can't figure out how one would construct such a type, even for the sake of that proposed ABI.

Apr 03 '24 04:04 dead-claudia

@programmerjake Could you explain how specifically a type would end up having an unknown alignment in that? I can't figure out how one would construct such a type, even for the sake of that proposed ABI.

it would be constructed from FFI, basically transmuting a pointer to a C struct with no body to Box<MyType, MyTypeDropper>:

extern "C" {
    pub type MyType;
    fn make_my_type() -> Option<Pin<Box<MyType, MyTypeDropper>>>;
    fn destroy_my_type(p: Option<Pin<Box<MyType, MyTypeDropper>>>);
}

impl MyType {
    pub fn new() -> Pin<Box<MyType, MyTypeDropper>> {
        unsafe { make_my_type() }.unwrap_or_else(|| panic!("out of memory"))
    }
}

pub struct MyTypeDropper;

impl BoxDrop<MyType> for MyTypeDropper {
    fn box_drop(v: Pin<Box<MyType, MyTypeDropper>>) {
        unsafe { destroy_my_type(Some(v)); }
    }
}

// in C header
struct MyType;
struct MyType *make_my_type(void);
void destroy_my_type(struct MyType *);

// in C source
struct MyType {
    // some example fields
    int field;
    char *field2;
    int *field3;
};

struct MyType *make_my_type(void) {
    struct MyType *retval;
    retval = (struct MyType *)calloc(1, sizeof(struct MyType));
    if(!retval)
        return NULL;
    retval->field = 3;
    retval->field2 = strdup("a string");
    if(!retval->field2)
        goto fail;
    retval->field3 = (int *)malloc(sizeof(int));
    if(!retval->field3)
        goto fail;
    *retval->field3 = 42;
    return retval;
fail:
    free(retval->field2);
    free(retval->field3);
    free(retval);
    return NULL;
}

void destroy_my_type(struct MyType *p) {
    if(!p)
        return;
    free(p->field2);
    free(p->field3);
    free(p);
}

Apr 03 '24 05:04 programmerjake

For most FFI use cases, you don't need unsized thin pointers. You're not trying to declare the type in Rust or interact directly with dynamically-sized fields. All you need is a wrapper.

mod c {
  #[repr(transparent)]
  pub(super) struct MyType(core::ffi::c_void);

  extern "C" {
    pub(super) fn make_my_type() -> *mut MyType;
    pub(super) fn destroy_my_type(p: *mut MyType);
  }
}

pub struct MyType {
  raw: *mut c::MyType,
}

impl Drop for MyType {
  fn drop(&mut self) {
    unsafe { c::destroy_my_type(self.raw) }
  }
}

impl MyType {
  pub fn new() -> MyType {
    let raw = unsafe { c::make_my_type() };
    if raw.is_null() {
      panic!("out of memory");
    }
    MyType { raw }
  }
}

To reiterate, the goal of this RFC is to allow Rust code to define types that are dynamically-sized but do not have the overhead of fat pointers. Types defined wholly outside of Rust (such as opaque C structures allocated/deallocated by external code) or types without a static alignment are out of scope.

Apr 03 '24 06:04 jmillikin

@jmillikin

That's already possible in today's Rust, using struct Thin<'a, T> { inner: &'a () } and similar idioms. The problem is that such wrappers pollute the public API, so instead of struct SomeStructure you also end up with SomeStructurePtr, SomeStructureRefMut, etc. It's quite unpleasant to work with that kind of API, and definitely can't be published for general use.

I think I'd have to see an example of what you have in mind. I've found it fairly straightforward to use type parameters to encode fixed vs dynamic size states, but I'm not sure that relates to what you're saying here.

Pulling in bindgen and all the rest of its FFI semantics just to get unsized thin pointers doesn't seem great,

Oh, I wasn't suggesting that bindgen was necessary, just pointing to it as an example implementation of using ptr_metadata to manipulate thin pointers, and fatten them on demand. You can have:

struct Packet<PAYLOAD: ?Sized = [u8; 0]> {
    size: u16,
//...
    payload: PAYLOAD
}

and pass that around as a thin pointer/ref, and then when you want to actually use the payload do:

let fatpacket: &Packet<[u8]> = unsafe { &*ptr::from_raw_parts(packet as *const Packet as *const (), packet.size as usize) };

to get the DST variant which exposes the payload using the embedded size.
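The same fattening step can be approximated on stable Rust without `ptr_metadata`, by building a slice pointer and casting it to the DST struct pointer. This is a sketch, not the bindgen-generated API: the `fatten` function and its safety contract are assumptions, and it takes a raw pointer rather than a `&Packet` so that the pointer's provenance can cover the payload bytes as well as the header.

```rust
use std::ptr;

#[repr(C)]
pub struct Packet<P: ?Sized = [u8; 0]> {
    pub size: u16,
    pub payload: P,
}

/// Rebuilds a fat reference from a thin pointer using the embedded size.
///
/// # Safety
/// `thin` must point into a live allocation that holds the header followed
/// by at least `size` payload bytes, must have provenance over all of it,
/// and the allocation must outlive `'a` without being mutated.
pub unsafe fn fatten<'a>(thin: *const Packet) -> &'a Packet<[u8]> {
    // SAFETY: guaranteed by the caller's contract documented above.
    unsafe {
        let len = usize::from((*thin).size);
        // A slice pointer and a slice-tailed struct pointer share the same
        // metadata (an element count), so this cast carries `len` across.
        let fat = ptr::slice_from_raw_parts(thin.cast::<u8>(), len) as *const Packet<[u8]>;
        &*fat
    }
}
```

A thin `*const Packet` is one word and the fattened `&Packet<[u8]>` is two, so the length only travels alongside the pointer in code that actually touches the payload.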

and also it doesn't work in normal Rust due to the use of unstable nightly features.

Well I'd hope ptr_metadata gets stabilized before whatever arises from this RFC does...

Apr 03 '24 07:04 jsgf