rfcs icon indicating copy to clipboard operation
rfcs copied to clipboard

Relaxed DST field ordering

Open clarfonthey opened this issue 1 year ago • 9 comments

Summary

Relax the requirements on struct field ordering for dynamically sized fields for repr(Rust) and repr(transparent), such that ?Sized fields can be anywhere in the field list, as long as there is only one.

Note that this RFC previously added changes to repr(C), but these have since been removed.

Rendered

clarfonthey avatar Oct 19 '24 01:10 clarfonthey

Also going to pre-empt this because, while it's not mentioned in the RFC because I genuinely could not find any info in the reference about it, it will probably come up:

What do we do about references to ZST fields after the DST in repr(C) structs? I'll be honest, I have no idea if references to ZST fields are even meaningful/useful or used, but right now there appears to be no documentation on it.

If people actually use references to ZST fields in repr(C) structs, I propose defining fields after the DST field to always have the same offset as the DST field itself, i.e., if said field were zero. We can treat this internally be reordering the ZSTs right before the DST.

But I had been effectively treating this RFC as if every reference to a ZST doesn't matter, and if people use ptr::eq for them they're doing something wrong and unspecified. Although I'm sure this is interesting enough to give a few people a headache!

clarfonthey avatar Oct 19 '24 02:10 clarfonthey

[in repr(C) structs] I propose defining fields after the DST field to always have the same offset as the DST field itself, i.e., if said field were zero. We can treat this internally be reordering the ZSTs right before the DST.

I don't think we should allow putting fields after the DST tail in a repr(C) struct if they are just going to be reordered anyway. IMO the field order should be consistent[^1]; either we don't allow fields after the DST tail in repr(C) structs, or we allow them and they are at an appropriate (dynamic) offset after the DST tail (though it's IMO fine if this is restricted to ZST fields, for now).

But I had been effectively treating this RFC as if every reference to a ZST doesn't matter, and if people use ptr::eq for them they're doing something wrong and unspecified. Although I'm sure this is interesting enough to give a few people a headache!

I'm not sure what you mean here, but I don't think it is reasonable to assume that std::ptr::eq is unspecified or wrong for ZSTs; the documentation for that function does not currently have any exceptions for ZSTs.

[^1]: i.e. in a repr(C) struct, each field should be at offset prev_field_offset + prev_field_size, rounded up to this field's alignment, with the first field at offset 0.

zachs18 avatar Oct 19 '24 02:10 zachs18

I'm not sure what you mean here, but I don't think it is reasonable to assume that std::ptr::eq is unspecified or wrong for ZSTs; the documentation for that function does not currently have any exceptions for ZSTs.

You're right that you can make ZST pointers just fine (and indeed, *const () is a common pointer), but what I'm specifically talking about is whether taking a reference to a ZST ever creates a pointer inside its struct at all. Yeah, the lifetime rules seem to imply that, but if let value = (); takes up zero memory and does literally nothing, then how can I be certain that &value actually refers to a location on the stack at all?

Like, since ZST boxes all point to the same dangling value and ZST statics all point to the same dangling value, it doesn't seem to unreasonable that all ZST references would just be the same pointer (modulo alignment) by default unless dereferenced.

clarfonthey avatar Oct 19 '24 03:10 clarfonthey

whether taking a reference to a ZST ever creates a pointer inside its struct at all.

Yes, &struct.field will give a pointer that is inbounds of struct[^1].

but if let value = (); takes up zero memory and does literally nothing, then how can I be certain that &value actually refers to a location on the stack at all?

Formally, you can't assume that let value = (); exists on the stack because the stack is not a concept that exists in the context of the Rust Abstract Machine, but this is not specific to ZSTs. It might even be valid for rustc to put let value = [1, 2, 3]; into static memory, for example[^2]. AFAICT, rustc currently never makes locals into statics, so currently let value = (); will have an address on the stack.

ZST statics all point to the same dangling value

While this would be valid, this is not how rustc currently works AFAICT. Currently static X: () = (); will have an address in the same area as other (immutable) statics, and may or may not have the same address as other statics.


For more info: see https://github.com/rust-lang/unsafe-code-guidelines/issues/503

[^1]: assuming no autoderefing

[^2]: assuming that it is decided that non-mut locals are not mutable (which is not yet decided IIUC), and/or it is never mutated, and/or possibly other requirements

zachs18 avatar Oct 19 '24 07:10 zachs18

I don't really understand the motivation for this change. In general, it's nice that I don't have to worry about what's the most space-efficient field order in my types (especially if it depends on type parameters) unless I already care about their layout and therefore add repr attributes. However, in the rare case when I author (or even just look at the implementation of) a type wrapping a DST, I already have to be consciously thinking about memory layout to some degree. It's especially prominent if there are other (size > 0 or align > 1) fields in the mix because the only useful way to think of such a type is "some fixed-size header data followed by the DST". Reflecting this in the source code doesn't seem unnatural or like a chore that I'd want to delegate to the compiler. For example, consider the custom Arc-likes in rowan and triomphe.

I don't want to rule out that there's code where this RFC might be nice to have. But is a "nice to have" feature worth the can of worms w.r.t. ZST addresses and the repr(C) rules? Whatever the solution to that will be, I expect it will cast some doubt on the "simplifies the language" motivation.

hanna-kruppe avatar Oct 19 '24 12:10 hanna-kruppe

I guess that my biggest motivation for this change is specifically for the ZST change: it's extremely common to put PhantomData markers at the end of a struct, but a DST disrupts this for what feels like no reason. I don't ever need to reference these fields, so, the reference changes don't matter to me.

For repr(transparent), I feel like it'd be appropriate to just put all ZST offsets at the beginning, since the layout is supposed to be equivalent to the one non-ZST inside it. But I guess that I was overzealous with repr(C) and we really should just keep the requirement for that.

clarfonthey avatar Oct 19 '24 15:10 clarfonthey

Since it seems too contentious, I've removed repr(C) and updated the reference-level explanation to allow reordering of ZSTs for repr(transparent). Since repr(transparent) is intended to treat the entire struct layout like it's the singular sized field, this makes sense, since it means that the offsets of all fields are zero.

I do apologise for kind of rushing this RFC out— it's been something that has come up a few times as being inconsistent (imho), and I kind of just wanted to start a discussion on it rather than opening a pre-RFC on internals, since it feels like said discussion wouldn't substantially impact the RFC contents.

I probably should have thought about repr(C) more and just been extra conservative to exclude it from these new rules at first, so, I've done that now.

clarfonthey avatar Oct 19 '24 18:10 clarfonthey

I've not read the RFC in detail, just skimmed the OP, but I do think the rules regarding field ordering are somewhat silly given that (apart from #[repr(C)], etc) we don't guarantee field order so we can opt to reorder if we want to.

That said, I'd have to review, I seem to recall that rustc at least took advantage of this rule for a much more optimized sized check -- and maybe there are other details I've forgotten about.

nikomatsakis avatar Oct 20 '24 02:10 nikomatsakis

Would love to know how those checks work, since my impression of how repr(Rust) worked is that it would sort fields by alignment and then just order them from most to least alignment, then put the DST at the end if there was one. By this model, it doesn't really matter where the DST is in the list since it has to be separated out anyway, but I guess that it being the last index already is mildly useful.

clarfonthey avatar Oct 20 '24 02:10 clarfonthey

What about tuples? We often treat them like repr(Rust) structs, but the fact that the last field of a tuple can be resized actually has profound consequences: it means that in (T, U, V) we actually only reorder the first two fields, and the last field stays last. This is required to support tuple unsizing coercions. I don't think this is compatible with having unsized fields elsewhere in the tuple.

The RFC should probably state explicitly that tuples still only support the last field to be unsized, and explain why -- or explain whatever else is needed to keep tuple unsizing coercions working.


This RFC also makes error reporting harder, since currently we can simply fire off a trait query ensuring all but the last field are sized. But with this RFC we'll have to somehow fire off a query for all fields, process the results, and if more than 1 returns "might not be sized" we have to figure out a good way to report the error. That should maybe be listed as a drawback.

RalfJung avatar Oct 24 '24 10:10 RalfJung

I'm aware of an example of C struct, where some useful information is located after DST. It is linux_dirent (don't confuse with linux_dirent64). See more at https://manpages.debian.org/unstable/manpages-dev/getdents.2.en.html . This is not merely in-kernel structure, it is used at kernel-userspace boundary.

If we try to create Rust definition for this struct, we will get this:

struct linux_dirent {
  d_ino: c_ulong,
  d_off: c_ulong,
  d_reclen: c_ushort,
  d_name: ThinCStr, // DST!!! Note: this is not our current CStr, this is hypothetical future ThinCStr we all dream about. Also: this is not `&ThinCStr`, this is `ThinCStr` itself, i. e. this is actual DST
  d_type: c_char,
}

As you can see, d_type is located after d_name.

(Also, you can determine size of this struct in two different ways: first is reading d_reclen, second is calling strlen on d_name and doing necessary calculations, so, technically d_reclen is redundant.)

(All these is based on my interpretation of manpage. I didn't test anything.)

Said that, this is the only such struct I'm aware of. And moreover, it is superseded by linux_dirent64. So, please, don't change Rust language based on this example alone. I post all these just to make sure you are aware of it

safinaskar avatar Feb 03 '25 09:02 safinaskar

I'm going to close this for now; I definitely did not think of all the potential downsides before writing it, and while this still feels like an inconsistency to me, that can be resolved later in a separate RFC.

clarfonthey avatar Aug 13 '25 16:08 clarfonthey