rfcs
rfcs copied to clipboard
[RFC] Allow packed types to transitively contain aligned types
This issue was introduced in the original implementation of #[repr(packed(N))] and have since underwent extensive community discussions:
- #[repr(align(N))] fields not allowed in #[repr(packed(M>=N))] structs
- repr(C) does not always match the current target's C toolchain (when that target is windows-msvc)
- repr(C) is unsound on MSVC targets
- E0587 error on packed and aligned structures from C
- E0587 error on packed and aligned structures from C (bindgen)
- Support for both packed and aligned (in repr(C)
- bindgen wanted features & bugfixes (Rust-for-Linux)
- packed type cannot transitively contain a #[repr(align)] type
- structure layout using aligned attribute is incorrect
The solution proposed in this RFC mostly reflects what was proposed here
Do C/C++ toolchains for mingw address the #[repr(system)] issue in any way?
What do they do when they need to interoperate with msvc-style layouts?
@petrochenkov
My understanding is that MinGW doesn't really do anything, and attempting to call MSVC APIs from MinGW will result in ABI issues.
One thing that seems like an alternative would be @Jules-Bertholet's https://internals.rust-lang.org/t/pre-rfc-align-attribute/21004?u=scottmcm proposal for #[align] on fields.
Would that be enough to do this?
Yay, finally some progress on this front! I haven't dug into the details of this RFC, but based on a quick skim I am generally in favor.
An alternative, more minimal spot fix could be as such:
- Allow
#[repr(align(N), packed(M))]on structs with the semantics ofpacked(M)wrapped inalign(N). - Allow
#[repr(align(N))]on fields to request a higher alignment for the field without introducing a type wrapper or additional fields.
This would allow tools like bindgen to implement the odd MSVC packing rules without putting them into the language.
What changes is that
packed(M)types no longer have an alignment of exactlyM, but instead have an alignment of at leastM.
…Actually, if we want to exactly match the semantics of #pragma pack on MSVC, the "correct" behavior would actually be to allow the type to have alignment less than M still. MSVC's #pragma pack does not raise the alignment of a struct to the packing width, and in fact, x64 and ARM64EC operates under #pragma pack(16) by default. (x86, ARM, and ARM64 default to #pragma pack(8).)
There is actual utility in #[packed] working this way in Rust for generic containers. I don't think that the utility is anywhere near enough for the change, but it bears keeping in mind.
MSVC's #pragma pack does not raise the alignment of a struct to the packing width
Oh, I, uh, misremembered overconfidently. Oops.
@rustbot labels +I-lang-nominated
We're discussing this on the Lang/RfL call, as the underlying issue is something that the RfL team is running into. Let's nominate this for lang discussion.
@rustbot labels -I-lang-nominated +I-lang-radar
We discussed this briefly today to put this on our radar. Our general vibe was in favor of interest in solving this problem. Probably we need to read through this in more detail, and some details about the MSVC handling may need to be iterated upon here.
Link to my comment about repr(system) and either removing it or changing it to repr(windows) or repr(MSVC):
- https://github.com/rust-lang/rfcs/pull/3718#discussion_r2035946897
This should also be considered in the broader context of resolving the tension between the two roles of repr(C): as a portable, simple, linear layout, and as matching the layout of the C ABI for the current target. This also encompasses the problem around ZST on MSVC (https://github.com/rust-lang/rust/issues/81996), ~~and the AIX layout issue (https://github.com/rust-lang/rust/pull/135552, IRLO)~~ (that one may be resolvable without a different layout algorithm). I think there is general consensus now that those roles should be played by two separate reprs, but no consensus on how to best get to that world. Part of that question is: what should we do when someone defines a type that does not exist in the C ABI of the current target (e.g. because C does not have true ZST).
This should also be considered in the broader context of resolving the tension between the two roles of repr(C): as a portable, simple, linear layout, and as matching the layout of the C ABI for the current target.
Not sure if this has been brought up before but ats::alloc::Layout defines in the example for Layout::extend an algorithm for replicating the layout of a #[repr(C)] struct and its field offsets.
To calculate the layout of a #[repr(C)] structure and the offsets of the fields from its fields’ layouts:
pub fn repr_c(fields: &[Layout]) -> Result<(Layout, Vec<usize>), LayoutError> { let mut offsets = Vec::new(); let mut layout = Layout::from_size_align(0, 1)?; for &field in fields { let (new_layout, offset) = layout.extend(field)?; layout = new_layout; offsets.push(offset); } // Remember to finalize with `pad_to_align`! Ok((layout.pad_to_align(), offsets)) }
I'm really not sure how to handle the MinGW layout being slightly different than the Windows API layout. That's such an unfortunate situation. If we're going to make any changes then there needs to be a repr that will always match the layout as defined by Windows, so code that calls external DLLs on Windows can have a consistent layout regardless of whether you use MinGW or MSVC. If switching between windows targets causes layout changes that will be a nightmare to debug.
As for C vs a portable stable linear layout, we should absolutely have separate reprs for both. If we want to ensure maximum compatibility then we'd likely want to add two new reprs entirely, while leaving repr(C) as it is but deprecated, perhaps even a hard deprecation error in a future edition. Trying to adjust the behavior of repr(C) seems fraught with peril due to a multitude of factors so I don't think that can be sanely considered.
I'm really not sure how to handle the MinGW layout being slightly different than the Windows API layout. That's such an unfortunate situation.
How to people programming for MinGW deal with that? Isn't this a plain MinGW bug?
I agree that we have to resolve the issue of the two repr(C) roles, and I would like there to be a repr(linear) in the future, with anything that has C in the name being reserved for interop.
I think we can avoid blocking this RFC on that though. Perhaps the best way to resolve this for now is to introduce #[repr(C(system))] and #[repr(C(target))], and only allow the combination of repr and packed to exist with one of those.
We could pick a default later on if we wanted to. Not picking a default now can ease an automatic transition from #[repr(C)] to #[repr(linear)], where I would prefer not to make subtly different layout choices like this based on the target.
Agree with @tmandry here. So to summarize:
- #[repr(C)]: The current and existing behavior. packed and aligned fields are banned.
- #[repr(C(system))]: Allows packed and aligned fields. Behavior matches OS layout. Intended for structs calling system API.
- #[repr(C(target))]: Allows packed and aligned fields. Behavior matches target layout. Intended for structs calling libraries compiled for the same target.
- #[repr(linear)]: Consistent behavior across all platforms. Intended for storage and transmission.
I would propose that we name them #[repr(system)], #[repr(target)] and #[repr(linear)] and soft-deprecate #[repr(C)] by emitting a warning asking the user to clarify their intention.
@PixelDust22 I would recommend taking up the shorter names (and #[repr(linear)]) in a separate RFC. Another lang team member has already voiced discomfort with adding #[repr(system)] as part of this RFC. I'm guessing you want access to the feature more than you want these specific names, and I think it's worthwhile to consider the new names and a deprecation/compatibility/migration story altogether as one separate RFC.
For my part I can see how we would want shorter names, but the bare names "target" and "system" don't convey that this is the C ABI we're talking about. Maybe that is correct and we should be talking about the C ABI as the system and target ABI – the terms are very commonly used to describe just that – but I won't feel sure until I've heard a few different perspectives.
Ok sounds good. I will update the RFC for #[repr(C(system))] and #[repr(C(target))].
I think we should try to match repr(system) to extern "system" fn -- we already have the latter, why not the former? as for matching the C ABI, I think requiring repr(C(target)) (or maybe repr(cdecl)?) for one edition and then switching back in a later edition is good. repr(linear) is good for the gimme-simple-predictable-layout case.
#[repr(C)]: The current and existing behavior. packed and aligned fields are banned.
Except that they are (accidentally) allowed when using generics, so this ban makes no sense - we still have to specify what happens here.
We should either lift the ban entirely (maybe make it a lint, but still document what happens when you ignore the lint), or we need a post-mono check that actually enforces the ban properly.
#[repr(C)]: The current and existing behavior. packed and aligned fields are banned. Except that they are (accidentally) allowed when using generics, so this ban makes no sense - we still have to specify what happens here. We should either lift the ban entirely (maybe make it a lint, but still document what happens when you ignore the lint), or we need a post-mono check that actually enforces the ban properly.
My preference would be to enforce the ban properly. But it should be merely a scary warning until the new reprs are available, then it should turn into an error.
I agree that deprecating repr(C) and introducing repr(C(target)) and repr(C(system)) is likely the best approach for this RFC.
But a note for future possibilities — there's a desire to be able to write #[repr(c_int)] or similar to set a type’s repr using a type alias. RFC#3659 originally proposed to enable this syntax, but has switched to #[repr(tag = alias)] instead due to the issues caused by introducing a type alias C or any of the built in reprs. Since extern "C" uses quotes for the ABI string, one option is to use repr("C") for “match target’s C layout” and repr("system") for “match target system ABI’s layout.” If we also switch to using repr("transparent"), this frees up repr(ident) to do type namespace lookup. You can already make code horribly unclear by shadowing the primitive types; shadowing the primitive reprs doesn't really[^1] make that materially worse.
[^1]: Well, it makes it worse for one application: macros. Macros can't do name resolution, and repr not doing name lookup allows macros to look at the identifier and know the repr for sure. This is useful for things like #[derive(zerocopy::FromBytes)], which needs to know the specific int of an enum repr. Furthermore, while I'm not exactly sure if this is the case, if proc macros construct tokens that use the call-site edition, any current proc macro relying on repr(ident) for soundness would gain a subtle soundness hole... which already exists if you shadow ::std, ::core, the primitive type names, and/or the macro's runtime crate name, as necessary.
For more cases where we want repr(C) to differ from repr(linear), see this IRLO discussion and its context. I think that needs to be resolved first before we can consider dealing with more specific memory layout issues.
This RFC includes a point that goes even further, since windows effectively has two ABIs (MSVC and GNU) and code built for the windows-gnu targets (and the windows-gnullvm targets? Not sure about that) may have to interact with both ABIs, thus requiring a way for the programmer to express this difference. For function call ABIs, we have extern "C" vs extern "system", but we currently don't have anything comparable for memory layout.