rfcs `repr(tag = ...)` for type aliases

Primitive representations on enums now accept type aliases, meaning that in addition to primitives like #[repr(u32)], #[repr(tag = core::ffi::c_int)] and #[repr(tag = my_type)] are now accepted.

Internals discussion: https://internals.rust-lang.org/t/pre-rfc-type-aliases-in-repr/20956 Last comment on RFC under first version (self::): https://github.com/rust-lang/rfcs/pull/3659#issuecomment-2171268028 Last comment on RFC under second version (type = ...): https://github.com/rust-lang/rfcs/pull/3659#issuecomment-2181588275 Last comment on RFC under third version (discriminant = …): https://github.com/rust-lang/rfcs/pull/3659#issuecomment-2221105082

Rendered

Jun 15 '24 02:06 clarfonthey

This seems like a straightforward change that we should support to improve type definitions for interfaces that depend on type aliases, particularly those that vary by target.

@rfcbot merge

That said, I think we should go with the mentioned alternative of allowing type aliases to shadow reprs (with a lint) in a future edition, to avoid forcing people to write repr(self::c_int) or similar.

@rfcbot concern allow-type-aliases-to-shadow-reprs

(We may even consider allowing that in current editions on the basis of a crater run turning up no conflicts.)

Jun 15 '24 05:06 joshtriplett

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

[x] @joshtriplett
[x] @nikomatsakis
[ ] @pnkfelix
[ ] @scottmcm
[x] @tmandry

Concerns:

~~allow-type-aliases-to-shadow-reprs~~ resolved by https://github.com/rust-lang/rfcs/pull/3659#issuecomment--2116002710
~~ambiguity~~ resolved by https://github.com/rust-lang/rfcs/pull/3659#issuecomment--2124076846
naming-and-syntax (https://github.com/rust-lang/rfcs/pull/3659#issuecomment--2073862214)

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

cc @rust-lang/lang-advisors: FCP proposed for lang, please feel free to register concerns. See this document for info about what commands tagged team members can give me.

Jun 15 '24 05:06 rfcbot

@rfcbot concern ambiguity

This makes things like

type transparent = u64;
#[repr(transparent)] enum Foo { … }

legal.

That seems particularly bad for any other proc macros that want to look at the attribute as part of things like their own soundness checks, since they'd no longer know what's going on from what's in the attribute.

Also, I'm always a bit sad when we have "real" stuff in an attribute, in the sense that it's something that we have a proper grammar construction for rather than just needing to deal in tokens.

It makes me think of changing it to instead be

enum Foo : c_int { … }

or something instead.

Jun 15 '24 21:06 scottmcm

@rfcbot resolve ambiguity

I'm an idiot and somehow missed the "you need self:: part" 🤦

That does solve the hard blocker, but it still makes me wish for a better thing instead so that use actually works nicely.

Jun 15 '24 21:06 scottmcm

Just an idea for a another syntax that is not ambiguous: #[repr(type = c_int)

And can I use complex types such as <Self as Foo>::Bar or should it be limited to plain path?

Jun 16 '24 04:06 ogoffart

Hmm, making the existing repr(type) aliases for repr(type = X) does actually feel right. It was one of the things I initially was thinking but kind of discarded it.

Although, I'm not sure if "type = " is the right name, since we're really talking about the enum discriminant here, but "discriminant" is unfortunately a very, very long name.

Perhaps repr(as = u32) is a good compromise? Then, stuff like repr(as = c_int) works as expected.

In terms of allowing shadowing, the main reason why I'm totally against it is because there's no way to override the shadowing. For example, you can shadow u32 but get it back by referencing the absolute path ::std::u32, whereas there'd be no such thing for repr(transparent) once you've shadowed it to a type alias. I don't like that idea since, as @scottmcm says, you'd have a way to easily destroy proc macros and the like and make them no longer able to properly reason about things.

Jun 16 '24 06:06 clarfonthey

#[repr(type = ...)] seems like a great solution here. I agree that that's better than allowing shadowing.

"type" or "base" or several other possible names could work there.

Jun 16 '24 06:06 joshtriplett

I also really like #[repr(type = ...)]. It allows greater flexibility in the future without clashing with existing reprs that are assumed to have an unambiguous meaning. When I read #[repr(u8)], it would be clear that this type has a u8 primitive repr, while a ``#[repr(type = u8)]type has a repr that matches the in-scope type namedu8`, which may be shadowed to not refer to the primitive. So this explicit form provides an extra warning sign that shadowing is possible (while the existing form guarantees a fixed repr). It would still probably be good to lint against a shadowed repr where the discriminant type looks like one of the primitive reprs.

Jun 16 '24 08:06 juntyr

Decided to just update the RFC to use the type = proposal. There may be a few rewriting errors, but hopefully I caught them all. Hopefully this resolves the concern brought up by @joshtriplett.

Jun 16 '24 18:06 clarfonthey

This is not straightforward, we currently never have "real code" (types, expressions, patterns) in inert attributes.

For the purpose of this feature we'll need treat the repr attribute specially in the frontend, unlike any other attribute, as if it was a real non-attribute feature with a dedicated syntax (keep it in AST/HIR separately, resolve and expand it separately, insert into into the definition tree, etc). That is going to be a source of pain and hacks. Previously features requiring real code in them ended up having real syntax as well.

(In other words, this is not what inert attributes are generally supposed to be used for.)

Jun 17 '24 12:06 petrochenkov

Although, I'm not sure if "type = " is the right name, since we're really talking about the enum discriminant here, but "discriminant" is unfortunately a very, very long name.

Discriminant might be long, but on the other hand it describes exactly what this is, and makes it very clear and obvious. It is also consistent with std::mem::discriminant Since the repr attribute is always on its own line, it's not like we don't have room for it and its also not something we type very often.

So if it was me, I'd go with discriminant =

It doesn't look bad at all:

#[derive(Clone, Copy, Eq, PartialEq)]
#[repr(discriminant = u32)]
#[non_exhaustive]
enum Foo {
   //...
}

Jun 18 '24 10:06 ogoffart

@petrochenkov How difficult is it to make #[repr] a non-inert attribute, or at minimum separate #[repr(u8, type = u8)] as non-inert from the inert #[repr(C, packed, align, simd, transparent)]? Would doing so introduce any breaking regressions?

Jun 18 '24 13:06 kennytm

@kennytm It would be easy to add a new built-in attribute macro, like #[repr_type(MyType)], expanding directly to the new AST node, similarly to what offset_of and friends do. Turning an inert built-in attribute into a macro is a breaking change. Upd: maybe not much more breaking than adding new things to prelude though.

Jun 18 '24 13:06 petrochenkov

I'm a bit confused about the concept of inert attributes versus attribute macros, and also why particularly this would be a problem to implement. I assume you're totally correct about the specific issues with implementing this, but could use a bit more info to fully understand what we're going for.

From my (relatively naïve) perspective, when the definition is expanded in the HIR, we just resolve the type alias when we put together the enum definition, since we should have the information at that point. Since we're literally just listing allowed types for the alias, it shouldn't need much extra logic. But this appears to be wrong, and I'm not sure why.

Jun 18 '24 17:06 clarfonthey

@rfcbot resolve allow-type-aliases-to-shadow-reprs

Thank you to @ogoffart for the better type = syntax, and to @clarfonthey for incorporating it.

Jun 19 '24 15:06 joshtriplett

@rfcbot reviewed

This seems like a great idea to me. As a future extension, I would like it if we could support types like the following:

#[repr(transparent)]
pub struct SecretInteger(u16);

Since that would allow "delegating representations" without allowing code to rely on the specific value of a type alias (this is especially useful when it changes on rarely-tested platforms).

If you agree that this is sensible, could you please add it as a future possibility?

Jun 20 '24 21:06 tmandry

So, taking in additional suggestions:

type = ... is now discriminant = .... This is now both consistent with #3607 (even though that's not official yet) and feels more accurate. The typing is worth it.
Elaborated the suggested lints. Instead of recommending that people use #[repr(type)] directly when type is a primitive, recommend that they use #[repr(discriminant = type)] where type is the absolute path to a type. Make this an allow-only or clippy::pedantic lint that requires enabling by default.
Explained the guide-level explanation of shadowing a bit better, recommending that people use absolute paths if they're worried about shadowing rather than just recommending they use the old syntax.
Updated future possibilities to include repr(transparent) aliases and limited-value aliases like NonZeroU* and char.

Still need to go through more existing feedback though, so, there will probably be more changes. I don't expect to change the syntax again, though.

Jun 21 '24 19:06 clarfonthey

I'm a bit confused about the concept of inert attributes versus attribute macros, and also why particularly this would be a problem to implement. I assume you're totally correct about the specific issues with implementing this, but could use a bit more info to fully understand what we're going for.

From my (relatively naïve) perspective, when the definition is expanded in the HIR, we just resolve the type alias when we put together the enum definition, since we should have the information at that point. Since we're literally just listing allowed types for the alias, it shouldn't need much extra logic. But this appears to be wrong, and I'm not sure why.

If #[repr(discriminant = TYPE)] accepts types in general, then some interesting cases are

#[repr(discriminant = type_macro!())]
#[repr(discriminant = U<16>)] // An anonymous constant in generic arguments
#[repr(discriminant = new_crate::Type)] // Loading some new crate that previously wasn't in the dependency graph

All these cases require having the TYPE in AST during macro expansion already (or at least immediately after it for 3.), because macro expansion (for 1.) and definition tree building (for 2.) work on AST, and there is no AST inside attributes, only unstructured tokens. The "when the definition is expanded in the HIR" point is too late for name resolution in the current compiler, because we are resolving all paths immediately after macro expansion and then freeze the dependency graph. (There are good reasons for that, e.g. we'd generally want to be able to produce depinfo for build systems without doing full compilation, so we should not lazily load new crates from type checking or codegen or something. Also that's how incremental compilation is currently organized.)

So what are the possible ways to put TYPE into AST early enough.

Add a first class syntax for it, that's the most natural and the least hacky way.

enum E: TYPE { ... } // E.g. something like this (the specific syntax is taken from C++).

Add a procedural macro that will expand directly to AST, that's equivalent having an unstable "secret" syntax. This is similar to features like offset_of!(...), the approach feels a bit like a temporary solution for features that may or may not get a first class syntax later.

// Before expansion
#[repr_discriminant(TYPE)]
enum E { ... }

// After expansion, you may see this with `-Zunpretty=expanded`.
enum E builtin#repr TYPE { ... }

Turn repr itself into a procedural macro, then the discriminant part will expand into builtin#repr TYPE and the rest will expand into some inert attribute #[inert_repr] equivalent to the current #[repr]. This feels more hacky to me. There's a small chance of this causing breakage because macro expanded code is a subject to "time travel" restrictions, e.g. see one recent example (probably no breakage in practice though).
Turn repr into an active built-in attribute. "Active" means it acts like a macro and transforms code, and "built-in" in relation to attributes means that it doesn't go through name resolution (doesn't support shadowing, and we immediatly know what it is when we encounter it during expansion). This is a relatively major hack, only two active built-in attributes exist now for historical reasons - cfg and cfg_attr. In the past we prevented doc from becoming an active attribute by replacing #[doc(include = "path/to/somewhere")] with #[doc = include_str!("path/to/somewhere")], and I'm pretty happy about that now. In this solution #[repr(discriminant = TYPE)] immediately expands into the secret syntax and inert components when it is encountered during expansion.

// Before expansion
#[repr(discriminant = TYPE, packed)]
enum E { ... }

// After expansion, you may see this with `-Zunpretty=expanded`.
#[repr(packed)]
enum E builtin#repr TYPE { ... }

Jun 21 '24 20:06 petrochenkov

The comment above (https://github.com/rust-lang/rfcs/pull/3659#issuecomment-2183441423) is more or less about keeping the status quo, but there's another more radical solution - stop treating built-in attributes as attributes!

Instead, tread all 170 built-in attributes as 170 new context-sensitive keywords and 170 corresponding new syntaxes that only look similar to general purpose token-based attributes. Every built-in attribute would then be parsed into its dedicated AST node instead of a general attribute node with tokens inside.

Then any wild fantasies could be implemented using these new attribute-like syntaxes, e.g. we could put blocks with executable statements inside them, codegen them as functions and call from somewhere (e.g. for contracts). Of course all such syntaxes will need to be individually threaded through the compiler (AST/HIR/TypeIR/MIR/etc) like any other language features with non attribute-like syntaxes, and not like token-based attributes.

For existing attributes the change will be breaking because the validation will be less lazy (at parse time, including inside cfg(FALSE), instead of some time later). This is also probably out of scope for the repr(discriminant) RFC specifically.

Jun 21 '24 21:06 petrochenkov

Oh right, this isn't the first time we've bumped into wanting paths to appear inside attributes. We also wanted it for TAIT for #[defines(<path>)].

My feeling is that we should be able to do this, even if it means making repr builtin or act like a macro. (Same for defines, except that name may be more common so we might prefer the macro approach.)

Longer term it would be ideal if we could declare attributes that accept paths using a declarative syntax similar to macro_rules!. Sounds like this might be a larger change though.

Jun 21 '24 23:06 tmandry

Didn't respond when I first read it, but thank you @petrochenkov for the analysis. I hadn't even considered macro expansion in the attribute, but that's definitely something we'd want to have.

I definitely think that this should remain an attribute; while a syntax like C's is tempting, this really feels more like an annotation on the type rather than a fundamental part of it, which is what attributes are for.

Will probably try and update the RFC to mention stuff like macro expansion and generics in it, though. Effectively, any type, macro-generated or not, should work with this.

Jun 26 '24 18:06 clarfonthey

FWIW, :+1: for having a "native" syntax for this as well, but not as a prerequisite. I think we should go ahead and have a #[repr] version of this because we already have the #[repr] attribute, but I also think we should have the enum E: u32 or enum E: path::Type syntax for brevity. I don't think we should block this on that syntax (since it sounds like we have a couple of other valid paths), but if someone wanted to propose that syntax I'd happily :ballot_box_with_check: it.

Jul 03 '24 17:07 joshtriplett

@rfcbot reviewed

Jul 03 '24 17:07 nikomatsakis

:bell: This is now entering its final comment period, as per the review above. :bell:

Jul 03 '24 17:07 rfcbot

@rustbot labels -I-lang-nominated

We discussed this in the lang call today. People were feeling positively about this, and it's now in FCP.

Jul 03 '24 20:07 traviscross

Now that this is in FCP, I decided to go through the last of the feedback and add more info to the RFC so folks that get notified via TWIR or otherwise can get a reasonable picture of what this feature is proposing.

Let me know if you have any additional feedback on the contents of the RFC itself.

(Note: these aren't any more design changes, just clarifications on alternatives, prior art, and drawbacks. In particular, I mentioned the desire for syntax and how that can still be added after this RFC is merged.)

Also, I still would like to express my preference for the attribute over the syntax regardless. To me, #[repr(discriminant = T)] enum U { ... } is much more descriptive than enum U : T { ... }, especially since equal weight is put to the name of the enum and its discriminant type in the latter case. Maybe that could be improved somehow, but this is at least how I feel about the proposed alternatives.

Jul 03 '24 20:07 clarfonthey

@rfcbot concern naming-and-syntax

Filing a concern both for the question of discriminant vs tag and because @scottmcm has an RFC in progress for the enum E: Type syntax.

Jul 10 '24 17:07 joshtriplett

I didn't want to have to elaborate more on the context of syntax, but since the FCP is not happening any more, I might as well.

First, I'll start with the obvious. A colon is a terrible syntax. There's a reason why, when given the chance to design a new language from scratch instead of bolting on additional syntax like C++ does, Java and many other languages chose to use a keyword to describe the relationship between the name of the type and whatever occurs after the colon. Sure, Java also has distinctions like extends and implements that didn't exist in C++, but regardless, having a keyword is more clear.

With enum E: Type, the relationship between E and Type is incredibly unclear in general, and the precedent from C++ and other languages is that Type is the parent type of E, which is not true in this case. E isn't even a child type of Type, since explicit discriminants for enums with payloads has been allowed for a while, and this mucks up the semantics even more. Yes, the relationship in enum E: u8 is clearer, but this RFC is explicitly talking more about cases like enum E: MyType, where the actual type behind MyType may not be as clear.

Second, the ship has already sailed as far as discriminant and tag are concerned. We use the term discriminant in the standard library, throughout the compiler internally, and even new RFCs like #3607 are proposing to use the term "discriminant" to refer to the discriminant as we have been describing it. "Tagged union" was used as a term because it mimics what people have been saying in the C/C++ vernacular, but it is not part of the Rust vernacular, and I see no reason to treat it that way all of a sudden. Documentation and references, in my opinion, simply use the C/C++ vernacular because that is what people unfamiliar with Rust might know them as, to bridge common ground. I'm not counting libraries like serde using the term "tag" because in those cases, it is referring to a particular representation of the discriminant and not the discriminant itself.

Like, I personally have no issues with adding a dedicated syntax for this in the future. I just don't think that enum E: Type is a fleshed-out dedicated syntax and that #[repr(discriminant = ...)] is, and that personally, I think that most of the apprehension with making this an attribute largely boil down to implementation concerns in the compiler that people dislike and want to change anyway.

I think that an attribute fits this feature because, as mentioned before, any syntax will likely be less descriptive and more confusing. Elevating repr(discriminant = ...) to a dedicated syntax feels disproportionate to other features which are perhaps more ubiquitous, like repr(transparent) or repr(C). There are plenty of features which exist as attributes which maybe eventually will have syntax versions, like non_exhaustive, and I think it's perfectly fine to use attributes to be descriptive when syntax otherwise might be difficult to nail down.

Jul 10 '24 18:07 clarfonthey

I went to mark this waiting-on-team, but it looks like that label doesn't exist here. Please treat this RFC as waiting on us to give better feedback.

Jul 10 '24 18:07 scottmcm

I should add: I was having a sour day before I saw this, so, that is definitely reflected in my response.

I think that it's absolutely reasonable to want to pause the FCP because you have additional concerns without fully fleshing out those concerns, since you are on a time limit. I just wish that were the reason stated for pausing the FCP, instead of explicitly providing not-fleshed-out concerns as an excuse instead. Because as far as I'm concerned, the reasons mentioned were addressed before the FCP started, and no additional reasons were cited.

Jul 10 '24 18:07 clarfonthey