Add an expression for direct access to an enum's discriminant

Third-party derive macros want the same direct access to the discriminant of the enum they're implementing a trait for that the core derive macros have, which seems entirely reasonable.

To allow that without imposing a new category of major breaking changes on library authors, this proposes a safe built-in way for getting the discriminant as a primitive integer -- not an opaque mem::Discriminant -- for anything on which you can see private fields.
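For context, here is a minimal sketch of what safe, stable code can do today with the opaque mem::Discriminant. The Shape enum and both helper functions are hypothetical names made up for illustration; only std::mem::discriminant and Discriminant are real APIs.

```rust
use std::mem::{discriminant, Discriminant};

// Hypothetical enum, used only for illustration.
#[allow(dead_code)]
enum Shape {
    Circle(f64),
    Square(f64),
}

/// True when both values are the same variant, ignoring the payloads.
fn same_variant(a: &Shape, b: &Shape) -> bool {
    discriminant(a) == discriminant(b)
}

/// Returns the opaque handle. It supports equality and hashing,
/// but offers no stable way to read it as a primitive integer.
fn handle(s: &Shape) -> Discriminant<Shape> {
    discriminant(s)
}
```

So equality checks work, but anything that needs the integer itself (ordering, jump tables, serialization) has no direct safe route for enums with fields.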

Rendered

Apr 07 '24 04:04 scottmcm

As an alternative, instead of a magical field b.enum#discriminant, we could have a magical "type"/"trait" enum: enum::discriminant(&b)

let b = Enum::Tuple(true);
assert_eq!(enum::discriminant(&b), 13);

Apr 07 '24 04:04 VitWW

As an alternative, instead of a magical field b.enum#discriminant, we could have a magical "type"/"trait" enum: enum::discriminant(&b)
let b = Enum::Tuple(true);
assert_eq!(enum::discriminant(&b), 13);

Trait impls are global, so this fails to achieve the goal of this RFC to restrict access to the discriminant to the crate that declared the enum.

Apr 07 '24 08:04 RalfJung

It seems the primary motivation for this RFC is that .enum#discriminant is pub(in self) so that, as mentioned in rust-lang/rust#106418, the author of the enum can reorder variants without it being a breaking change.

But according to #1105, having a > b in 1.1.0 and a < b in 1.2.0 is not necessarily considered breakage, unless the variant order is explicitly documented:

In general, APIs are expected to provide explicit contracts for their behavior via documentation, and behavior that is not part of this contract is permitted to change in minor revisions.

So I don't see much advantage of this RFC over e.g. allowing users to use std::intrinsics::discriminant_value() (e.g. via std::mem::discriminant(&a).value()).

Apr 07 '24 09:04 kennytm

So I don't see much advantage of this RFC over e.g. allowing users to use std::intrinsics::discriminant_value() (e.g. via std::mem::discriminant(&a).value()).

Well, changing types is still a major breaking change, right? So if this returns isize for enums without reprs, then changing to a better repr type would still be a major breaking change, which it isn't today since that's not exposed. (Other than in layout, but layout is explicitly minor.)

And TBH I don't think that "the actual ordering isn't really a semver property unless written out in docs" is realistic, no matter what it says in that RFC. APIs like https://doc.rust-lang.org/std/collections/struct.BTreeMap.html#method.range mean that entirely reasonable things like "I want to get first/everything/last of a particular variant from this map" are forced to take a dependency on the order of things.

(For example, if you have #[derive(...)] enum Foo { A(i32), B(String), C(i32) }, getting the last B from the BTreeMap means .range(..C(0)).next_back(), and there's no efficient way to write that without depending on the Ord staying the same.)
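A minimal sketch of that lookup, assuming the derive list includes Ord and that the map values are u32 (both are assumptions, as is the helper name last_b). It uses C(i32::MIN) rather than C(0) as the upper bound so that negative C keys are excluded as well.

```rust
use std::collections::BTreeMap;

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum Foo {
    A(i32),
    B(String),
    C(i32),
}

/// Hypothetical helper: the greatest key that is a `B`, if any.
/// This is only correct while the derived `Ord` sorts every `B`
/// immediately below every `C`, i.e. while the variant order stays put.
fn last_b(map: &BTreeMap<Foo, u32>) -> Option<&Foo> {
    map.range(..Foo::C(i32::MIN)) // everything strictly below the smallest `C`
        .next_back()              // the greatest such key
        .map(|(key, _)| key)
        .filter(|key| matches!(key, Foo::B(_))) // it may be an `A` if there is no `B`
}
```

If the enum author later reorders the variants, this still compiles but silently returns the wrong key, which is the dependency on Ord described above.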

The RFC also says

This policy will likely require some revision over time, to become more explicit and perhaps lay out some best practices.

and I think this is a case where it's de facto different now.

For example, we don't even document that Less < Equal for cmp::Ordering -- sure, it has explicit variants, but if changing those has been a "minor" change this whole time, then you can't rely on those in the rustdoc either.

And we didn't document None < Some(_) until https://doc.rust-lang.org/1.56.0/std/option/index.html#comparison-operators -- that's roughly 6 years after 1.0.

So if even the standard library hasn't been documenting the orders for its enums, even for things where people have clearly been relying on them, I don't think it's fair to say people are wrong for expecting it not to change.
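As a small illustration of the orders people rely on in practice, the following sketch checks them directly. The function name is hypothetical; the comparisons themselves use only std.

```rust
use std::cmp::Ordering;

/// Hypothetical check: the standard-library enum orders that code commonly relies on.
fn std_enum_orders_hold() -> bool {
    Ordering::Less < Ordering::Equal
        && Ordering::Equal < Ordering::Greater
        // `None` compares below every `Some`, whatever the payload is.
        && None < Some(i32::MIN)
}
```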

To me that note in 1105 is more about things like how the exact bytes from a compression algorithm, for example, shouldn't be considered a semver guarantee, since the actual postcondition is that the bytes are decompressible back to the original data. But with an output of bool, it's really hard for me to say that the exact result isn't the semver-relevant post-condition for PartialOrd::lt, say.

And of course there's things like

let foo: [u8; char::UNICODE_VERSION.0 as _] = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14];

where obviously you deserve the breakage you'll get (though for that one it's particularly true since it's documented as being able to change).

As a specific example where being at least crate-private is important to leave options open:

Today, None is 0 but Ok is 0, and that means that ControlFlow can't match both, which means that the ? desugaring can't be a NOP for both, only for one of them.

It would be nice to change the discriminants on one of them (probably Result) to be the other way around to simplify that codegen, but I think that's just far more practical without introducing a way for people to depend on the currently-invisible values.

Apr 09 '24 19:04 scottmcm

@scottmcm

Well, changing types is still a major breaking change, right?

I'm pretty sure changing #[repr] is a breaking change, with or without discriminant_value. At the very least it makes the following code break if you change from #[repr(u8)] to #[repr(u16)].

#[repr(u8)]
enum Enum {
    A = 1,
}
let _: u8 = unsafe { std::mem::transmute(Enum::A) };

Now, #1105 did say "not all breaking changes are major"[^2], and as you have said, "layout is explicitly minor"[^1]. But if we accept breaking transmute, I don't see why we wouldn't accept breaking PartialOrd of the discriminant values (not even the enum values themselves).

[^1]: though nowhere in #1105 could I find the word "layout"

[^2]: heck, changing the result of any const fn-evaluable expression is breaking

(For example, if you have #[derive(...)] enum Foo { A(i32), B(String), C(i32) }

If the enum explicitly derived PartialOrd, it would be the enum author's responsibility not to change the variant order, because doing so would already change the result of Foo::A(1) < Foo::C(2), let alone discriminant_value(&Foo::A(1)) < discriminant_value(&Foo::C(2)).
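To make that concrete, here is a minimal sketch with two hypothetical enums, V1 and V2, that differ only in declaration order. The derived PartialOrd compares variants by declaration order first, so the same comparison flips.

```rust
// Hypothetical "before" and "after" versions of the same enum.
#[derive(Debug, PartialEq, PartialOrd)]
enum V1 {
    A(i32),
    C(i32),
}

#[derive(Debug, PartialEq, PartialOrd)]
enum V2 {
    C(i32),
    A(i32),
}

/// `A` is declared before `C` in `V1`, so `A(1) < C(2)`.
fn a_before_c_in_v1() -> bool {
    V1::A(1) < V1::C(2)
}

/// `A` is declared after `C` in `V2`, so the same comparison is false.
fn a_before_c_in_v2() -> bool {
    V2::A(1) < V2::C(2)
}
```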

Let's focus on the case where PartialOrd is not derived on the enum. I think everyone in rust-lang/rust#106418 agreed that if E derived PartialOrd there is no problem to impl Ord for Discriminant<E>.

Today, None is 0 but Ok is 0, and that means that ControlFlow can't match both, which means that the ? desugaring can't be a NOP for both, only for one of them.

It would be nice to change the discriminants on one of them (probably Result) to be the other way around to simplify that codegen, but I think that's just far more practical without introducing a way for people to depend on the currently-invisible values.

Result already derives PartialOrd, so Ok(3) < Err(3), as mentioned in #3058. Yes, you could change the implementation to not use derive, to "stabilize" the order without relying on the discriminant value.

enum Result2<T, E> {
    Err(E),
    Ok(T),
}

impl<T: PartialOrd, E: PartialOrd> PartialOrd for Result2<T, E> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        match (self, other) {
            (Self::Ok(a), Self::Ok(b)) => a.partial_cmp(b),
            (Self::Err(a), Self::Err(b)) => a.partial_cmp(b),
            _ => discriminant_value(other).partial_cmp(&discriminant_value(self)), // reverse order
        }
    }
}

Then reordering Ok / Err is just going to break those people relying on discriminant_value(&Ok(3)) == 0 && discriminant_value(&Err(3)) == 1 which I'd say let them be broken — basically the stance in https://github.com/rust-lang/rust/pull/106418#issuecomment-1954783410.

Anyway I'd expect

If the enum is #[repr(Rust)][^3], then third-party crates must not rely on specific values of the discriminants (other than their being distinct), because the enum author is free to reorder the variants
If the enum is #[repr(inttype)][^4], then the discriminant values are guaranteed stable, because the Rust spec guarantees that the pointer-casting method extracts that inttype tag. The enum author should not change the discriminant values or the representation type between minor versions. If they reorder the variants, explicit discriminants should be assigned to maintain stability.
Not sure about #[repr(C)], treat it as the same as #[repr(C, c_int)] here?

[^3]: note that Result, Option, ControlFlow are all plain #[repr(Rust)] enums.

[^4]: note that cmp::Ordering is #[repr(i8)]
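For reference, the pointer-casting method for a #[repr(inttype)] enum can be sketched as below. The Message enum, its discriminant values, and the method name are hypothetical; the cast relies on the documented layout of primitive-repr enums, where the integer discriminant is stored first.

```rust
// Hypothetical enum with explicit discriminants on variants that carry fields.
#[allow(dead_code)]
#[repr(u8)]
enum Message {
    Quit = 3,
    Move(i32) = 7,
    Write { len: u16 } = 13,
}

impl Message {
    /// Reads the discriminant by casting a pointer to the enum into a pointer to `u8`.
    fn discriminant(&self) -> u8 {
        // SAFETY: `Message` is `#[repr(u8)]`, so its layout begins with the
        // `u8` discriminant and reading that first byte through the cast is sound.
        unsafe { *(self as *const Self as *const u8) }
    }
}
```

Note that this only works because of the explicit repr; the same cast on a plain #[repr(Rust)] enum has no such guarantee.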

Apr 09 '24 22:04 kennytm

2. though nowhere in 1105 I could find the word "layout"

Well, it's written in other places, like https://doc.rust-lang.org/std/mem/fn.size_of.html's

In general, the size of a type is not stable across compilations, but specific types such as primitives are.

for repr(Rust) types, or how if you ask rustdoc to include layout information it says

Note: Most layout information is completely unstable and may even differ between compilations.

But I guess you're talking about semver on it, but it's not surprising that 1105 doesn't talk about things like repr(transparent) that stabilized 3 years later.

But -- less authoritatively -- rustdoc has been doing things like https://github.com/rust-lang/rust/pull/115439 that also suggest some level of consensus that non-Rust reprs aren't necessarily semver guarantees either.

I suppose mostly I'd rather have things checkable by tools where possible, and everything that's "minor" is a thing that the tool would like to check but we decided to make it so it shouldn't.

And it annoys me when things that are otherwise entirely nominal in Rust (at the language level) today start to care about order, as this would do to enum variants with fields. Things being breaking because the author of the library used the library in a particular way is fine, but language features forcing breaking changes -- even if de jure minor -- feels bad to me when we don't need to.

To me, "minor" is for "well, we wrote ourselves into a corner and the only way to allow really any changes at all is to make this de jure allowed even though it's definitely technically breaking", especially when there's an option for the consumer to write code in a way that's resilient to minor breakage. I can't justify deeming this minor in the way that I easily can for the ones that are "well otherwise any addition at all is breaking", like exists for * imports or method resolution.

But those are larger conversations than just this RFC, I suppose. Really the core problem is that there's no way to say which things you're doing are implementation details and which aren't. After all, there's no code marker for the difference between "my type is using repr(C) because I needed it for some internal pointer tricks" and "my type is using repr(C) because it's fixed for this OS FFI and cannot change ever" :/

Apr 09 '24 23:04 scottmcm

Well, it's written in other places, like https://doc.rust-lang.org/std/mem/fn.size_of.html's

Thanks. I hope there is a centralized place to look these up :smile: (maybe obi1kenobi/cargo-semver-checks#5?)

But -- less authoritatively -- rustdoc has been doing things like https://github.com/rust-lang/rust/pull/115439 that also suggest some level of consensus that non-Rust reprs aren't necessarily semver guarantees either.

Well I wouldn't count #[repr(transparent)] as non-Rust :sweat_smile:; besides 115439 only hides the attribute when the inner type is private.

IMO if the #[repr] appears in the rustdoc (like https://doc.rust-lang.org/core/cmp/enum.Ordering.html), it is part of the API contract and changing it should be breaking. That said, std::cmp::Ordering only gained #[repr(i8)] at 1.57 by rust-lang/rust#89507[^1], and the values -1, 0, 1 are shown in the doc only since 1.75 due to a rustdoc improvement rust-lang/rust#116142.

That means at least until 1.57 for non-ABI purpose rust-lang/rust did treat #[repr] change as a minor change[^2], and until 1.75 breakage due to changed discriminant value was also not obvious.

(If rust-lang/rust#86772 ever got implemented then <E as AsRepr>::Repr would be another source of breakage)

[^1]: technically not breaking, since -1..=1 is within the i8 range, so it is layout-compatible with the previous #[repr(Rust)] form

[^2]: cargo-semver-checks v0.6+ does consider them a major breaking change though: obi1kenobi/cargo-semver-checks#29
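A small sketch of what the #[repr(i8)] on cmp::Ordering makes observable from outside the standard library today. The function names are hypothetical; the values are the ones shown in the Ordering docs.

```rust
use std::cmp::Ordering;
use std::mem::size_of;

/// Hypothetical helper: the discriminants of `Ordering`, read with plain `as` casts.
/// This works because `Ordering` is a fieldless enum.
fn ordering_values() -> [i8; 3] {
    [
        Ordering::Less as i8,
        Ordering::Equal as i8,
        Ordering::Greater as i8,
    ]
}

/// With `#[repr(i8)]` the whole enum occupies a single byte.
fn ordering_size() -> usize {
    size_of::<Ordering>()
}
```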

Apr 10 '24 08:04 kennytm

There are extensive docs on what is and is not semver breaking at https://doc.rust-lang.org/cargo/reference/semver.html.

Apr 10 '24 08:04 RalfJung

I'd also like to note that by exploiting the implementation details of Hash it is already possible to extract the discriminant value in safe and stable Rust (without the repr type at compile-time though). Of course we could argue that Discriminant::hash() is not guaranteed to simply call hasher.write_isize().

… Additionally the data passed by most standard library types should not be considered stable between compiler versions.

This means tests shouldn’t probe hard-coded hash values or data fed to a Hasher …

use std::mem::Discriminant;

#[derive(PartialEq, Eq, Debug)]
#[non_exhaustive]
pub enum DiscriminantValue {
    Usize(usize),
    Isize(isize),
    U8(u8),
    I8(i8),
    U16(u16),
    I16(i16),
    U32(u32),
    I32(i32),
    U64(u64),
    I64(i64),
    U128(u128),
    I128(i128),
}

impl<E> From<Discriminant<E>> for DiscriminantValue {
    fn from(v: Discriminant<E>) -> Self {
        use std::hash::{Hash, Hasher};
    
        struct DiscriminantValueExtractor(DiscriminantValue);
        
        impl Hasher for DiscriminantValueExtractor {
            fn write(&mut self, _: &[u8]) {
                unreachable!("don't use this as an actual Hasher ;)");
            }
            fn finish(&self) -> u64 {
                unreachable!("don't use this as an actual Hasher ;)")
            }
            
            fn write_u8(&mut self, i: u8) { self.0 = DiscriminantValue::U8(i); }
            fn write_u16(&mut self, i: u16) { self.0 = DiscriminantValue::U16(i); }
            fn write_u32(&mut self, i: u32) { self.0 = DiscriminantValue::U32(i); }
            fn write_u64(&mut self, i: u64) { self.0 = DiscriminantValue::U64(i); }
            fn write_u128(&mut self, i: u128) { self.0 = DiscriminantValue::U128(i); }
            fn write_usize(&mut self, i: usize) { self.0 = DiscriminantValue::Usize(i); }
            fn write_i8(&mut self, i: i8) { self.0 = DiscriminantValue::I8(i); }
            fn write_i16(&mut self, i: i16) { self.0 = DiscriminantValue::I16(i); }
            fn write_i32(&mut self, i: i32) { self.0 = DiscriminantValue::I32(i); }
            fn write_i64(&mut self, i: i64) { self.0 = DiscriminantValue::I64(i); }
            fn write_i128(&mut self, i: i128) { self.0 = DiscriminantValue::I128(i); }
            fn write_isize(&mut self, i: isize) { self.0 = DiscriminantValue::Isize(i); }
        }
        
        let mut extractor = DiscriminantValueExtractor(DiscriminantValue::Isize(0));
        v.hash(&mut extractor);
        extractor.0
    }
}

//------------------------------

enum Option1<T> {
    None,
    Some(T),
}

enum Option2<T> {
    Some(T),
    None,
}

fn main() {
    use std::mem::discriminant;

    let v1 = discriminant(&Option1::Some(&4));
    let v2 = discriminant(&Option2::Some(&5));
    
    let dv1 = DiscriminantValue::from(v1);
    let dv2 = DiscriminantValue::from(v2);
    
    assert_eq!(dv1, DiscriminantValue::Isize(1));
    assert_eq!(dv2, DiscriminantValue::Isize(0));
}

Apr 10 '24 08:04 kennytm

IMO if the #[repr] appears in the rustdoc (like https://doc.rust-lang.org/core/cmp/enum.Ordering.html), it is part of the API contract and changing it should be breaking. That said, std::cmp::Ordering only gained #[repr(i8)] at 1.57 by https://github.com/rust-lang/rust/pull/89507, and the values -1, 0, 1 are shown in the doc only since 1.75 due to a rustdoc improvement https://github.com/rust-lang/rust/pull/116142.

I don't think rustdoc changes like that are subject to FCP by the team(s) that governs semver (t-lang would be involved I assume?), so I don't think rustdoc can be used as a guideline here. In fact rustdoc printing #[repr] even though they were not considered stable guarantees was an open issue for many years that finally got fixed recently. I would say using rustdoc as the arbiter here is a common misconception but not backed by an actual decision process.

We should make it so that rustdoc only prints things that are semver-stable, but that is not where we are. Discrepancies are bugs, but they are not automatically bugs of the form "rustdoc is right and the semver guarantees need to be changed" (or "the unwritten semver rules should be whatever rustdoc does") -- it is just as likely that rustdoc is wrong.

Apr 10 '24 08:04 RalfJung

@RalfJung Thanks. Perhaps #1105 should contain a pointer to https://doc.rust-lang.org/cargo/reference/semver.html.

Apr 10 '24 09:04 kennytm

I suggest the Lexing section should say how enum#discriminant would be represented as a proc_macro::TokenTree.

Apr 10 '24 18:04 mattheww

No lexing section needed, enum#discriminant is a single "identifier" token following #3101.

Apr 10 '24 20:04 kennytm

rfc3101 says that enum#discriminant is, at present, a lexing-time error, and not any kind of token at all.

The statement that enum#discriminant is treated as an Ident for the purposes of proc-macro input is what I would like to see written in the RFC.

(Clearly it's not an identifier for the purposes of the grammar in general, as that would defeat the point.)

There's also a question of how proc-macros can create such a token. Presumably there would be something parallel to Ident::new_raw().

Apr 10 '24 20:04 mattheww

@mattheww I'd suppose Ident::new("enum#discriminant", span) like all other keywords.

Raw identifier is special because r#something is the same as something but losing keyword magic. Yet enum#discriminant acts just as a weirdly spelled keyword, and regular keywords are instantiated using Ident::new().

(Clearly it's not an identifier for the purposes of the grammar in general, as that would defeat the point.)

In #3101 the token enum#discriminant is a RESERVED_IDENTIFIER (in reality an UnknownPrefix error token). It can easily be changed to produce a regular IDENTIFIER (a keyword is still an IDENTIFIER) if this RFC is accepted.

Apr 10 '24 22:04 kennytm

Then let's write that in the RFC rather than leaving it for people to suppose.

(To be clear, I'm not suggesting any of this will cause trouble, I just want it to be written down.)

Apr 11 '24 07:04 mattheww

I'll note in passing that this is the second language feature that "could've been a library" but for privacy -- the other that I'm thinking of being Jack Wrenn's safe transmute proposal. It seems like a fairly common thing that we want to have access to some information scoped to "those who have privileged visibility".

I'm not against this syntax in particular but I do think this should be a clue for future prioritization -- something like scoped impls, or some way to reify privacy, would be helpful to ensure Rust lives up to its value of being Extensible.

Apr 11 '24 11:04 nikomatsakis

something like scoped impls, or some way to reify privacy

We do have restrictions (approved RFC 3323, tracking issue) which we should be able to leverage in some way or another once they're implemented (rust-lang/rust#106074).

Apr 11 '24 11:04 fmease

Saw this on TWIR and was wondering if this should be done together with having a “discriminant type” as a “secret enum” declared for all enums that is just the tag and no body, then perhaps a special trait can be used to retrieve the type or the value(FooEnum::Discriminant?).

Apr 11 '24 12:04 Ciel-MC

that special trait already exists https://doc.rust-lang.org/std/marker/trait.DiscriminantKind.html but I think it is perma-unstable for now

Apr 11 '24 14:04 kennytm

We discussed this in the lang design meeting on 2024-07-10. Some takeaways:

One is that we decided to go with tag rather than discriminant for this and other language work. We were persuaded by its conciseness. While the word "discriminant" has been used in documentation and in the standard library, this is the first time we'd be adding a language feature that has to name the concept. Given that the existing mem::discriminant API is kind of unfortunate, and unloved in enough other ways, we didn't feel we needed to put much weight on that as precedent. We also noted that much of the language of RFC 2195 had also preferred tag.

Two is that we all seemed happy with some kind of postfix syntax here as opposed to e.g. a special macro (like offset_of!). One reason for this is our expectation of using this kind of syntax for more things and pushing further in this direction. (Some of us would probably be unhappy with this choice if this were the only thing for which we ever used this kind of syntax.)

Three is that we didn't feel strongly that namespacing this with enum was necessary. We discussed how that may be more namespacing than needed, and that seemed the general mood. It seems unlikely we're going to collide identifiers here, and we already know due to type checking that we're dealing with an enum after all; we don't need to risk entering the vicinity of Hungarian notation.

Four, we had a good discussion about whether to go with .#tag or .tag. There are good reasons for both. The former can be used more widely, and that seemed to be where we landed as the place to start with this proposal. We discussed how #ident is not a single token (as it wasn't reserved under RFC 3101), but that's OK, as we can just have the compiler treat this sequence of tokens the right way after lexing.

Five, we talked about how to provide this for variant paths in addition to just values. We noted that fieldless variants could be treated as values, so one could say E::Variant.#tag, but that for variants with fields, that wouldn't work, and we may want to provide something like E::Variant::#tag. We talked about whether these should be lowercase, for consistency and using our kind of special privilege as language designers (as with e.g. u8), or whether these should be capitalized to correspond to how they might be viewed as associated constants or associated types (depending on the feature).

Six, we talked about the various ways we might want rustdoc to visualize this, and how that would help to rapidly raise awareness of and familiarity with this feature.

Jul 11 '24 08:07 traviscross

The use of "tag" as apparently a synonym for "discriminant" is unfortunate insofar as "tag" exists as a term in the compiler and it is not equivalent to "discriminant" there. It refers to how the discriminant is encoded in memory. For instance, for an Option<&T>, the "tag" is just the entire value, and the "discriminant" is either 0 or 1 depending on whether the tag is null or not. Also see the compiler glossary. https://github.com/rust-lang/rfcs/pull/2195 was written before we established consistent terminology here; terminology around variant index / discriminant / tag used to be quite messy in the compiler until they got cleaned up (around 2019 if I recall correctly).

Granted, with this largely being internal compiler terminology, it can be changed. But it will certainly be confusing to people that have worked with enums in the compiler in the past. We have also occasionally used this terminology in opsem and other language discussions, to my knowledge.
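The Option<&T> case can be observed from safe and near-safe code. This is a sketch with hypothetical function names; the size equality is a documented guarantee, while reading None as a null pointer reflects the usual null-pointer-optimization layout and is stated here as an assumption about current compilers.

```rust
use std::mem::{discriminant, size_of, transmute};

/// There is no room for a separate tag field: the whole value is the "tag".
fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&u8>>() == size_of::<&u8>()
}

/// In memory, `None` is encoded as the null pointer pattern (assumed layout).
fn none_is_null() -> bool {
    let none: Option<&u8> = None;
    // SAFETY: both types have the same size, and every bit pattern of
    // `Option<&u8>` is a valid `*const u8`.
    let raw: *const u8 = unsafe { transmute(none) };
    raw.is_null()
}

/// The discriminants (0 for `None`, 1 for `Some`) still tell the variants apart,
/// even though neither number is what is stored in memory.
fn variants_are_distinguishable() -> bool {
    let x = 5u8;
    discriminant(&Some(&x)) != discriminant(&None::<&u8>)
}
```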

Jul 11 '24 08:07 RalfJung

Four, we had a good discussion about whether to go with .#tag or .tag.

one big problem with the .#tag syntax is that it isn't usable in the very popular quote! macro since it would try to interpolate a tag variable at that point. (.enum#tag, .tag, or .anything#tag don't have that problem since they don't have a separate # token) e.g. https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=41c8d00dd76a2a296cacaa125334bc92

Jul 11 '24 10:07 programmerjake

@traviscross I think we have concluded from https://github.com/rust-lang/rfcs/pull/3607#discussion_r1561335884 that this RFC is producing the discriminant, not the tag.

#2195 is irrelevant here: it controls the maximum size of the tag, which for now also determines the maximum discriminant value. So the two can be used interchangeably in #[repr].

But for this RFC the ._#thing must be a concrete value, and the "tag" (in-memory value) and "discriminant" (0, 1, 2, ...) very often have distinct values after layout optimization.

Jul 11 '24 10:07 kennytm

though, just .tag by itself may also conflict with a reasonable extension of enum pattern types, where you can directly access fields of single-variant pattern types without needing to match, e.g.:

pub enum MyEnum {
    A {
        tag: u8,
        b: i32,
    },
    B,
}

pub fn f(v: MyEnum is MyEnum::A { .. }) {
    // direct field access, since we statically know which variant we have
    println!("{}, {}", v.tag, v.b);
}

Jul 11 '24 10:07 programmerjake

though, just .tag by itself may also conflict with a reasonable extension of enum pattern types, where you can directly access fields of single-variant pattern types without needing to match, e.g.:

IMO this is another piece of evidence showing that inventing new syntax for this functionality is a bad idea. This is a fairly niche thing to want to do, and does not deserve spending any of our "weirdness budget" on.

Jul 11 '24 12:07 ijackson

Two is that we all seemed happy with some kind of postfix syntax here as opposed to e.g. a special macro (like offset_of!).

What about a postfix macro? So it would look something like:

Enum::Variant::discriminant!() or maybe Enum.discriminant!(Variant).

See also https://github.com/rust-lang/rfcs/pull/2442

Jul 11 '24 13:07 tmccombs

Four, we had a good discussion about whether to go with .#tag or .tag.

one big problem with the .#tag syntax is that it isn't usable in the very popular quote! macro since it would try to interpolate a tag variable at that point. (.enum#tag, .tag, or .anything#tag don't have that problem since they don't have a separate # token) e.g. https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=41c8d00dd76a2a296cacaa125334bc92

this seems like largely not-a-problem if we finally decide to ship https://github.com/rust-lang/rust/issues/54722 since it uses a different unquoting.

Jul 12 '24 00:07 workingjubilee

@programmerjake I don't think we should have dotted access to enum fields unless those fields are explicitly declared as a common field at the top level of the enum. In which case, we could say that you can't declare a common field named "tag". (Whether we should is another question.)

Aug 06 '24 16:08 joshtriplett
