csharplang icon indicating copy to clipboard operation
csharplang copied to clipboard

Proposal: "Closed" enum types

Open gafter opened this issue 4 years ago • 64 comments

Related to https://github.com/dotnet/csharplang/issues/485, it would be helpful to be able to declare an enumerated type that is known to the language to be closed. The syntax is an open question, but as a first proposal I'll suggest using closed as in https://github.com/dotnet/csharplang/issues/485.

For example, a closed enum

closed enum Bool { False, True }

Could be used like this

int M(Bool b)
{
    switch (b)
    {
        case Bool.False: return 0;
        case Bool.True: return 1;
    }
}

This code would be considered OK by flow analysis, as the method returns for every possible value of the parameter b. Similarly a switch expression that handles every possible declared enumeration value of its switch expression of a closed enum type is considered exhaustive.

To ensure safety:

  • The arithmetic operators on closed enum types are not defined. For example, there is no operator Bool +(Bool, int). Alternately, we might consider them defined but only usable in unsafe code.
  • There is no conversion from some underlying type to the type Bool (or only in unsafe code)

Design meetings

https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-09-26.md#discriminated-unions

gafter avatar Feb 10 '20 20:02 gafter

If you add [Flags] to it, is that an error?

There must be a way to convert an integer to the closed enum (unsafe code).

0xd4d avatar Feb 10 '20 22:02 0xd4d

I imagine you could add [Flags] to it, but it wouldn't have any effect other than possibly generating a warning. That attribute has no language meaning.

gafter avatar Feb 10 '20 23:02 gafter

Since enum class is going to be the syntax for discriminated-unions, I vote enum struct for this. When there is no sub-member, it is practically a closed enum,

// if you can do this
enum struct Option<T> { Some(T), None }
// there's nothing to stop you from doing this
enum struct Bool { False, True }

I think this is more of a lowering strategy question - whether or not this should be emitted as a proper enum.

If we go with proper enums, we can't manage external values, so I think we will need to emit it as a struct anyways.

(some examples from F#)

alrz avatar Feb 11 '20 07:02 alrz

@0xd4d,

There must be a way to convert an integer to the closed enum (unsafe code).

Why? I see no reason for closed enums to have any sort of fixed underlying value. If you want to convert, then use a switch expression:

Bool b = i switch {
    0 => False,
    _ => True
};

@alrz, enum struct Option<T> { Some(T), None } is sort of how I envisioned DU's looking (and so agree that enum struct Bool { False, True } is a good choice for closed enums.

However, isn't there still an issue pattern matching on those as I'd imagined DU pattern matching would be type based, ie I'd write something like:

var x = option switch {
    Some v => v,
    None => default
};

But with enum struct Option<T> { Some(T), None }, Some isn't a subtype of Option.

I can invent some syntax here:

enum struct ShapeName { Point, Rectangle, Circle }

enum struct Shape 
{
    Point,  
    Rectangle(double Width, double Length),
    Circle(double Radius)
}

and I'm happy that this is consistent: a closed enum is a DU where the values have no parameters.

And I can imagine how I'd pattern match this stuff:

var area = shape switch {
    Point => 0,
    Rectangle { Width: var w, Length: var l } => w * l,
    Circle { Radius: var r } => Math.PI * r * r
};

But, I've no idea how this lot gets lowered by the compiler. For a struct, Shape, that contains a Circle "value", what does shape is Circle { Radius: var r } mean?

DavidArno avatar Feb 11 '20 09:02 DavidArno

Does it mean that existing enums will be updated to be somehow open? Otherwise the terminology is confusing.

dsaf avatar Feb 11 '20 09:02 dsaf

@dsaf, existing enums are already open. For:

enum Shape { Point, Rectangle, Shape }

Shape can take any int value. Point, Rectangle, and Shape are little more than int constants.

But it would indeed be prudent to talk of them as being open enums to clarify the difference between existing enums and the closed variety.

DavidArno avatar Feb 11 '20 09:02 DavidArno

@DavidArno

isn't there still an issue pattern matching on those as I'd imagined DU pattern matching would be type based

No they wouldn't be types (or members), if it's a struct DU, all parameters get emitted as flat fields into the struct, the rest is compiler magic. See what F# produces for your example as a value DU.

Though I can imagine C# could do a better job utilizing runtime support for FieldOffsetAttribute if possible (e.g. if all parameters are blittable) - minimizing the struct size.

alrz avatar Feb 11 '20 09:02 alrz

@0xd4d,

There must be a way to convert an integer to the closed enum (unsafe code).

Why? I see no reason for closed enums to have any sort of fixed underlying value. If you want to convert, then use a switch expression:

Bool b = i switch {
    0 => False,
    _ => True
};

That's inefficient unless it's a trivial enum like the above Bool enum. Deserialization should be quick.

It should be as simple as it is today, except the programmer must verify that it's a valid value (or it's undefined behavior)

int value = 1;
if (InvalidValue(value)) throw ...;
unsafe { return (MyEnum)value; }

0xd4d avatar Feb 11 '20 09:02 0xd4d

@0xd4d You can add it yourself, or have a generator to do it.

enum struct Bool { 
  False, True;
  public static explicit operator Bool(int v)
    => v switch { 0 => False, 1 => True, _ => throw InvalidCastEx };
}

I think unlike enums the corresponding "integer tag" is solely an implementation detail for DUs, so I don't think it's a good idea for the compiler to generate that out of the box.

alrz avatar Feb 11 '20 10:02 alrz

@alrz Your code is the same as @DavidArno's code, a big switch statement which generates a lot of code and causes extra CPU usage at runtime. My solution to use a cast in an unsafe block has the same perf and code size as code we can already use today.

0xd4d avatar Feb 11 '20 10:02 0xd4d

@alrz, "compiler magic" suits me. It's then @gafter's job to solve how to have struct-based DUs (including closed enums) work with a is X b, rather than mine 👍

@0xd4d, it's very unlikely that the solution to serialization of closed enums and other DUs is to convert them to ints. I guess it all depends on whether closed enums are implemented as a completely different solution to DUs or not. Doing so would be a mistake in my view. So I'd argue that any solution to serializing

enum struct Bool { False, True }

also has to be able to serialize

enum struct Option<T> { Some(T), None }

DavidArno avatar Feb 11 '20 10:02 DavidArno

The only difference with the original proposal would be that False and True in that example wouldn't be compile-time constants so we can't use enum struct and enum interchangeably.

That is because an enum type as it exists today, is just a group of named constants with zero safety guarantees, I'm not sure if it makes sense to try to "fix" that or take the trade-off and favor struct DUs.

alrz avatar Feb 11 '20 18:02 alrz

why not closed by default, and open as suggested syntax to open them?

Because open is the default today, and we can't change that.

333fred avatar Feb 11 '20 19:02 333fred

@gafter would your intent to be that adding a new value to a "closed" enum is a breaking change? would this be binary compat breaking change, or just a "switch statement which must pick a value entered unknown territory by not finding a match" exceptional territory. eg: what if you wanted to add trinary logic "not provided by customer" to the bool enum (bad naming example)

AartBluestoke avatar Feb 11 '20 22:02 AartBluestoke

@AartBluestoke I believe its true, false and filenotfound.

DanFTRX avatar Feb 11 '20 22:02 DanFTRX

@AartBluestoke I imagine that adding a new value to a closed enum would be a binary breaking change. Just like b switch { false => 0, true => 1 } is translated to b == false ? 0 : 1, when b is a bool, I would expect b switch { Bool.False => 0, Bool.True => 1 } to be translated to b == Bool.False ? 0 : 1 when b is a Bool. But this design point could be debated and changed.

gafter avatar Feb 11 '20 22:02 gafter

I imagine that adding a new value to a closed enum would be a binary breaking change.

How would this be enforced without the C# compiler generating an under the hood default case? And also the user might add a default case himself in which case the compiler must generate code to check the range of values expected have not been modified.

Something like: In library A

public closed enum ObjectAccess
{
     Read,
     ReadWrite,
     Write
}

Consumer of Library A code in assembly B:


switch objectAccess
{
    case ObjectAccess.Read:
           return new ReadOnlyFooBar(...);
    default :
           return new ReadWriteFooBar(...);
}

I asume the C# generated default case would actually look like:

switch objectAccess
{
    case ObjectAccess.Read:
           return new ReadOnlyFooBar(...);
    default :
    {
          if (objectAccess != ObjectAccess.ReadWrite && objectAccess != ObjectAccess.Read)
                throw new InvalidOperationException($"Unexpected value: {objectAccess:g}");
          return new ReadWriteFooBar(...);
    }
}

When library Author adds a new value the the closed enum:

public closed enum ObjectAccess
{
     Read,
     ReadWrite,
     Write,
     ReadWriteExclusive
}

This will indeed break binary compatibility, however the problem is now that recompiling B will catch the new value in the default clause without the library author having the chance to have to treat the new value explicitly.

So this feature can only work if the default clause is not allowed for closed enums, which I think would make the feature impractical for any enums with more than a few members.

popcatalin81 avatar Feb 12 '20 12:02 popcatalin81

@popcatalin81

Other languages seem to get by just fine. In e.g. Rust, the default case doesn't add any extra checks: if there's a default case, then that will catch all other unhandled values, now and in the future.

Currently, if I have a switch that I want to be exhaustive, I need to have an explicit case for every possible enum member and a default which throws an exception. The downside is that I don't get a compiler warning/error if a new enum member is added, only a runtime exception.

With closed enums, I still need to have a case for each enum member, but I can avoid having a default at all, and I can get a compiler warning/error if someone adds a new enum member.

canton7 avatar Feb 12 '20 12:02 canton7

I believe the following spec would match the desired behavior:

The compiler will enforce that: Switching on a closed enum must be exhaustive, A switch on a closed enum must either not have a default case, or have a default case that guarantees an exception. If a default case is not specified the compiler will auto-generate a default path that throws an exception if accessed

No other compiler or runtime changes are needed.

AartBluestoke avatar Feb 12 '20 23:02 AartBluestoke

@AartBluestoke

A switch on a closed enum must either not have a default case, or have a default case that guarantees an exception.

That seems unnecessary. What if the closed enum has 20 values but I only care about 2 of them? Why can't I have a default that returns some value instead?

HaloFour avatar Feb 12 '20 23:02 HaloFour

The intent was not to require that switches be exhaustive. We'd still permit a switch expression to be non-exhaustive, with a warning. And I don't think we'd add any requirements for switch statements, other than the fact that we can infer that a default case isn't needed (e.g. for definite assignment) if you have cases for every enumeration constant.

gafter avatar Feb 13 '20 01:02 gafter

A switch on a closed enum must either not have a default case, or have a default case that guarantees an exception.

That seems unnecessary. What if the closed enum has 20 values but I only care about 2 of them? Why can't I have a default that returns some value instead?

if you are only using 2 out of 20 values, then you effectively have open use of a closed enum; you're already saying i don't care if new values get added to this. If that is true then the enum shouldn't be closed. If it is true, you should handle the remaining values.

Perhaps the contextual keyword "default" becomes "and the rest of the values" within the compiler, with the compiler generating a true default branch for exceptions for if non-standard values get into the enum somehow?

My thoughts when i was writing that was if you add a new value to the closed enum, it has to explode, otherwise, "closing" has no meaning other than telling roslyn "don't generate a warning for this one specific situation", and preventing implicit cast from enum to int. and that doesn't seem like a necessary language feature; feels like a job of an analyzer.

AartBluestoke avatar Feb 13 '20 06:02 AartBluestoke

if you are only using 2 out of 20 values, then you effectively have open use of a closed enum; you're already saying i don't care if new values get added to this. If that is true then the enum shouldn't be closed. If it is true, you should handle the remaining values.

This is really common in functional languages, and the reason is simple: in some cases you don't care about all the values, and other cases you do.

For example you might have a number of states a method can return. In some cases you want to deal with them all seperately. In others you only care if it was successful or not.

YairHalberstadt avatar Feb 13 '20 06:02 YairHalberstadt

... if you are only using 2 out of 20 values, then you effectively have open use of a closed enum; you're already saying i don't care if new values get added to this ...

Sorry, @AartBluestoke, but that is so not true. If, within a particular context, I explicitly call out two of the twenty values, then that's because - in that context - I'm only interested in those two values. So I'll have a default handler for the rest. But in a situation where I explicitly list all of them (which is likely to occur elsewhere in the code) then I absolutely want a compiler error is a new value is added. And having it be a binary breaking change at that point too would be the icing on the cake.

DavidArno avatar Feb 13 '20 16:02 DavidArno

So a default clause for a closed enum will mean:

  • Don't bother me about exhaustiveness.

No default clause:

  • Error for missing cases, not handling all values.

It sounds about right for me as a library consumer. There's an escape hatch I can use.

However, I would still like to opt-out of binary breaking changes, if those are added. It may be important to make certain libraries forward compatible with closed enums through some mechanism or another. Example:

public closed enum Choice
{
        First,
        Second,
        Third
}

string AnalyzeChoice(Choice choice)
{
       switch choice
      {
            case Choice.First:
                  return "Good";
            case Choice.Second:
                  return "Average";
            case Choice.Third:
                  return "Bad";
            missing:
                return  "Unknown"; // Only when value not part of the enum is passed (like a version of the Enum in a future assembly). 
                //Exhaustiveness checking is still applied to all cases. Recompilation will give an error.
      }
}

popcatalin81 avatar Feb 13 '20 21:02 popcatalin81

@DavidArno

And I can imagine how I'd pattern match this stuff:

var area = shape switch {
    Point => 0,
    Rectangle { Width: var w, Length: var l } => w * l,
    Circle { Radius: var r } => Math.PI * r * r
};

But, I've no idea how this lot gets lowered by the compiler. For a struct, Shape, that contains a Circle "value", what does shape is Circle { Radius: var r } mean?

This may do the job:

[StructLayout(LayoutKind.Explicit)]
public struct Shape
{
  public struct Point {}
  public struct Rectangle { public double Width; public double Length; }
  public struct Circle { public double Radius; }
  private closed enum ShapeType : short { Point, Rectangle, Circle }
  [FieldOffset(0)]
  private ShapeType shapeType;
  [FieldOffset(2)]
  private Point point;
  [FieldOffset(2)]
  private Rectangle rectangle;
  [FieldOffset(2)]
  private Circle circle;
  public bool Deconstruct(out Point p)
  {
    if (shapeType == ShapeType.Point) { p = point; return true; }
    else { p = default; return false; }
  }
.....
}

However, it won't work that easily with reference types inside

declard avatar Mar 03 '20 18:03 declard

@alrz @DavidArno enum struct could conflict with value type discriminated unions, if they become a thing. Would be quite unintuitive to have enum class and enum struct be unrelated

john-h-k avatar Jun 25 '20 22:06 john-h-k

@john-h-k,

enum struct could conflict with value type discriminated unions

If we view closed enums as just another type of DU (where the cases are all parameterless) then there is no conflict. Closed enums are just a subset of DUs.

DavidArno avatar Jun 26 '20 12:06 DavidArno

Closed enums should require a name for the zero value, because one can easily acquire that value without unsafe code.

class C
{
    private closed enum Number { One = 1 }

    private Number? number;

    public override string ToString()
    {
        switch (this.number.GetValueOrDefault())
        {
            case Number.One: return "one";
        }
    }
}

Perhaps closed enums don't need to support '=' constant_expression in enum_member_declaration at all.

KalleOlaviNiemitalo avatar Jun 29 '20 15:06 KalleOlaviNiemitalo

How would people envision these closed enums getting serialized to JSON (say by STJ) ? My first gut reaction would be as string versions of the declared names, with options on how to customize the transformation (lower, upper, camel, etc)

ericsampson avatar Nov 19 '20 17:11 ericsampson