csharplang
csharplang copied to clipboard
Proposal: "Closed" enum types
Related to https://github.com/dotnet/csharplang/issues/485, it would be helpful to be able to declare an enumerated type that is known to the language to be closed. The syntax is an open question, but as a first proposal I'll suggest using closed
as in https://github.com/dotnet/csharplang/issues/485.
For example, a closed enum
closed enum Bool { False, True }
Could be used like this
int M(Bool b)
{
switch (b)
{
case Bool.False: return 0;
case Bool.True: return 1;
}
}
This code would be considered OK by flow analysis, as the method returns for every possible value of the parameter b
. Similarly a switch expression that handles every possible declared enumeration value of its switch expression of a closed enum
type is considered exhaustive.
To ensure safety:
- The arithmetic operators on
closed enum
types are not defined. For example, there is nooperator Bool +(Bool, int)
. Alternately, we might consider them defined but only usable inunsafe
code. - There is no conversion from some underlying type to the type
Bool
(or only inunsafe
code)
Design meetings
https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-09-26.md#discriminated-unions
If you add [Flags]
to it, is that an error?
There must be a way to convert an integer to the closed enum (unsafe code).
I imagine you could add [Flags]
to it, but it wouldn't have any effect other than possibly generating a warning. That attribute has no language meaning.
Since enum class
is going to be the syntax for discriminated-unions, I vote enum struct
for this. When there is no sub-member, it is practically a closed enum
,
// if you can do this
enum struct Option<T> { Some(T), None }
// there's nothing to stop you from doing this
enum struct Bool { False, True }
I think this is more of a lowering strategy question - whether or not this should be emitted as a proper enum.
If we go with proper enums, we can't manage external values, so I think we will need to emit it as a struct anyways.
@0xd4d,
There must be a way to convert an integer to the closed enum (unsafe code).
Why? I see no reason for closed enums to have any sort of fixed underlying value. If you want to convert, then use a switch expression:
Bool b = i switch {
0 => False,
_ => True
};
@alrz,
enum struct Option<T> { Some(T), None }
is sort of how I envisioned DU's looking (and so agree that enum struct Bool { False, True }
is a good choice for closed enums.
However, isn't there still an issue pattern matching on those as I'd imagined DU pattern matching would be type based, ie I'd write something like:
var x = option switch {
Some v => v,
None => default
};
But with enum struct Option<T> { Some(T), None }
, Some
isn't a subtype of Option
.
I can invent some syntax here:
enum struct ShapeName { Point, Rectangle, Circle }
enum struct Shape
{
Point,
Rectangle(double Width, double Length),
Circle(double Radius)
}
and I'm happy that this is consistent: a closed enum is a DU where the values have no parameters.
And I can imagine how I'd pattern match this stuff:
var area = shape switch {
Point => 0,
Rectangle { Width: var w, Length: var l } => w * l,
Circle { Radius: var r } => Math.PI * r * r
};
But, I've no idea how this lot gets lowered by the compiler. For a struct, Shape
, that contains a Circle
"value", what does shape is Circle { Radius: var r }
mean?
Does it mean that existing enum
s will be updated to be somehow open? Otherwise the terminology is confusing.
@dsaf, existing enums are already open. For:
enum Shape { Point, Rectangle, Shape }
Shape
can take any int
value. Point
, Rectangle
, and Shape
are little more than int constants.
But it would indeed be prudent to talk of them as being open enums to clarify the difference between existing enums and the closed variety.
@DavidArno
isn't there still an issue pattern matching on those as I'd imagined DU pattern matching would be type based
No they wouldn't be types (or members), if it's a struct DU, all parameters get emitted as flat fields into the struct, the rest is compiler magic. See what F# produces for your example as a value DU.
Though I can imagine C# could do a better job utilizing runtime support for FieldOffsetAttribute
if possible (e.g. if all parameters are blittable) - minimizing the struct size.
@0xd4d,
There must be a way to convert an integer to the closed enum (unsafe code).
Why? I see no reason for closed enums to have any sort of fixed underlying value. If you want to convert, then use a switch expression:
Bool b = i switch { 0 => False, _ => True };
That's inefficient unless it's a trivial enum like the above Bool enum. Deserialization should be quick.
It should be as simple as it is today, except the programmer must verify that it's a valid value (or it's undefined behavior)
int value = 1;
if (InvalidValue(value)) throw ...;
unsafe { return (MyEnum)value; }
@0xd4d You can add it yourself, or have a generator to do it.
enum struct Bool {
False, True;
public static explicit operator Bool(int v)
=> v switch { 0 => False, 1 => True, _ => throw InvalidCastEx };
}
I think unlike enums the corresponding "integer tag" is solely an implementation detail for DUs, so I don't think it's a good idea for the compiler to generate that out of the box.
@alrz Your code is the same as @DavidArno's code, a big switch statement which generates a lot of code and causes extra CPU usage at runtime. My solution to use a cast in an unsafe block has the same perf and code size as code we can already use today.
@alrz, "compiler magic" suits me. It's then @gafter's job to solve how to have struct-based DUs (including closed enums) work with a is X b
, rather than mine 👍
@0xd4d, it's very unlikely that the solution to serialization of closed enums and other DUs is to convert them to ints. I guess it all depends on whether closed enums are implemented as a completely different solution to DUs or not. Doing so would be a mistake in my view. So I'd argue that any solution to serializing
enum struct Bool { False, True }
also has to be able to serialize
enum struct Option<T> { Some(T), None }
The only difference with the original proposal would be that False and True in that example wouldn't be compile-time constants so we can't use enum struct
and enum
interchangeably.
That is because an enum
type as it exists today, is just a group of named constants with zero safety guarantees, I'm not sure if it makes sense to try to "fix" that or take the trade-off and favor struct DUs.
why not closed by default, and open as suggested syntax to open them?
Because open is the default today, and we can't change that.
@gafter would your intent to be that adding a new value to a "closed" enum is a breaking change? would this be binary compat breaking change, or just a "switch statement which must pick a value entered unknown territory by not finding a match" exceptional territory. eg: what if you wanted to add trinary logic "not provided by customer" to the bool enum (bad naming example)
@AartBluestoke I believe its true, false and filenotfound.
@AartBluestoke I imagine that adding a new value to a closed enum would be a binary breaking change. Just like b switch { false => 0, true => 1 }
is translated to b == false ? 0 : 1
, when b
is a bool
, I would expect b switch { Bool.False => 0, Bool.True => 1 }
to be translated to b == Bool.False ? 0 : 1
when b
is a Bool
. But this design point could be debated and changed.
I imagine that adding a new value to a closed enum would be a binary breaking change.
How would this be enforced without the C# compiler generating an under the hood default case? And also the user might add a default case himself in which case the compiler must generate code to check the range of values expected have not been modified.
Something like: In library A
public closed enum ObjectAccess
{
Read,
ReadWrite,
Write
}
Consumer of Library A code in assembly B:
switch objectAccess
{
case ObjectAccess.Read:
return new ReadOnlyFooBar(...);
default :
return new ReadWriteFooBar(...);
}
I asume the C# generated default case would actually look like:
switch objectAccess
{
case ObjectAccess.Read:
return new ReadOnlyFooBar(...);
default :
{
if (objectAccess != ObjectAccess.ReadWrite && objectAccess != ObjectAccess.Read)
throw new InvalidOperationException($"Unexpected value: {objectAccess:g}");
return new ReadWriteFooBar(...);
}
}
When library Author adds a new value the the closed enum:
public closed enum ObjectAccess
{
Read,
ReadWrite,
Write,
ReadWriteExclusive
}
This will indeed break binary compatibility, however the problem is now that recompiling B will catch the new value in the default clause without the library author having the chance to have to treat the new value explicitly.
So this feature can only work if the default clause is not allowed for closed enums, which I think would make the feature impractical for any enums with more than a few members.
@popcatalin81
Other languages seem to get by just fine. In e.g. Rust, the default
case doesn't add any extra checks: if there's a default
case, then that will catch all other unhandled values, now and in the future.
Currently, if I have a switch
that I want to be exhaustive, I need to have an explicit case
for every possible enum member and a default
which throws an exception. The downside is that I don't get a compiler warning/error if a new enum member is added, only a runtime exception.
With closed enums, I still need to have a case
for each enum member, but I can avoid having a default
at all, and I can get a compiler warning/error if someone adds a new enum member.
I believe the following spec would match the desired behavior:
The compiler will enforce that: Switching on a closed enum must be exhaustive, A switch on a closed enum must either not have a default case, or have a default case that guarantees an exception. If a default case is not specified the compiler will auto-generate a default path that throws an exception if accessed
No other compiler or runtime changes are needed.
@AartBluestoke
A switch on a closed enum must either not have a default case, or have a default case that guarantees an exception.
That seems unnecessary. What if the closed enum has 20 values but I only care about 2 of them? Why can't I have a default that returns some value instead?
The intent was not to require that switches be exhaustive. We'd still permit a switch expression to be non-exhaustive, with a warning. And I don't think we'd add any requirements for switch statements, other than the fact that we can infer that a default case isn't needed (e.g. for definite assignment) if you have cases for every enumeration constant.
A switch on a closed enum must either not have a default case, or have a default case that guarantees an exception.
That seems unnecessary. What if the closed enum has 20 values but I only care about 2 of them? Why can't I have a default that returns some value instead?
if you are only using 2 out of 20 values, then you effectively have open use of a closed enum; you're already saying i don't care if new values get added to this. If that is true then the enum shouldn't be closed. If it is true, you should handle the remaining values.
Perhaps the contextual keyword "default" becomes "and the rest of the values" within the compiler, with the compiler generating a true default branch for exceptions for if non-standard values get into the enum somehow?
My thoughts when i was writing that was if you add a new value to the closed enum, it has to explode, otherwise, "closing" has no meaning other than telling roslyn "don't generate a warning for this one specific situation", and preventing implicit cast from enum to int. and that doesn't seem like a necessary language feature; feels like a job of an analyzer.
if you are only using 2 out of 20 values, then you effectively have open use of a closed enum; you're already saying i don't care if new values get added to this. If that is true then the enum shouldn't be closed. If it is true, you should handle the remaining values.
This is really common in functional languages, and the reason is simple: in some cases you don't care about all the values, and other cases you do.
For example you might have a number of states a method can return. In some cases you want to deal with them all seperately. In others you only care if it was successful or not.
... if you are only using 2 out of 20 values, then you effectively have open use of a closed enum; you're already saying i don't care if new values get added to this ...
Sorry, @AartBluestoke, but that is so not true. If, within a particular context, I explicitly call out two of the twenty values, then that's because - in that context - I'm only interested in those two values. So I'll have a default handler for the rest. But in a situation where I explicitly list all of them (which is likely to occur elsewhere in the code) then I absolutely want a compiler error is a new value is added. And having it be a binary breaking change at that point too would be the icing on the cake.
So a default clause for a closed enum will mean:
- Don't bother me about exhaustiveness.
No default clause:
- Error for missing cases, not handling all values.
It sounds about right for me as a library consumer. There's an escape hatch I can use.
However, I would still like to opt-out of binary breaking changes, if those are added. It may be important to make certain libraries forward compatible with closed enums through some mechanism or another. Example:
public closed enum Choice
{
First,
Second,
Third
}
string AnalyzeChoice(Choice choice)
{
switch choice
{
case Choice.First:
return "Good";
case Choice.Second:
return "Average";
case Choice.Third:
return "Bad";
missing:
return "Unknown"; // Only when value not part of the enum is passed (like a version of the Enum in a future assembly).
//Exhaustiveness checking is still applied to all cases. Recompilation will give an error.
}
}
@DavidArno
And I can imagine how I'd pattern match this stuff:
var area = shape switch { Point => 0, Rectangle { Width: var w, Length: var l } => w * l, Circle { Radius: var r } => Math.PI * r * r };
But, I've no idea how this lot gets lowered by the compiler. For a struct,
Shape
, that contains aCircle
"value", what doesshape is Circle { Radius: var r }
mean?
This may do the job:
[StructLayout(LayoutKind.Explicit)]
public struct Shape
{
public struct Point {}
public struct Rectangle { public double Width; public double Length; }
public struct Circle { public double Radius; }
private closed enum ShapeType : short { Point, Rectangle, Circle }
[FieldOffset(0)]
private ShapeType shapeType;
[FieldOffset(2)]
private Point point;
[FieldOffset(2)]
private Rectangle rectangle;
[FieldOffset(2)]
private Circle circle;
public bool Deconstruct(out Point p)
{
if (shapeType == ShapeType.Point) { p = point; return true; }
else { p = default; return false; }
}
.....
}
However, it won't work that easily with reference types inside
@alrz @DavidArno enum struct
could conflict with value type discriminated unions, if they become a thing. Would be quite unintuitive to have enum class
and enum struct
be unrelated
@john-h-k,
enum struct
could conflict with value type discriminated unions
If we view closed enums as just another type of DU (where the cases are all parameterless) then there is no conflict. Closed enums are just a subset of DUs.
Closed enums should require a name for the zero value, because one can easily acquire that value without unsafe code.
class C
{
private closed enum Number { One = 1 }
private Number? number;
public override string ToString()
{
switch (this.number.GetValueOrDefault())
{
case Number.One: return "one";
}
}
}
Perhaps closed enums don't need to support '=' constant_expression
in enum_member_declaration
at all.
How would people envision these closed enums getting serialized to JSON (say by STJ) ? My first gut reaction would be as string versions of the declared names, with options on how to customize the transformation (lower, upper, camel, etc)