csharplang
csharplang copied to clipboard
Champion "Discriminated Unions"
- [ ] Proposal added
- [ ] Discussed in LDM
- [ ] Decision in LDM
- [ ] Finalized (done, rejected, inactive)
- [ ] Spec'ed
See
- https://github.com/dotnet/roslyn/issues/188
- https://github.com/dotnet/csharplang/issues/75
- https://github.com/dotnet/csharplang/blob/master/meetings/2017/LDM-2017-01-10.md
- https://github.com/dotnet/csharplang/issues/485
Design meetings
https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-08-31.md#discriminated-unions https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-09-26.md#discriminated-unions https://github.com/dotnet/csharplang/blob/main/meetings/2024/LDM-2024-07-24.md#discriminated-unions
I wouldn't expect progress on this feature to precede records.
@gafter,
I've started writing a draft proposal around #75. Would you like me to continue with this, or did you have a proposal planned yourself?
@DavidArno I do not expect to invest any significant effort on this until we make progress on records.
@gafter,
OK, that's good to know. I'll carry on with my proposal then, but will take time over it as there's no rush.
/cc @agocke
I will be moving my proposal of "closed" type hierarchies from Roslyn to this repo shortly. I also think we should explore "real" discriminated unions and have some thoughts on that, but it's still much more of an exploratory phase for me.
However, I think I'm close to a reasonably complete proposal for closed
and an analyzer-based prototype. I'll happily champion this feature, as well.
One question that I think we need to ask (and I haven't seen anyone ask elsewhere) is whether the case classes can be used as types.
Allow me to illustrate with the example of the Option type:
public enum class Option<T>
{
None,
Some(T Value)
}
Obviously Option<T>
is a type (an abstract base class), but are None
and Some
types? Since, in an OO language like C#, ADTs are probably actually implemented as type hierarchies one might be tempted to answer "yes", but I'm not sure it makes much sense.
public void MyMethod(Some<string> someString) // Is this allowed? It doesn't make much sense
{
// ...
}
I think of ADTs as functioning more like enums, however they're actually implemented. So using each case as a type doesn't make sense, any more than this makes sense:
public enum Colours
{
Red, Green, Blue
}
public void MyMethod(Blue colour)
{
// ...
}
whether the case classes can be used as types.
I think it shouldn't be the case with enum class
but with another "expanded" syntax which could represent more complex type hierarchies like an AST
class SyntaxNode {
case class Statement { } // implicitly inherit from SyntaxNode, as in, a "case" of the said type
case class Expression {
case enum class Literal { Numeric, String }
}
}
I think that feature can be added fairly simply using a custom type, in the same way Tuple<T0, ..., TN>
was added.
I maintain a library, OneOf, which adds a OneOf<T0, ..., TN>
type which has .Match<TOut>(Func<T0, ..., TOut>, ..., Func<T0, ..., TOut>
methods. By using implicit operators to create the OneOf instances from values, the syntax is very terse and comprehensible. Also, the allocations are low because it's a struct and doesn't create intermediate 'builder' objects, unlike some other solutions.
The OneOf<T0, .., TN>
type also provides .Switch
and .TryGetTX(out TX value, out TRemainder remainer)
methods.
Example of using a OneOf as a return value:
public OneOf<User, InvalidName, NameTaken> CreateUser(string username)
{
if (!IsValid(username)) return new InvalidName();
var user = _repo.FindByUsername(username);
if(user != null) return new NameTaken();
var user = new User(username);
_repo.Save(user);
return user;
}
example of Matching
OneOf<string, ColorName, Color> backgroundColor = "Red"; //uses implicit casts to reduce overhead
Color c = backgroundColor.Match(
str => CssHelper.GetColorFromString(str),
name => new Color(name),
col => col
);
As new types are added to the OneOf definition, compiler errors are generated wherever the union is Match
ed or Switch
ed, as the methods are required to have the correct number of lambda parameters.
This can be included in the BCL without language changes, although I'm sure some syntactical sugar could be sprinkled.
this proposal was originally made at https://github.com/dotnet/roslyn/issues/14208 and at https://github.com/dotnet/csharplang/issues/1524 . Sorry!
@mcintyre321 Your OneOf
type can be described as equivalent to the Either
type or Choice
type such as that found in F#. However, the Either
type is not an alternative to discriminated unions, in fact it is built on top of discriminated unions:
type Choice<'a, 'b> = Choice1Of2 of 'a | Choice2Of2 of 'b
@mcintyre321
While your library does accomplish providing a single discriminated union it also demonstrates the degree of boilerplate required to do so which is what this proposal seeks to reduce. Your types also don't work with C#'s recursive pattern matching which will make it much more efficient and much more capable to match over such a type:
var backgroundColor = ...;
// no delegate invocation required
Color c = backgroundColor switch {
string str => CssHelper.GetColorFromString(str),
ColorName name => new Color(name),
Color col => col
};
@Richiban OneOf<T0, ..., TN> has up up to 33 parameters, so is more useful as a general return object than Either or Choice.
@HaloFour I agree it would be good to have switch
and match
support built in to the language,. butI would have thought that the delegate calls will be JITted. I'm not sure what the boilerplate you refer to is.
@mcintyre321
I'm not sure what the boilerplate you refer to is.
public enum class OneOf<T1, T2>
{
First(T1 value),
Second(T2 value)
}
vs. this*
* Yes, I know that you have all of the arities crammed into one, but the file is too large to link to a specific line.
@mcintyre321 I don't doubt it's usefulness (or the fact that it's better than Either
at those situations).
My point was that discriminated unions are a much more general tool that can also solve the problem that Either
solves.
I'm not sure how you would propose to implement the equivalent of this using an Either
type?
type FavouriteColour =
| Red
| Green
| Blue
| Other of string
@Richiban The abilities to naming a DU is useful, but you still get a lot of value with anonymous DUs. Is it a show-stopper to not have named unions (initially at least)?
That said, there are some things that can be done.
A OneOf
, as implemented, is a struct, but I think that in F# (by default) they are classes. So you could make OneOf a class too*, and have class FavouriteColour : OneOf<Red, Green, Blue, string> { }
. One problem with this is that implicit operators aren't inherited, although I think maybe I saw a proposal suggesting that was coming.
Another alternative for naming is to use a Record type e.g. class FavouriteColour(OneOf<Red, Green, Blue, string> Value)
.
And you can always use an alias: using FavouriteColour = OneOf<Red, Green, Blue, string>;
if it's just the code bloat that's a problem (rather than the lack of a named Type ).
I appreciate none of this is quite as nice as the F# approach, but perhaps the language could be extended to fix some of this. E.g. defining a union class FavouriteColour : OneOf<Red | Green | Blue | string> { }
could cause the compiler to output the required implicit operators.
TBH I'm happy with any solution where
- you can do ad-hoc definition of DUs in method signatures and member declarations
- concrete types from any library can be used
- exhaustive matching with compile errors after a change in parameters (obv!)
@HaloFour cramming it into one file is along the lines of https://referencesource.microsoft.com/#mscorlib/system/tuple.cs , although I admit OneOf.cs has become somewhat larger!
*There's a class-based OneOfBase in the library, but the name isn't great IMO.
@mcintyre321
Anonymous union type support would be #399 which is quite a bit different from DUs.
@mcintyre321
The abilities to naming a DU is useful, but you still get a lot of value with anonymous DUs
The naming is what makes it a discriminated union. Without the names it's just a (type) union (also very useful, in my opinion).
I don't know about the C# language team, but I'm absolutely desperate for a decent discriminated union type in C#. The number of times I'm modelling a domain and I want to say "An instance of this type looks either like this or like that." C# has nothing of the kind and it's really difficult to work around (although some of the new pattern matching features from build might take away some of the pain).
If you can mint new Types in a single line of code (which you can with Records) then the utility of a Type Union is very close to that of a Tagged/Disciminated Union. But I get your point, that there is a difference.
As of C# 7.2, the best way I've found so far to model DU's in C# is to use a class hierarchy that uses 7.2's private protected
. If, in a library, you create the base type, Union<T1, T2>
with a private protected abstract
dummy member, then child classes of that type can only be created in the same assembly. So Case1
and Case2
can be created as sealed types forming a closed hierarchy.
The reason for choosing a class is in anticipation of C# 8's pattern matching features, and the ability to do something like:
TR Foo<T1, T2, TR>(Union<T1, T2> union)
=> union switch
{
Case1 x => Bar<TR>(x),
Case2 x => Baz<TR>(y),
_ => default // needed as the expression cannot be exhaustiveness-checked
};
Of course this still has problems, as the cases of the union cannot be named.
From my experiments, I think unions as structs is likely a non-starter, because of pattern matching. To achieve a "type hierarchy" with structs would need either something like "extension everything", or CLR changes. This is because some way of saying union is int x
would be needed. So either a custom is
operator would be needed that works with generics too (extension everything) or having DUs be a special type within the CLR so that it understood the relationship between the union and its types. Also, structs suffer from the default
problem. What's the default value of a union?
Uses classes seems much easier. "All" the language really needs then is enforcement of the closed hierarchy within the same assembly and extending the NRT rules to make a null union an error via flow analysis.
[<Struct>]
type FavouriteColour =
| Red
| Green
Unchecked.defaultof<FavouriteColour>.Dump(); //IsRed == true
For class DUs, the default is null
@mcintyre321,
Interesting that F# took that approach. I still think building on the NRT flow analysis is a far better way of doing it.
@DavidArno
As of C# 7.2, the best way I've found so far to model DU's in C# is to use a class hierarchy that uses 7.2's private protected
Interesting... I've been doing it up until now with an abstract class with either a private
or internal
constructor, depending.
I almost always use the private
one, so that all the 'case' classes are ~internal to~ nested within the abstract base class, e.g.
public abstract class FavouriteColour
{
private FavouriteColour() {}
public sealed class Red : FavouriteColour {}
public sealed class Green : FavouriteColour {}
public sealed class Blue : FavouriteColour {}
public sealed class Other : FavouriteColour
{
public Other(string value)
{
Value = value;
}
public string Value { get; }
}
}
Note that for brevity I have not included all the equality, string methods etc in the above code.
Also note that, in my example above, the classes Red
, Green
, and Blue
are singletons (unit-like) and are one of the reasons I quite like the idea of having a singleton concept in C#, perhaps such as that in https://github.com/dotnet/csharplang/issues/490.
It would be handy to be able to say something like:
public object class Unit {}
followed by:
Unit x = Unit; // Note that `Unit` is both a type and a value, negating the need
// for a manually written `Instance` static member
@Richiban,
A private constructor does work, but it means all the "case" classes then have to be nested within it. A private protected
member will achieve a similar result, whilst allowing those case classes to be put in their own files.
It nice to have the choice now.
@DavidArno
A private constructor does work, but it means all the "case" classes then have to be nested within it. A
private protected
member will achieve a similar result, whilst allowing those case classes to be put in their own files.
You can use partial
to put them in their own files. You just have to repeat the parent class declaration.
@DavidArno
A private protected member will achieve a similar result, whilst allowing those case classes to be put in their own files.
Sure, as does an internal
constructor. The advantage is that you don't need a 'dummy' member.
@HaloFour,
You can use
partial
to put them in their own files...
Technically, I could do that. But I'd prefer to gouge my own eyes out to be honest.
@Richiban,
Sure, as does an
internal
constructor.
Hmm, that's a fair point. I think that in my haste to find a use for private protected
, I forgot that having an abstract internal
member, or even just an internal
constructor, would achieve the same result. Oh well, I can toss private protected
back on the scrapheap of useless features once more 😄
@DavidArno
Technically, I could do that. But I'd prefer to gouge my own eyes out to be honest.
This confuses me. It's like another kind of namespace. More logical organization is bad?
@jnm2, A type should have one, cohesive, responsibility. If you split that responsibility over multiple files, you are weakening cohesion.
For that reason, I dislike the idea of partial classes being used for anything save dealing with auto-generated segments.
@DavidArno How does nesting implementations within an abstract class cause the abstract class to have more than one responsibility? The fact that types are declared inside it doesn't make the containing type responsible—the responsibilities are completely contained in their own, nested, types.