csharplang icon indicating copy to clipboard operation
csharplang copied to clipboard

Champion "Discriminated Unions"

Open gafter opened this issue 7 years ago • 792 comments

  • [ ] Proposal added
  • [ ] Discussed in LDM
  • [ ] Decision in LDM
  • [ ] Finalized (done, rejected, inactive)
  • [ ] Spec'ed

See

  • https://github.com/dotnet/roslyn/issues/188
  • https://github.com/dotnet/csharplang/issues/75
  • https://github.com/dotnet/csharplang/blob/master/meetings/2017/LDM-2017-01-10.md
  • https://github.com/dotnet/csharplang/issues/485

Design meetings

https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-08-31.md#discriminated-unions https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-09-26.md#discriminated-unions

gafter avatar Feb 15 '17 02:02 gafter

I wouldn't expect progress on this feature to precede records.

gafter avatar Feb 15 '17 02:02 gafter

@gafter,

I've started writing a draft proposal around #75. Would you like me to continue with this, or did you have a proposal planned yourself?

DavidArno avatar Feb 20 '17 12:02 DavidArno

@DavidArno I do not expect to invest any significant effort on this until we make progress on records.

gafter avatar Feb 20 '17 18:02 gafter

@gafter,

OK, that's good to know. I'll carry on with my proposal then, but will take time over it as there's no rush.

DavidArno avatar Feb 20 '17 20:02 DavidArno

/cc @agocke

gafter avatar Feb 21 '17 00:02 gafter

I will be moving my proposal of "closed" type hierarchies from Roslyn to this repo shortly. I also think we should explore "real" discriminated unions and have some thoughts on that, but it's still much more of an exploratory phase for me.

However, I think I'm close to a reasonably complete proposal for closed and an analyzer-based prototype. I'll happily champion this feature, as well.

agocke avatar Feb 21 '17 18:02 agocke

One question that I think we need to ask (and I haven't seen anyone ask elsewhere) is whether the case classes can be used as types.

Allow me to illustrate with the example of the Option type:

public enum class Option<T>
{
	None,
	Some(T Value)
}

Obviously Option<T> is a type (an abstract base class), but are None and Some types? Since, in an OO language like C#, ADTs are probably actually implemented as type hierarchies one might be tempted to answer "yes", but I'm not sure it makes much sense.

public void MyMethod(Some<string> someString) // Is this allowed? It doesn't make much sense
{
	// ...
}

I think of ADTs as functioning more like enums, however they're actually implemented. So using each case as a type doesn't make sense, any more than this makes sense:

public enum Colours
{
	Red, Green, Blue
}

public void MyMethod(Blue colour)
{
	// ...
}

Richiban avatar Apr 26 '17 13:04 Richiban

whether the case classes can be used as types.

I think it shouldn't be the case with enum class but with another "expanded" syntax which could represent more complex type hierarchies like an AST

class SyntaxNode {
  case class Statement { } // implicitly inherit from SyntaxNode, as in, a "case" of the said type
  case class Expression {
     case enum class Literal { Numeric, String }
  }
}

alrz avatar Apr 05 '18 14:04 alrz

I think that feature can be added fairly simply using a custom type, in the same way Tuple<T0, ..., TN> was added.

I maintain a library, OneOf, which adds a OneOf<T0, ..., TN> type which has .Match<TOut>(Func<T0, ..., TOut>, ..., Func<T0, ..., TOut> methods. By using implicit operators to create the OneOf instances from values, the syntax is very terse and comprehensible. Also, the allocations are low because it's a struct and doesn't create intermediate 'builder' objects, unlike some other solutions.

The OneOf<T0, .., TN> type also provides .Switch and .TryGetTX(out TX value, out TRemainder remainer) methods.

Example of using a OneOf as a return value:

public OneOf<User, InvalidName, NameTaken> CreateUser(string username)
{
    if (!IsValid(username)) return new InvalidName();
    var user = _repo.FindByUsername(username);
    if(user != null) return new NameTaken();
    var user = new User(username);
    _repo.Save(user);
    return user;
}

example of Matching

    OneOf<string, ColorName, Color> backgroundColor = "Red"; //uses implicit casts to reduce overhead
    Color c = backgroundColor.Match(
        str => CssHelper.GetColorFromString(str),
        name => new Color(name),
        col => col
   );
    

As new types are added to the OneOf definition, compiler errors are generated wherever the union is Matched or Switched, as the methods are required to have the correct number of lambda parameters.

This can be included in the BCL without language changes, although I'm sure some syntactical sugar could be sprinkled.


this proposal was originally made at https://github.com/dotnet/roslyn/issues/14208 and at https://github.com/dotnet/csharplang/issues/1524 . Sorry!

mcintyre321 avatar May 15 '18 13:05 mcintyre321

@mcintyre321 Your OneOf type can be described as equivalent to the Either type or Choice type such as that found in F#. However, the Either type is not an alternative to discriminated unions, in fact it is built on top of discriminated unions:

type Choice<'a, 'b> = Choice1Of2 of 'a | Choice2Of2 of 'b

Richiban avatar May 15 '18 14:05 Richiban

@mcintyre321

While your library does accomplish providing a single discriminated union it also demonstrates the degree of boilerplate required to do so which is what this proposal seeks to reduce. Your types also don't work with C#'s recursive pattern matching which will make it much more efficient and much more capable to match over such a type:

var backgroundColor = ...;

// no delegate invocation required
Color c = backgroundColor switch {
    string str => CssHelper.GetColorFromString(str),
    ColorName name => new Color(name),
    Color col => col
};

HaloFour avatar May 15 '18 14:05 HaloFour

@Richiban OneOf<T0, ..., TN> has up up to 33 parameters, so is more useful as a general return object than Either or Choice.

@HaloFour I agree it would be good to have switch and match support built in to the language,. butI would have thought that the delegate calls will be JITted. I'm not sure what the boilerplate you refer to is.

mcintyre321 avatar May 15 '18 14:05 mcintyre321

@mcintyre321

I'm not sure what the boilerplate you refer to is.

public enum class OneOf<T1, T2>
{
	First(T1 value),
	Second(T2 value)
}

vs. this*

* Yes, I know that you have all of the arities crammed into one, but the file is too large to link to a specific line.

HaloFour avatar May 15 '18 14:05 HaloFour

@mcintyre321 I don't doubt it's usefulness (or the fact that it's better than Either at those situations).

My point was that discriminated unions are a much more general tool that can also solve the problem that Either solves.

I'm not sure how you would propose to implement the equivalent of this using an Either type?

type FavouriteColour =
    | Red
    | Green
    | Blue
    | Other of string

Richiban avatar May 15 '18 14:05 Richiban

@Richiban The abilities to naming a DU is useful, but you still get a lot of value with anonymous DUs. Is it a show-stopper to not have named unions (initially at least)?

That said, there are some things that can be done.

A OneOf, as implemented, is a struct, but I think that in F# (by default) they are classes. So you could make OneOf a class too*, and have class FavouriteColour : OneOf<Red, Green, Blue, string> { }. One problem with this is that implicit operators aren't inherited, although I think maybe I saw a proposal suggesting that was coming.

Another alternative for naming is to use a Record type e.g. class FavouriteColour(OneOf<Red, Green, Blue, string> Value).

And you can always use an alias: using FavouriteColour = OneOf<Red, Green, Blue, string>; if it's just the code bloat that's a problem (rather than the lack of a named Type ).

I appreciate none of this is quite as nice as the F# approach, but perhaps the language could be extended to fix some of this. E.g. defining a union class FavouriteColour : OneOf<Red | Green | Blue | string> { } could cause the compiler to output the required implicit operators.

TBH I'm happy with any solution where

  • you can do ad-hoc definition of DUs in method signatures and member declarations
  • concrete types from any library can be used
  • exhaustive matching with compile errors after a change in parameters (obv!)

@HaloFour cramming it into one file is along the lines of https://referencesource.microsoft.com/#mscorlib/system/tuple.cs , although I admit OneOf.cs has become somewhat larger!

*There's a class-based OneOfBase in the library, but the name isn't great IMO.

mcintyre321 avatar May 15 '18 15:05 mcintyre321

@mcintyre321

Anonymous union type support would be #399 which is quite a bit different from DUs.

HaloFour avatar May 15 '18 15:05 HaloFour

@mcintyre321

The abilities to naming a DU is useful, but you still get a lot of value with anonymous DUs

The naming is what makes it a discriminated union. Without the names it's just a (type) union (also very useful, in my opinion).

I don't know about the C# language team, but I'm absolutely desperate for a decent discriminated union type in C#. The number of times I'm modelling a domain and I want to say "An instance of this type looks either like this or like that." C# has nothing of the kind and it's really difficult to work around (although some of the new pattern matching features from build might take away some of the pain).

Richiban avatar May 15 '18 16:05 Richiban

If you can mint new Types in a single line of code (which you can with Records) then the utility of a Type Union is very close to that of a Tagged/Disciminated Union. But I get your point, that there is a difference.

mcintyre321 avatar May 15 '18 20:05 mcintyre321

As of C# 7.2, the best way I've found so far to model DU's in C# is to use a class hierarchy that uses 7.2's private protected. If, in a library, you create the base type, Union<T1, T2> with a private protected abstract dummy member, then child classes of that type can only be created in the same assembly. So Case1 and Case2 can be created as sealed types forming a closed hierarchy.

The reason for choosing a class is in anticipation of C# 8's pattern matching features, and the ability to do something like:

TR Foo<T1, T2, TR>(Union<T1, T2> union) 
    => union switch
    {
        Case1 x => Bar<TR>(x),
        Case2 x => Baz<TR>(y),
        _ => default // needed as the expression cannot be exhaustiveness-checked
   };

Of course this still has problems, as the cases of the union cannot be named.

From my experiments, I think unions as structs is likely a non-starter, because of pattern matching. To achieve a "type hierarchy" with structs would need either something like "extension everything", or CLR changes. This is because some way of saying union is int x would be needed. So either a custom is operator would be needed that works with generics too (extension everything) or having DUs be a special type within the CLR so that it understood the relationship between the union and its types. Also, structs suffer from the default problem. What's the default value of a union?

Uses classes seems much easier. "All" the language really needs then is enforcement of the closed hierarchy within the same assembly and extending the NRT rules to make a null union an error via flow analysis.

DavidArno avatar May 16 '18 07:05 DavidArno

[<Struct>]
type FavouriteColour =
    | Red
    | Green
Unchecked.defaultof<FavouriteColour>.Dump(); //IsRed == true

For class DUs, the default is null

mcintyre321 avatar May 16 '18 08:05 mcintyre321

@mcintyre321,

Interesting that F# took that approach. I still think building on the NRT flow analysis is a far better way of doing it.

DavidArno avatar May 16 '18 16:05 DavidArno

@DavidArno

As of C# 7.2, the best way I've found so far to model DU's in C# is to use a class hierarchy that uses 7.2's private protected

Interesting... I've been doing it up until now with an abstract class with either a private or internal constructor, depending.

I almost always use the private one, so that all the 'case' classes are ~internal to~ nested within the abstract base class, e.g.

public abstract class FavouriteColour
{
	private FavouriteColour() {}

	public sealed class Red : FavouriteColour {}
	public sealed class Green : FavouriteColour {}
	public sealed class Blue : FavouriteColour {}

	public sealed class Other : FavouriteColour
	{
		public Other(string value)
		{
			Value = value;
		}

		public string Value { get; }
	}
}

Note that for brevity I have not included all the equality, string methods etc in the above code.

Also note that, in my example above, the classes Red, Green, and Blue are singletons (unit-like) and are one of the reasons I quite like the idea of having a singleton concept in C#, perhaps such as that in https://github.com/dotnet/csharplang/issues/490.

It would be handy to be able to say something like:

public object class Unit {}

followed by:

Unit x = Unit; // Note that `Unit` is both a type and a value, negating the need 
               // for a manually written `Instance` static member

Richiban avatar May 21 '18 20:05 Richiban

@Richiban,

A private constructor does work, but it means all the "case" classes then have to be nested within it. A private protected member will achieve a similar result, whilst allowing those case classes to be put in their own files.

It nice to have the choice now.

DavidArno avatar May 22 '18 08:05 DavidArno

@DavidArno

A private constructor does work, but it means all the "case" classes then have to be nested within it. A private protected member will achieve a similar result, whilst allowing those case classes to be put in their own files.

You can use partial to put them in their own files. You just have to repeat the parent class declaration.

HaloFour avatar May 22 '18 10:05 HaloFour

@DavidArno

A private protected member will achieve a similar result, whilst allowing those case classes to be put in their own files.

Sure, as does an internal constructor. The advantage is that you don't need a 'dummy' member.

Richiban avatar May 22 '18 11:05 Richiban

@HaloFour,

You can use partial to put them in their own files...

Technically, I could do that. But I'd prefer to gouge my own eyes out to be honest.

DavidArno avatar May 22 '18 11:05 DavidArno

@Richiban,

Sure, as does an internal constructor.

Hmm, that's a fair point. I think that in my haste to find a use for private protected, I forgot that having an abstract internal member, or even just an internal constructor, would achieve the same result. Oh well, I can toss private protected back on the scrapheap of useless features once more 😄

DavidArno avatar May 22 '18 12:05 DavidArno

@DavidArno

Technically, I could do that. But I'd prefer to gouge my own eyes out to be honest.

This confuses me. It's like another kind of namespace. More logical organization is bad?

jnm2 avatar May 26 '18 15:05 jnm2

@jnm2, A type should have one, cohesive, responsibility. If you split that responsibility over multiple files, you are weakening cohesion.

For that reason, I dislike the idea of partial classes being used for anything save dealing with auto-generated segments.

DavidArno avatar May 26 '18 19:05 DavidArno

@DavidArno How does nesting implementations within an abstract class cause the abstract class to have more than one responsibility? The fact that types are declared inside it doesn't make the containing type responsible—the responsibilities are completely contained in their own, nested, types.

jnm2 avatar May 26 '18 22:05 jnm2