language icon indicating copy to clipboard operation
language copied to clipboard

Sum/union types and type matching

Open GregorySech opened this issue 5 years ago • 127 comments

I'd like to import some features from functional languages like OCaML and F#. The main feature I'm proposing is the possibility of using typedef to define sum types. They can be thought of as type enumerations. This will, in some cases, reduce the need for dynamic making for sounder typing. For example json.decode would no longer return dynamic but JSON a sum type, something like:

typedef JSON is Map<String, JSON> | List<JSON> | Literal;

Sum types could power a new behaviour of switch case that resolves the type based on the typed enumerated. Kind of like assert and if does with contemporary is syntax.

typedef T is A | B | C;
...
void foo(T arg) {
  switch(arg) {
    case A: // here arg is casted to A
      break;
    case B: // here arg is casted to B
      break;
    case C: // here arg is casted to C
      break;
  }
}

A better syntax for this type of switch case might be of order, maybe something like #27.

typedef T is A | B | C;
...
void foo(T arg) {
  switch(arg) {
    case A -> ... // here arg is A
    case B -> ... // here arg is B
    case C -> ... // here arg is C
  }
}

This would be a powered down version of OCaML and F# match <arg> with as I've not included a proposition for type deconstruction, which would probably require tuples (or more in general product types) as discussed in #68.

GregorySech avatar Nov 05 '18 06:11 GregorySech

First note that it should be possible to categorize this issue as a request or a feature. The request would articulate a non-trivial and practically relevant shortcoming in terms of software engineering that would justify considerations about adding new features to Dart. A feature description is a response to a request, that is, every feature under consideration should be grounded in a need for the feature in terms of an underlying software engineering problem, it should never just be "I want feature X just because that's a cool concept to have in a language". It may well be a non-trivial exercise to come up with such a request, and it may not necessarily have just one feature as its "corresponding solution", but that's a good thing, not a bug. ;-)

This issue is clearly a feature issue, specifically promoting a language construct (or a family of closely related ones). So we need a request issue which describes a software engineering problem more broadly, such that some notion of sum types would be an appropriate response to that request.

That said, requests for sum types are of course not a new thing. A couple of pointers: https://github.com/dart-lang/sdk/issues/4938 has been the main location for discussing union types, and the feature discussed in this issue may or may not be covered under that umbrella.

In particular, this proposal may or may not assume tags. If tags are assumed, typedef JSON would be similar to a class hierarchy like this:

class Json {}
class JsonMap implements Json {
  Map<String, Json> value;
}
class JsonList implements Json {
  List<Json> value;
}
class JsonLiteral implements Json {
  ... // Whatever is needed to be such a literal.
}

This would provide run-time support for discriminating all cases, because anything of type "Map<String, JSON>" would actually be wrapped in an object of type JsonMap, and a simple is JsonMap test would reveal whether we're looking at such an object. In a functional language you'd declare all cases in a single construct, and it would thus be easy to detect at compile time whether any given deconstruction is exhaustive. With an OO class hierarchy you'd need a concept like a 'sealed' class hierarchy in order to get a similar statically known exhaustiveness guarantee. (OCaml supports extensible variant types, and they require an explicit 'default' case in every pattern match, so we got the same trade-off there, albeit with the sealed design as the default.) In any case, tagged unions is the easy way to do this, and it may be somewhat costly at run time, because we have to create and navigate a lot of "wrapper" objects.

However, if we assume untagged unions then we cannot expect run-time support for O(1) discrimination. For instance, a recursive untagged union type could be defined like this (assuming that Dart is extended to support recursive union types):

typedef NestedIntList = int | List<NestedIntList>;

If we're simply considering the type NestedIntList to be the (infinite) union of int and List<int> and List<List<int>> and so on, then it would be possible for a variable of type NestedIntList to refer to an instance of int, and it could also refer to an instance of List<int>, etc. There would not be any wrapper objects, and we would have no support to tell in time O(1) that any given object (say, a List<int>) is in fact of type NestedIntList. This means that the expression e is NestedIntList would need to perform an unbounded number of type checks, and it would in principle be unable to stop at any finite level (if it's looking at a List<List<.....<String>.....>> it wouldn't be able to say "no" before it hits the String, which could be nested any number of levels.)

So we need to clarify whether sum types are intended to be similar to SML algebraic data types (that is, tagged unions), or similar to TypeScript union types (untagged unions).

We should note that TypeScript does not support dynamic type tests (and I do not believe the user-defined workaround is a relevant replacement in Dart). Dart will remain sound in the sense that the obvious heap invariant ("a variable of type T refers to an entity of type T") is maintained at all times, based on a combination of static and dynamic checks. So we don't have the option to ignore that certain type tests / type checks are expensive.

eernstg avatar Nov 05 '18 13:11 eernstg

Sorry for the missing tag, it seems I'm unable to edit them, I'll propose them in the title while I'm looking for documentation on how to do it.

Thank you for the extensive explanation, I didn't dive deep enough into the sdk issues and I apologize.

To answer the specification question: I was not aware of union types in TypeScript as I've not used it yet so I intended them to be as similar as possible to SML data types however I was ignoring the implementation details of such feature. I'll start studying the mentioned concept in the next days.

GregorySech avatar Nov 05 '18 16:11 GregorySech

@GregorySech, first: Thanks for your input! I should have said that. Second: I guess there's some permission management in place which restricts the labeling; I'm sure those labels will be adjusted as needed, so don't worry. So the main point is simply that it would be really useful in the discussion about a topic like this to have a characterization of the underlying software engineering problem (without any reference to any particular language construct that might solve the problem), and it would be helpful if you could capture that underlying software engineering challenge for which such a feature as 'sum types' would be a relevant solution.

eernstg avatar Nov 05 '18 16:11 eernstg

@eernstg

if we assume untagged unions ... There would not be any wrapper objects

Would this still be true if Dart adds unboxed types in the future? For example, imagine in the future non-nullable int is directly represented by 64-bit integer and its value is inlined into variables and fields. Now, what happens when you define a variable of type int | String?

yjbanov avatar Nov 05 '18 17:11 yjbanov

FutureOr<T> is a real use case that is hardcoded in Dart today and would be solved neatly when Dart introduces union types. I agree they don't have to match all the specifics of other languages.

mdebbar avatar Nov 06 '18 02:11 mdebbar

.. [no] wrapper objects Would this still be true if Dart adds unboxed types[?]

If Dart is extended to support a notion of unboxed types I would expect such instances to be compatible enough with the rest of Dart to allow a reference of type Object to refer to them, and that would require support for some notion of boxing.

You may or may not want to say that a boxed entity of type T is a wrapper object containing an unboxed entity of type T, but we're still talking about the same type (with different representations at run-time).

With a wrapper which is used as the representation of a tagged union type there is a transition from one type to another type when the wrapper is added. An example in SML would be the following:

datatype IntString = INT of int | STRING of string;
fun f (INT i) = "An integer"
  | f (STRING s) = s;
val v1 = f (INT 42);
val v2 = f (STRING "24");

Here, INT 42 is of type IntString, but a plain 42 is of type int, so there's a difference between "just being an int" and "being an int which is one of the cases in the union of int and string", and that difference amounts to an entire wrapper object. It's a compile-time error to call f with an argument of type int or string, they must be wrapped first, hence f (INT 42). You need to create that wrapper object explicitly (OK, we could try to infer that, but that's another tarpit ;-), and in return you can test for it at run time (so f knows which case to run when you call it). An extension of Dart with tagged unions would presumably have a lot of the same properties.

An extension of Dart with untagged unions would be different in many ways. I just wanted to point out the fact that this distinction matters, so we shouldn't discuss union types (or sum types) without knowing which kind we're talking about. ;-)

eernstg avatar Nov 06 '18 13:11 eernstg

FutureOr<T> is a real use case that is hardcoded in Dart today

Right. We knew from day one that FutureOr could be used as a playground for getting to know exactly how that special case of union types would affect the whole language.

eernstg avatar Nov 06 '18 13:11 eernstg

@eernstg Thanks for the explanation! I misread your "wrapper object" as boxing. Yes, something like tagged unions does not require boxing. For example, Rust's and Swift's enum types are essentially tagged unions, and they do not box them.

yjbanov avatar Nov 06 '18 17:11 yjbanov

@yjbanov wrote:

For example, Rust's and Swift's enum types are essentially tagged unions, and they do not box them.

Right, but I think the situation in Rust and Swift is slightly different from the situation in Dart.

In general, a range of memory cells does not have to be a full-fledged heap entity in order to be able to contain a primitive value (like an integer or a pointer to an immutable string (Rust: &str)) as well as a reference to an associated value (Swift) or a tuple or struct with no subtype relations (Rust) such that the size of the whole thing is known statically as soon as we know which "case" of the enum type we are working on, and we can compute how much space is needed in order to be able to store any of the declared cases.

But at this point we do not have general support for non-heap entities in Dart of any other size than that of a primitive, so if we want to support that kind of composite entity in Dart I think we will have to keep them in the heap (that is, we need to box them). That could still be useful, however, because an enum with an associated value is similar to a tagged union where the enum value itself is the tag.

The ability to work on unboxed entities of arbitrary size would be an orthogonal feature in Dart, and we would be able to introduce support for this separately (and sort out how they would work in dynamic code), and then we could use it for entities of any kind, including enum-with-associated-value.

eernstg avatar Nov 12 '18 10:11 eernstg

If this is implemented + non-nullable types, then Dart will be the best language ever! Sum types (and records) are incredibly powerful constructs to model domains.

stt106 avatar Nov 15 '18 20:11 stt106

Let's say that Dart had a non-nullable type ancestor called Thing, Object might be Thing | Null. I'm still unaware of implementation implications for this features to work together but the message I'm trying to convey is that this might save a lot of refactoring if non-null is implemented.

GregorySech avatar Nov 15 '18 21:11 GregorySech

Hey Erik, since you asked for "software engineering arguments" further up, I will bite and state why algebraic data types and pattern matching is useful from a software engineering perspective.

The simple truth is: 1 - representing structured data so that it can be inspected from the outside isn't what objects were made for, and 2 - yet most data in real world applications has to deal with structured data (it started with tuples, records, relations and the "OO-SQL impedance mismatch").

Today, NULL/optional, enums etc can all provide only half-hearted solutions to case distinction. The paper "Matching Objects with Patterns" (Emir, Odersky, Williams) and also my thesis "Object-oriented pattern matching" discusses the benefits and drawbacks of the object-oriented and other encodings of algebraic data types. The "type-case" feature is also discussed, but not a notion of untagged sum type.

I'd be an advocate for the tried and tested algebraic data types which is a way to represent tuples, sum types, "value classes" and a whole bunch of other programming situations that all help with modeling and dealing with structured data. Algebraic data types would reflect the symmetry between product types and sum (=co-product) types, as known from category theory and its manifestations in functional programming in Haskell, Scala, ocaml aka Reason etc. What better software engineering argument than "it works" can there be? : ) Let me know if this is the appropriate issue or I should file a new one.

(updated, typo)

burakemir avatar Dec 16 '18 00:12 burakemir

(Hello Burak, long time no c, hope everything is well!)

data [..] inspected from the outside isn't what objects were made for

Agreed, I frequently mention that argument as well. And I'd prefer to widen the gap, in the sense that I would like to have one kind of entity which is optimized from access-from-outside ("data") and another one which is optimized for encapsulation and abstraction ("objects").

I'd be an advocate for the tried and tested algebraic data types

I think they would fit rather nicely into Dart if we build them on the ideas around enumerated type with "associated values", similarly to Swift enumerations (but we'd box them first, to get started, and then some unboxed versions could be supported later on where possible).

Considering the mechanisms out there in languages that are somewhat similar to Dart, the notion of an enum class with associated "values" comes to mind here. I mentioned enums and their relationship with inline allocation up here, but I did not mention how similar they are to algebraic data types.

// Emulate the SML type `datatype 'a List = Nil | Cons of ('a * 'a List)`
// using names that are Darty, based on an extension of the Dart `enum`
// mechanism with associated data.

enum EnumList<X> {
  Nil,
  Cons(X head, EnumList<X> tail),
}

int length<X>(EnumList<X> l) {
  switch (l) {
    case Nil: return 0;
    case Cons(var head, var tail): return 1 + length(tail);
  }
}

main() {
  MyList<int> l = MyList.Cons(42, MyList.Nil);
  print(length(l));
}

The enum declaration would then be syntactic sugar for a set of class declarations, one declaration similar to the one which is the specified meaning of the enum class today, and one subclass for each value declared in the enum, carrying the declared associated data in final fields.

We'd want generated code to do all the usual stuff for value-like entities (starting with operator == and hashCode, probably including support for some notion of copy-with-new-values, etc).

There will be lots of proposals for better syntax (e.g., for introduction of patterns introducing fresh names to bind as well as fixed values to check for), but the main point is that the notion of an enum mechanism with associated values would be a quite natural way in Dart to express a lot of the properties that algebraic data types are known to have in other languages.

eernstg avatar Dec 21 '18 12:12 eernstg

There are a lot of people in this thread with much more formal computer science education than I possess. My argument will be thus be quite simple. Union types in TypeScript make my life easy as a developer. Not having them in Dart makes my life considerably worse as a developer. That's all I have to share as a consumer of Dart, and quite frankly, that's all the experience I need to have an opinion on whether we should have them or not. I won't be able to provide anything of great intelligence to this conversation, just my opinion based on experience of building things with Dart. How that gets implemented is well outside of my wheelhouse, I'll defer to the software engineers to use (or not use) my feedback. 😄

lukepighetti avatar Dec 30 '18 20:12 lukepighetti

@lifranc Please don't come to language repository just to leave comments that do not add to the conversation. I'm watching the repository so I can keep track of all the conversations happening here... constructive conversations. Your comment is redundant, and unconstructive.

ds84182 avatar May 23 '19 01:05 ds84182

https://github.com/spebbe/dartz from @spebbe can be another use case that can make use of union types

dartz was originally written for Dart 1, whose weak type system could be abused to simulate higher-kinded types. Dart 2 has a much stricter type system, making it impossible to pull off the same trick without regressing to fully dynamic typing.

Also, https://developer.mozilla.org/en-US/docs/Web/API/WebSocket/send API, when implemented in Dart, is quite tricky, either we have to have send(dynamic data), which loses type-safety, or sendString(String data) + sendBytes(Uint8List data) + sendBlob(Blob data) etc

truongsinh avatar Aug 19 '19 15:08 truongsinh

I'd be really disappointed if Dart added ADTs / enum types instead of real union types. The latter are much more ergonomic, flexible and capture exactly the essence of what the developer wants to express: Different types are possible. This essence is captured with minimal syntax: A | B | C | D. No tags. It's also flexible in that you can pass a subset like A | C or B | D and it will still work correctly. This is what we really want: No limitations. Minimal syntax. Just the essence.

It's always annoying if you can't restrict to a subset and always have to deal with the whole enum and match against all of cases and raise an exception for the disallowed cases at runtime instead of having them disallowed at compile-time. It also makes working with types from different packages annoying because you can't easily mix them (esp. individual cases of the enum). This happens often enough to be an annoyance in Swift, Kotlin and other popular languages.

After all, we're doing this because we want compile-time checks. That's the whole point of union types. If you've ever worked with TypeScript's unions, as an end-developer (vs. the PL designer), you really don't want to go back to something as inflexible and verbose and unnecessarily complicated as enums. Yeah, enums might be easier to optimize, but I believe we can still find a "good enough" optimization for union types. The use-cases for unions in Dart would be different than those in e.g. Haskell because we mostly use classes and I believe the performance is less of an issue here. For example, we don't represent our lists as Cons | Nil. We already have a List class for that.

For more context, please take a look at my quick proposal from a year ago: https://github.com/dart-lang/sdk/issues/4938#issuecomment-396005754

wkornewald avatar Sep 13 '19 08:09 wkornewald

@wkornewald The untagged unions concept sounds nice, but it doesn't sound like a replacement to the algebraic data types. They can totally coexist and serve different needs.

werediver avatar Sep 13 '19 12:09 werediver

@werediver Sure they could coexist, but what is missing from union types that is absolutely necessary from a practical point of view? Do you have a code example?

wkornewald avatar Sep 13 '19 12:09 wkornewald

what is missing from union types that is absolutely necessary from a practical point of view?

That is quite easy to answer: extra semantics.

data Message
  = Info    String
  | Warning String
  | Error   String

Which would be possible with sealed classes and untagged unions together, but that would be noticeably less succinct.

werediver avatar Sep 13 '19 12:09 werediver

I definitely agree that union types expressed as A | B | C instead of something like sealed classes are a better IMO.

Technically speaking, sealed classes don't bring any new feature, they just reduce the boilerplate.

For example, sealed classes cannot be used to represent JSON since they are a new type. But unions can as they reuse existing types:

typedef Json = String | num | List<Json> | Map<String, Json>

Instead of such sealed class:

sealed class LoginResponse {
    data class Success(val authToken) : LoginResponse()
    object InvalidCredentials : LoginResponse()
    object NoNetwork : LoginResponse()
    data class UnexpectedException(val exception: Exception) : LoginResponse()
}

we could use a typedef & unions (+ potentially data classes to describe Token/...)

typedef LoginResponse =
  | Token
  | NoNetwork
  | InvalidCredentials
  | Exception

rrousselGit avatar Sep 13 '19 13:09 rrousselGit

You can do that with union types, too:

typedef Message
  = Info(String)
  | Warning(String)
  | Error(String)

We don't really have to use an explicit "data" keyword in typedefs to introduce new data classes. That's just a question of consistency vs succinctness. At the same time you can define a function like this which wouldn't be possible with algebraic data types:

void handleProblematicMessage(Warning | Error message) { ... }

With algebraic data types you have to write

void handleProblematicMessage(Message message) {
  switch(message) {
    Info -> throw Exception("Not a problematic message!");
    ...
  }
}

Now you've introduced a run-time error instead of a compile-time type error.

wkornewald avatar Sep 13 '19 13:09 wkornewald

We're clearly carelessly mixing different programming languages and terminology, so let's be more precise.

@wkornewald:

We don't really have to use an explicit "data" keyword ...

I used the data keyword, because I was "speaking" Haskell.

You can do that with union types, too: ...

These are tagged, which is the same as sum-types in the algebraic data types and not the same as the untagged unions that you were talking about initially.

That is exactly my point: tagged unions aka sum-types are useful, untagged unions are useful, they serve different needs, can coexist and be even more useful together :)

@rrousselGit:

For example, sealed classes cannot be used to represent JSON since they are a new type.

I don't see how this is true. Surely you can model JSON with sealed classes. Every concrete sealed class [in Kotlin] is a subclass of a particular base class. From the docs:

Sealed classes are used for representing restricted class hierarchies, when a value can have one of the types from a limited set, but cannot have any other type. ...

sealed class Expr
data class Const(val number: Double) : Expr()
data class Sum(val e1: Expr, val e2: Expr) : Expr()
object NotANumber : Expr()

werediver avatar Sep 13 '19 13:09 werediver

@werediver

Every concrete sealed class [in Kotlin] is a subclass of a particular base class. From the docs:

That's the problem. There's no common subclass with the different values of a JSON output. num is completely unrelated to String, which is completely unrelated to Map, ...

While it is possible to make a sealed class that represents JSON, it'd be a relatively big breaking change to update the serializer: The serialized object is of a different type.

For example, what used to be:

dynamic serialize(Foo foo) {
  return {
    'count': foo.count,
  };
}

would have to change to:

Json serialize(Foo foo) {
  return Json.map({
    'count': Json.number(foo.count),
  });
}

whereas with unions we'd have:

typedef Json = String | num | List<Json> | Map<String, Json>

Json serialize(Foo foo) {
  return {
    'count': foo.count,
  };
}

We can see that with unions, the implementation did not change. It's just suddenly type-safe.

rrousselGit avatar Sep 13 '19 13:09 rrousselGit

@wkornewald:

We don't really have to use an explicit "data" keyword ...

I used the data keyword, because I was "speaking" Haskell.

I was just referring to my original proposal where typedef was mixed with data Info(String) instead of just Info(String) to be more explicit, but we could make that optional.

You can do that with union types, too: ...

These are tagged, which is the same as sum-types in the algebraic data types and not the same as the untagged unions that you were talking about initially.

That is exactly my point: tagged unions aka sum-types are useful, untagged unions are useful, they serve different needs, can coexist and be even more useful together :)

Ah I see where the difference in our wording is then. :) When takling about "tagged unions" or "algebraic data types" the "tag" is usually not an independent type on its own, but only usable via the whole union like I showed in my code example (which is the usual limitation of tagged unions). That's why I'm talking about union types + data classes as separate concepts that are usable independently of each other, but can be combined to get something more flexible than algebraic data types.

I've also updated my comment to use "union types" instead of "untagged unions" to make this clearer.

wkornewald avatar Sep 13 '19 13:09 wkornewald

@wkornewald I had to read the proposal to make a good sense out of your arguments, but in the end I like how you combine features there. Indeed, the union types in combination with data classes make a good mix. With a deconstructing pattern-matching it can totally replace the "classical" (Haskell, Swift) sum-types, looks nice 👍

werediver avatar Sep 13 '19 14:09 werediver

@werediver Great, then let's convince the majority of the Dart team that this is where we'd like to go. ;)

wkornewald avatar Sep 13 '19 16:09 wkornewald

@wkornewald But did you pay attention to the reply of @eernstg to you original proposal?

He is basically saying that the union types don't help a compiler [that much] to reason about types in presence of subtyping, whereas the sealed class hierarchies do. And, well, my answer to that would be "who needs subtyping anyway, it screws everything!", but it's already in Dart.

We do want a compiler that can reason about the code well, make deep conclusions and good optimizations, right? And having limited resources for development it makes quite a bit of sense to adopt a feature that enables as many good things in the compiler as possible.

In other words, an opinion of the masses is good to take into account, but the through technical analysis by the core team is essential. Let's hope they are going to pull it off in a way that satisfies us 🤞

werediver avatar Sep 13 '19 16:09 werediver

@werediver Oh yeah, I didn't have the time to reply back then, but basically, the compiler can't easily optimize arbitrary use of unions, but I believe we can find an acceptable trade-off that works in 80-90% of the cases. For example, for the in-memory representation we could fall back to the smallest & most frequently used typedef that fully contains a given union. Also, monomophization could improve performance in critical code paths with small union types (maybe only for pairs). Finally, we rarely need union types to be as performant as e.g. in Haskell because their use-case is different. We don't have Cons | Nil, but a simple List class.

So union types will only be used in few select cases and in non-trivial cases we'll have a single typedef for the whole union and the compiler can internally treat that as a single unit/ADT. And if that's not enough I'm sure we can come up with even more optimizations. I'd rather deal with the optimization problem than the prevalent inflexibility of verbose ADT-based code.

wkornewald avatar Sep 13 '19 17:09 wkornewald

@werediver makes the same distinction that I did:

These are tagged, which is the same as sum-types in the algebraic data types and not the same as the untagged unions that you were talking about initially.

Algebraic data types are useful for programming in a somewhat functional style (it's still Dart, of course) where the set of variants is known precisely when the algebraic data type itself is known (because we wouldn't go into anything like extensible variants, when we already have classes): This allows a pattern matching construct to be checked for exhaustiveness, and there is no subtyping so we know exactly how each variant is structured.

The untagged unions are basically a safer version of the common supertype: If a method accepts int | String | Map<int, int> then we can't do anything meaningful on an expression of that type, but we can discriminate in the body of the method and do something for the int and something else for the String, etc. If we had used the common supertype Object then it would have allowed a lot of other types than the union type, in which case we'd need error handling and we'd get much less static checking at call sites. But there's nothing stopping us from using num | int or other unions that don't amount to a set of disjoint cases, and we might still check for is int first and then is num, and get some meaningful case based behavior anyway. So untagged unions basically provide a lot fewer constraints and guarantees, but since we're talking about object-oriented programs we will have type tags on every entity in the heap anyway, and this means that we can discriminate between different types of objects even with an untagged union.

I just keep coming back to the conclusion that they are different things, and may be useful for different purposes, so I guess I should stop ranting. ;-)

eernstg avatar Sep 20 '19 00:09 eernstg