fslang-suggestions icon indicating copy to clipboard operation
fslang-suggestions copied to clipboard

Erased type-tagged anonymous union types

Open alfonsogarciacaro opened this issue 7 years ago • 86 comments

Modified Proposal (by @dsyme)

This is a suggestion to add adhoc structural type-tagged union types where

  • type syntax (A|B) or Typed(A|B) or Typed<A,B>
  • each type of (A|B|C|...) is distinct and non-overlapping w.r.t. runtime type tests
  • such types are erased to object
  • introducing such a union value would need to be either explicit (e.g. using some new operator like Typed) or type-directed or both

See https://github.com/fsharp/fslang-suggestions/issues/538 for original proposal. e.g. this would allow some or all of these:

let generateValue1 () : Typed(int | string)) = if Monday then 2 else "4"

let generateValue2 () = if Monday then Typed 2 else Typed "4"

let eliminateValue (x : Typed(int | string)) = ...
    match x with 
    | :? int as i ->  ...
    | :? string as s -> ...

let eliminateValue2 x = ...
    match x with 
    | Typed(i : int) ->  ...
    | Typed(s; string) -> ...

type Allowed =  Typed(int | string)

There are plenty of questions about such a design (e.g. can you eliminate "some but not all" of the cases in a match? Is column-polymorphism supported?). However putting those aside, such a construct already has utility in the context of Fable, since it corresponds pretty closely to Typescript unions and how JS treats values. It also has lots of applications in F# programming, especially if the use of Typed can be inferred in many cases.

Now, this construct automatically gives a structural union message type , e.g.

type MsgA = MsgA of int * int
let update1 () = Typed (MsgA (3,4))

type MsgB = MsgB of int * int
let update2 () = Typed (MsgB (3,4))

val update1 : unit -> Typed MsgA  // actually : unit -> Typed (MsgA | ..), i.e. column-generic on use
val update2 : unit -> Typed MsgB // actually : unit -> Typed (MsgB | ..), i.e. column-generic on use

and a combination of update1 and update2 would give

let update = combine update1 update2 

val update : unit -> Typed (MsgA | MsgB)

As noted in the comments, some notion of column-generics would likely be needed, at least introduced implicitly at use-sites.

Original Proposal (@alfonsogarciacaro)

I propose we add erased union types as an F# first citizen. The erased union types already exist in Fable to emulate Typescript (non-labeled) union types:

http://fable.io/docs/interacting.html#Erase-attribute

Note that Fable allows you to define your custom erased union types, but this is because it's painful to type a generic one like U2.Case1. If the compiler omits the need to prefix the argument, this wouldn't be necessary and using a generic type can be the easiest solution.

The F# compiler could convert the following code:

// The name ErasedUnion is tentative
// The compiler should check the generic args are different
let foo(arg: ErasedUnion<string, int>) =
    match arg with
    | ErasedUnion.Case1 s -> s.Length
    | ErasedUnion.Case2 i -> i

// No need to instantiate ErasedUnion, but the compiler checks the type
foo "hola"
foo 5
// This doesn't compile
foo 5.

Into something like:

let foo(arg: obj) =
   match arg with
   | :? string as s -> s.Length
   | :? int as i -> i
   | _ -> invalidArg "arg" "Unexpected type"
  • Pros: It will make the Fable bindings generated from Typescript declaration files much more pleasant to work with.

  • Cons: It's a feature that seems to be exclusively dedicated to interact with a dynamic language like JS.

  • Estimated cost (XS, S, M, L, XL, XXL): S

Alternatives

For Fable it's been suggested to generate overloads in the type bindings instead of using erased union types:

interface IFoo {
    foo(arg: string | number): void;
}
type IFoo =
    abstract foo: arg: string -> unit
    abstract foo: arg: number -> unit

However these has some problems:

  • It can quickly explode when you have several erased union arguments
  • Due to type inference the F# compiler many times doesn't know which overload to use
  • Cannot be used in properties
  • Doesn't let you use erased unions yourself.

Affadavit

Please tick this by placing a cross in the box:

  • [x] This is not a question (e.g. like one you might ask on stackoverflow) and I have searched stackoverflow for discussions of this issue
  • [x] I have searched both open and closed suggestions on this site and believe this is not a duplicate
  • [x] This is not something which has obviously "already been decided" in previous versions of F#. If you're questioning a fundamental design decision that has obviously already been taken (e.g. "Make F# untyped") then please don't submit it.

Please tick all that apply:

  • [x] This is not a breaking change to the F# language design
  • [x] I would be willing to help implement and/or test this
  • [ ] I or my company would be willing to help crowdfund F# Software Foundation members to work on this

alfonsogarciacaro avatar Feb 14 '17 17:02 alfonsogarciacaro

This is sound like it need to be implemented with CompilationRepresentationAttribute Like the way the option.None represents as null so you can defined your erased union

AviAvni avatar Feb 14 '17 17:02 AviAvni

@alfonsogarciacaro erased unions are not only a thing for dynamic lang transpilation. They are also useful in message-based systems i.e. when you want to describe protocols in as a closed set of messages (in case of F# those could be discriminated unions). In that case a behavior that wants to satisfy more than one protocol, must have some way to define union of those, which so far is possible only as a lowest common denominator (usually an obj type).

Horusiath avatar Feb 15 '17 10:02 Horusiath

There are some other interesting reasons for this compiler feature. One is that we frequently hit situations in the F# compiler where a union type incurs an entire extra level in allocations, e.g.

type NameResolutionItem = 
    | Value of ValRef
    | UnionCase of UnionCaseRef
    | Entity of EntityRef
    | ...

The needs for this type are relatively "low perf" (cost of discrimination doesn't really matter - multiple type switches are ok) but the type gets many, many long-lived allocations when the F# compiler is hosted in the IDE. One could make the type a struct wrapping an obj reference manually, but simply adding an annotation to represent this as an erased union type and discriminate by type switching would be a much less intrusive code change. (Note using a struct union would not work well as the struct would still have a dsicrimination tag integer, and would have one field for each union case - struct unions are by no means perfect representations for multi-case types as things stand at the moment)

Estimated cost (XS, S, M, L, XL, XXL): S

:) There's not really any such thing as "S" for language features :) I'd say "M" or "L".

... CompilationRepresentationAttribute ...

yes that would seem natural

dsyme avatar Feb 15 '17 11:02 dsyme

Will there be multiple ErasedUnion s under this proposal? Like ErasedUnion3, ErasedUnion4 etc.?

robkuz avatar Feb 15 '17 13:02 robkuz

@robkuz if the implementation will be with CompilationRepresentationAttribute then you can create your own erased union

[<CompilationRepresentationAttribute(CompilationRepresentationFlags.ErasedUnion)>]
type DU<'a, 'b> = A of 'a | B of 'b

AviAvni avatar Feb 15 '17 14:02 AviAvni

@AviAvni @dsyme Please note that if this just enables a CompilationRepresentationFlags.ErasedUnion on customly defined unions and doesn't allow implicit conversions when passing arguments (in the example above writing foo "hola" instead of foo (ErasedUnion.Case1 "hola")), there won't be much benefit for Fable, as this is basically the same situation as we have now.

alfonsogarciacaro avatar Feb 15 '17 15:02 alfonsogarciacaro

This is almost same as or on type operator.T1 or T2. In perspective can be realized by special attribute on function. Full erased from compiled code. But how to save metadata?

ijsgaus avatar Feb 16 '17 17:02 ijsgaus

But how to save metadata?

I think the intent is that the types would be erased (like other F# information). The metadata would only available at compile-time through the extra blob of F#-specific metadata that F# uses

dsyme avatar Feb 17 '17 13:02 dsyme

I think that given the reasoning above (both for FABLE and the use case @Horusiath mentioned), this would be a good addition. 👍

cartermp avatar Feb 20 '17 03:02 cartermp

Is it very important that the type is erased?

Perhaps it's a slightly separate proposal, but I would love to have ad-hoc type unions in the form:

let print (item : string | int) = 
    match item with
    |  s : string -> printfn "We have a string: %s" s
    |  i : int ->   printfn "We have an int: %i" i

Which would essentially compile down to the same IL as:

let print (item : Choice<string, int>) = 
    match item with
    | Choice1Of2 s -> printfn "We have a string: %s" s
    | Choice2Of2 i -> printfn "We have an int: %i" i

and, more importantly, at the callsite:

print "Hello world"

instead of:

print (Choice1Of2 "Hello world")

Richiban avatar Mar 14 '17 18:03 Richiban

In Fable we've finally managed to remove the erased union case name by using the so-called erased/implicit cast operator !^. Check this and this. So now it's possible to do:

let foo(arg: U2<string, int>) =
    match arg with
    | U2.Case1 s -> s.Length
    | U2.Case2 i -> i

// No need to write foo(U2.Case1 "hola")
foo !^"hola"
foo !^5
// The argument is still type checked. This doesn't compile
foo !^5.

alfonsogarciacaro avatar Jul 23 '17 22:07 alfonsogarciacaro

TypeScript also supports string literals in these union types, i.e, in addition to type T1 = number | string, it also supports type T1 = number | "string1" | "string2". Would be nice to also support that.

Or alternatively, if string enums were supported like in TypeScript, we could acheive the same effect that way:

    enum Colors { Red = "RED", Green = "GREEN", Blue = "BLUE" }
    type T = number | Colors

ovatsus avatar Dec 03 '17 21:12 ovatsus

As a reference, Fable already supports string enums :smile:

alfonsogarciacaro avatar Dec 04 '17 08:12 alfonsogarciacaro

@Richiban is this what you're looking for? - Polymorphic Variants

cloudRoutine avatar Dec 20 '17 00:12 cloudRoutine

@Richiban @alfonsogarciacaro I hijacked this suggestion to convert this to a suggestion for erased ad-hoc type unions of the kind suggested by @Richiban

(Note sure what the callsite would be though @Richiban - perhaps what you say)./

dsyme avatar Mar 02 '18 21:03 dsyme

Can we make this types not erased? Why not introduce base implementation on Typed<'t1, 't2, ...> and make this as member of FSharp.Core

ijsgaus avatar Mar 03 '18 18:03 ijsgaus

@ijsgaus But if it's not erased then it's no difference from Choice<'a,' b>

Richiban avatar Mar 03 '18 19:03 Richiban

This seems like a really sweet suggestion! I imagine it could help the performance of a lot of library code.

wallymathieu avatar Aug 30 '18 15:08 wallymathieu

Would this help this problem?

type Goose = Goose of int
type Cardinal = Cardinal of int
type Mallard = Mallard of int
type Bird = Goose | Cardinal | Mallard
let x  = Goose 7

This code fails. Goose in the Bird DU shadows Goose as a type and turns it into an Atom. This shadowing happens silently and at least to me is surprising.

type Goose = Goose of int
type Cardinal = Cardinal of int
type Mallard = Mallard of int
type Bird = Goose of Goose | Cardinal of Cardinal | Mallard of Mallard
let x  = Goose 7

The type shadowing here still means I can't move forward, because there's no way to make a Goose....

type Goose = Goose of int
type Cardinal = Cardinal of int
type Mallard = Mallard of int
type Bird = Goose' of Goose | Cardinal' of Cardinal | Mallard' of Mallard
let x  = Goose' (Goose 7)

This works. This kind of situation happens where someone created a single case DU, and it gets consumed by someone who can't muck with the original DU for fear of breaking existing code.

voronoipotato avatar Apr 04 '19 14:04 voronoipotato

type Goose = Goose of int
type Cardinal = Cardinal of int
type Mallard = Mallard of int
type Bird = Goose of Goose | Cardinal of Cardinal | Mallard of Mallard
let x  = Goose 7

The type shadowing here still means I can't move forward, because there's no way to make a Goose....

You can make an instance of the Goose type by specifying the type as well as the case:

let x = Goose.Goose 7 // This works

When wrapped in a module, you can still access everything, but there is weirdness:

module Birds =
    type Goose = Goose of int
    type Cardinal = Cardinal of int
    type Mallard = Mallard of int
    type Bird = Goose of Goose | Cardinal of Cardinal | Mallard of Mallard

open Birds

If you specify Birds.Goose this gives you the Goose case of the type Goose, and you can't specify it as Birds.Goose.Goose i.e. using the format Module.Type.Case:

let gooseA = Goose.Goose 7 // Type.Case
let gooseB = Birds.Goose 7 // Module.Case
//let gooseC = Birds.Goose.Goose 7 // Module.Type.Case // <-- Doesn't work

Conversely, you must use that form if you wish to fully specify the Goose case of the Bird type i.e. you must specify it as Birds.Bird.Goose:

let gooseBird1 = Goose gooseA // Case
//let gooseBird2 = Birds.Goose gooseA // Module.Case // <-- Doesn't work
let gooseBird2 = Birds.Bird.Goose gooseA // Module.Type.Case

BillHally avatar Apr 06 '19 14:04 BillHally

Would this help this problem?

type Goose = Goose of int
type Cardinal = Cardinal of int
type Mallard = Mallard of int
type Bird = Goose | Cardinal | Mallard
let x  = Goose 7

This code fails. Goose in the Bird DU shadows Goose as a type and turns it into an Atom. This shadowing happens silently and at least to me is surprising.

type Goose = Goose of int
type Cardinal = Cardinal of int
type Mallard = Mallard of int
type Bird = Goose of Goose | Cardinal of Cardinal | Mallard of Mallard
let x  = Goose 7

The type shadowing here still means I can't move forward, because there's no way to make a Goose....

type Goose = Goose of int
type Cardinal = Cardinal of int
type Mallard = Mallard of int
type Bird = Goose' of Goose | Cardinal' of Cardinal | Mallard' of Mallard
let x  = Goose' (Goose 7)

This works. This kind of situation happens where someone created a single case DU, and it gets consumed by someone who can't muck with the original DU for fear of breaking existing code.

This code would be particulary useful for reusing existing cases and avoiding to nest it.

realvictorprm avatar Oct 02 '19 14:10 realvictorprm

@dsyme,although the discussion has been dormant a bit in this topic, I think this is still a very valuable addition to the language.

Perhaps we could move this forward and mark it approved in principle, so that we can start working it out into an RFC? I'd be willing to put in the effort for that.

abelbraaksma avatar Oct 02 '19 16:10 abelbraaksma

Some additional motivation: https://www.reddit.com/r/programming/comments/e8wfsr/discriminated_unions_in_c_an_unexceptional_love/fagb0hv/

cartermp avatar Dec 11 '19 15:12 cartermp

such types are erased to object

This makes sense if there are value types in the set of types, but if there are reference types only, it makes more sense to me to erase to the most-derived common base class. For instance, given the following:

type A() = class end
type B() = inherit A()
type C() = inherit B()
type D() = inherit A()

I'd expect (B | C) to be exactly equivalent to a declaration of type B, and (B | C | D) to erase to A (but not include A). However, in this case, I'd also expect all the members of A to be available on this value without needing to cast or pattern match.


introducing such a union value would need to be either explicit (e.g. using some new operator like Typed) or type-directed or both

This Typed operator feels a little cumbersome to me. Out of curiosity, why not allow these new anonymous union types to be inferred directly where possible? To adapt the original example:

let generateValue0 () = if Monday then 2 else "4"

This function would have the inferred type unit -> (int | string)

This code would have been illegal before, so I don't think this would be a breaking change to the language. I could see the argument being made that this would make debugging harder if you had actually intended both branches of the if to return the same type, but you can always add a type annotation to enforce that

Going back to @voronoipotato's example, this would be amazing to be able to do:

type Goose = Goose of int
type Cardinal = Cardinal of int
type Mallard = Mallard of int

// a type abbreviation for an erased anonymous union
// (note the parenthesis to disambiguate from creating a new nominal union type)
type Bird = (Goose | Cardinal | Mallard) 

// inferred to have type (Goose | Cardinal | Mallard) -> string
let talk = function
| Goose _ -> "honk"
| Cardinal _ -> "tweet"
| Mallard _ -> "quack"

Another note: in the CIL metadata, we might be able to enforce type safety and still expose something reasonable to other languages by emitting something more or less equivalent to this pseudo C#:

// Overloads for other .NET languages, ignored by F#
//. We'd only need to emit these for public symbols
public static string talk (Goose arg) => talk ((object)arg);
public static string talk (Cardinal arg) => talk ((object)arg);
public static string talk (Mallard arg) => talk ((object)arg);

// This is the one F# would call
// I believe adding the `modreq` to the argument type would prevent this overload
//  from being used by other .NET languages
public static string talk (
    [ErasedUnion (new[] { typeof (Goose), typeof (Cardinal), typeof (Mallard) })]
    object modreq([FSharp.Core]ErasedUnion) arg)
{
    // actual implementation here
}

chkn avatar Feb 06 '20 07:02 chkn

Was going to add a suggestion but found this thread.

Currently working with Akka and Akkling. It would be great if we are able to restrict the type of Object we consume using purely types.

Another use case is when working with actor frameworks, you would want 2 parents actors in a hierarchy be able to handle messages of the same type, but still be able to handle other messages which are of non-intersecting types, when the child actor sends them.

module Protocol = 
  type Person = ...
  type Animal = ...
  type AllMessages = (Person | Animal | string | int)

module Parent1 = 
  type Msg = (Person | string) 

module Parent2 = 
  type Msg = (Person | Animal)

module Child = 
  open Protocol
  type Msg = (string | int) 
  type ParentMsg = Person

  let child (mailbox: Actor<Msg, ParentMsg>) (msg: ChildMsg) = 
    // can send a message of `Person` without being aware of the sender/parent possible types 
    // parent IActorRef could either be Parent1 or Parent2
    mailbox.Sender().Tell(new Person ())

As F#ers, we prefer strong typing and encoding domain information as much as possible into the type system. However there is no escape if we need to use the massive amount of ecosystem, out there written in C#, as such we could have a type level _ /ignore pattern. We could define a union such as

type Msg = (Person | string | _ ) 

Here, when we consume this type, we expect Person or string but could also be anything we don't expect or want to handle.

This is erased, and also enforces pattern matching, this would give first class support for F# working with most C# libraries using such pattern with minimal changes.

type Msg = (Person | string | _ ) 

let handleMsg: Msg -> unit = function
  (*  Compiler forces you to match on Person and string
  and will warn if _ isn't handled explicitly  *)
  | Person { name = name } -> printfn "Name: %s" name 
  | str -> printfn "Received string: %s" str 
  | _ -> unhandled ()

let object: System.Object = somecsharpFuncReturnObject ()
object |> handleMsg // implicitly converted to Msg and all Msg types are subset of System.Object

Handling special Akka.net messages would be simple as just defining message for the handling actor without resorting to System.Object

type RootActorMsg = (Akka.Actor.ReceiveTimeout | string | _ )

Swoorup avatar May 29 '20 10:05 Swoorup

I think this draft PR is an attempt at addressing this: https://github.com/dotnet/fsharp/pull/8927

abelbraaksma avatar May 29 '20 10:05 abelbraaksma

@abelbraaksma I don't think this is really a feature covered here. Type unions are powerful tool in the hands of a programmer thanks to the laws they offer (at least in languages implementing them like Scala 3, Ceylon or TypeScript):

  • They are associative → (A | B) | C is the same as A | (B | C)
  • They are commutative → A | B is the same as B | A
  • They are idempotent → A | A and A are the same.

If we extend that over the concept of subtyping:

  • If we have classes Animal and Cat :> AnimalCat | Animal is the same as Animal.
  • Given that F# has bottom type (which is not quite true, but let's say that obj could work in the actual implementation → A | obj and obj are the same.
  • Given that F# would have subtype of every other type (which is not true) → A | Nothing would always be Nothing.

The PR which you linked doesn't quite fit into any of these laws, it doesn't even fit into the title of this issue (as it's a syntax suggar over choice, which holds type tag and is not erased).

Horusiath avatar Jun 06 '20 05:06 Horusiath

@Horusiath, thanks, that was very insightful. I wasn't aware of these approaches not overlapping, but now I know I should probably experiment a bit more with languages that do have this feature to understand it better.

abelbraaksma avatar Jun 06 '20 14:06 abelbraaksma

```fsharp
let generateValue0 () = if Monday then 2 else "4"

This function would have the inferred type unit -> (int | string) This code would have been illegal before, so I don't think this would be a breaking change to the language.

For me, the above inference suggestion is a deal-breaker. It turns a common error with 'if' into a nest of knock-on issues as you try and work out why you're suddenly passing an "(A | B)" where a "B" is required.

I like to think of language feature design from two perspectives: The initial coding, and then the re-visitation under the stress of refactoring. For the latter, F# has excellent provision through the interplay of all sorts of little features, and is a large part of the reason why I'm here.

Typescript shares much with C++, namely, incrementally covering over an existing language that itself hadn't started in a great place. I'd be hesitant to import anything compromising from the dynamic/imperative space.

2c/ I could support adding a shorthand type-declaration syntax that de-sugars to a DU (like the example @dsyme posted earlier). Issues would be: To support generating the case constructor names, could such a DU only ever comprise of named types? And what would the prefix for the case-constructors be?

jonathan-markland avatar Sep 28 '20 22:09 jonathan-markland

let generateValue0 () = if Monday then 2 else "4"

This function would have the inferred type unit -> (int | string) This code would have been illegal before, so I don't think this would be a breaking change to the language.

For me, the above inference suggestion is a deal-breaker. It turns a common error with 'if' into a nest of knock-on issues as you try and work out why you're suddenly passing an "(A | B)" where a "B" is required.

You could potentially enforce specifying type signatures for union types. However I feel, the same could be said for simple types that currently exists. You accidentally return something string instead of int and you have issues in other places. You would ideally restrict the types you consume at a boundary point (perhaps, at the module level or project level)

Swoorup avatar Sep 29 '20 03:09 Swoorup