fslang-suggestions
fslang-suggestions copied to clipboard
Erased type-tagged anonymous union types
Modified Proposal (by @dsyme)
This is a suggestion to add adhoc structural type-tagged union types where
- type syntax
(A|B)
orTyped(A|B)
orTyped<A,B>
- each type of (A|B|C|...) is distinct and non-overlapping w.r.t. runtime type tests
- such types are erased to
object
- introducing such a union value would need to be either explicit (e.g. using some new operator like
Typed
) or type-directed or both
See https://github.com/fsharp/fslang-suggestions/issues/538 for original proposal. e.g. this would allow some or all of these:
let generateValue1 () : Typed(int | string)) = if Monday then 2 else "4"
let generateValue2 () = if Monday then Typed 2 else Typed "4"
let eliminateValue (x : Typed(int | string)) = ...
match x with
| :? int as i -> ...
| :? string as s -> ...
let eliminateValue2 x = ...
match x with
| Typed(i : int) -> ...
| Typed(s; string) -> ...
type Allowed = Typed(int | string)
There are plenty of questions about such a design (e.g. can you eliminate "some but not all" of the cases in a match? Is column-polymorphism supported?). However putting those aside, such a construct already has utility in the context of Fable, since it corresponds pretty closely to Typescript unions and how JS treats values. It also has lots of applications in F# programming, especially if the use of Typed
can be inferred in many cases.
Now, this construct automatically gives a structural union message type , e.g.
type MsgA = MsgA of int * int
let update1 () = Typed (MsgA (3,4))
type MsgB = MsgB of int * int
let update2 () = Typed (MsgB (3,4))
val update1 : unit -> Typed MsgA // actually : unit -> Typed (MsgA | ..), i.e. column-generic on use
val update2 : unit -> Typed MsgB // actually : unit -> Typed (MsgB | ..), i.e. column-generic on use
and a combination of update1 and update2 would give
let update = combine update1 update2
val update : unit -> Typed (MsgA | MsgB)
As noted in the comments, some notion of column-generics would likely be needed, at least introduced implicitly at use-sites.
Original Proposal (@alfonsogarciacaro)
I propose we add erased union types as an F# first citizen. The erased union types already exist in Fable to emulate Typescript (non-labeled) union types:
http://fable.io/docs/interacting.html#Erase-attribute
Note that Fable allows you to define your custom erased union types, but this is because it's painful to type a generic one like U2.Case1
. If the compiler omits the need to prefix the argument, this wouldn't be necessary and using a generic type can be the easiest solution.
The F# compiler could convert the following code:
// The name ErasedUnion is tentative
// The compiler should check the generic args are different
let foo(arg: ErasedUnion<string, int>) =
match arg with
| ErasedUnion.Case1 s -> s.Length
| ErasedUnion.Case2 i -> i
// No need to instantiate ErasedUnion, but the compiler checks the type
foo "hola"
foo 5
// This doesn't compile
foo 5.
Into something like:
let foo(arg: obj) =
match arg with
| :? string as s -> s.Length
| :? int as i -> i
| _ -> invalidArg "arg" "Unexpected type"
-
Pros: It will make the Fable bindings generated from Typescript declaration files much more pleasant to work with.
-
Cons: It's a feature that seems to be exclusively dedicated to interact with a dynamic language like JS.
-
Estimated cost (XS, S, M, L, XL, XXL): S
Alternatives
For Fable it's been suggested to generate overloads in the type bindings instead of using erased union types:
interface IFoo {
foo(arg: string | number): void;
}
type IFoo =
abstract foo: arg: string -> unit
abstract foo: arg: number -> unit
However these has some problems:
- It can quickly explode when you have several erased union arguments
- Due to type inference the F# compiler many times doesn't know which overload to use
- Cannot be used in properties
- Doesn't let you use erased unions yourself.
Affadavit
Please tick this by placing a cross in the box:
- [x] This is not a question (e.g. like one you might ask on stackoverflow) and I have searched stackoverflow for discussions of this issue
- [x] I have searched both open and closed suggestions on this site and believe this is not a duplicate
- [x] This is not something which has obviously "already been decided" in previous versions of F#. If you're questioning a fundamental design decision that has obviously already been taken (e.g. "Make F# untyped") then please don't submit it.
Please tick all that apply:
- [x] This is not a breaking change to the F# language design
- [x] I would be willing to help implement and/or test this
- [ ] I or my company would be willing to help crowdfund F# Software Foundation members to work on this
This is sound like it need to be implemented with CompilationRepresentationAttribute Like the way the option.None represents as null so you can defined your erased union
@alfonsogarciacaro erased unions are not only a thing for dynamic lang transpilation. They are also useful in message-based systems i.e. when you want to describe protocols in as a closed set of messages (in case of F# those could be discriminated unions). In that case a behavior that wants to satisfy more than one protocol, must have some way to define union of those, which so far is possible only as a lowest common denominator (usually an obj
type).
There are some other interesting reasons for this compiler feature. One is that we frequently hit situations in the F# compiler where a union type incurs an entire extra level in allocations, e.g.
type NameResolutionItem =
| Value of ValRef
| UnionCase of UnionCaseRef
| Entity of EntityRef
| ...
The needs for this type are relatively "low perf" (cost of discrimination doesn't really matter - multiple type switches are ok) but the type gets many, many long-lived allocations when the F# compiler is hosted in the IDE. One could make the type a struct wrapping an obj reference manually, but simply adding an annotation to represent this as an erased union type and discriminate by type switching would be a much less intrusive code change. (Note using a struct union would not work well as the struct would still have a dsicrimination tag integer, and would have one field for each union case - struct unions are by no means perfect representations for multi-case types as things stand at the moment)
Estimated cost (XS, S, M, L, XL, XXL): S
:) There's not really any such thing as "S" for language features :) I'd say "M" or "L".
... CompilationRepresentationAttribute ...
yes that would seem natural
Will there be multiple ErasedUnion
s under this proposal?
Like ErasedUnion3
, ErasedUnion4
etc.?
@robkuz if the implementation will be with CompilationRepresentationAttribute then you can create your own erased union
[<CompilationRepresentationAttribute(CompilationRepresentationFlags.ErasedUnion)>]
type DU<'a, 'b> = A of 'a | B of 'b
@AviAvni @dsyme Please note that if this just enables a CompilationRepresentationFlags.ErasedUnion
on customly defined unions and doesn't allow implicit conversions when passing arguments (in the example above writing foo "hola"
instead of foo (ErasedUnion.Case1 "hola")
), there won't be much benefit for Fable, as this is basically the same situation as we have now.
This is almost same as or on type operator.T1 or T2. In perspective can be realized by special attribute on function. Full erased from compiled code. But how to save metadata?
But how to save metadata?
I think the intent is that the types would be erased (like other F# information). The metadata would only available at compile-time through the extra blob of F#-specific metadata that F# uses
I think that given the reasoning above (both for FABLE and the use case @Horusiath mentioned), this would be a good addition. 👍
Is it very important that the type is erased?
Perhaps it's a slightly separate proposal, but I would love to have ad-hoc type unions in the form:
let print (item : string | int) =
match item with
| s : string -> printfn "We have a string: %s" s
| i : int -> printfn "We have an int: %i" i
Which would essentially compile down to the same IL as:
let print (item : Choice<string, int>) =
match item with
| Choice1Of2 s -> printfn "We have a string: %s" s
| Choice2Of2 i -> printfn "We have an int: %i" i
and, more importantly, at the callsite:
print "Hello world"
instead of:
print (Choice1Of2 "Hello world")
In Fable we've finally managed to remove the erased union case name by using the so-called erased/implicit cast operator !^
. Check this and this. So now it's possible to do:
let foo(arg: U2<string, int>) =
match arg with
| U2.Case1 s -> s.Length
| U2.Case2 i -> i
// No need to write foo(U2.Case1 "hola")
foo !^"hola"
foo !^5
// The argument is still type checked. This doesn't compile
foo !^5.
TypeScript also supports string literals in these union types, i.e, in addition to type T1 = number | string
, it also supports type T1 = number | "string1" | "string2"
. Would be nice to also support that.
Or alternatively, if string enums were supported like in TypeScript, we could acheive the same effect that way:
enum Colors { Red = "RED", Green = "GREEN", Blue = "BLUE" }
type T = number | Colors
As a reference, Fable already supports string enums :smile:
@Richiban is this what you're looking for? - Polymorphic Variants
@Richiban @alfonsogarciacaro I hijacked this suggestion to convert this to a suggestion for erased ad-hoc type unions of the kind suggested by @Richiban
(Note sure what the callsite would be though @Richiban - perhaps what you say)./
Can we make this types not erased? Why not introduce base implementation on Typed<'t1, 't2, ...> and make this as member of FSharp.Core
@ijsgaus But if it's not erased then it's no difference from Choice<'a,' b>
This seems like a really sweet suggestion! I imagine it could help the performance of a lot of library code.
Would this help this problem?
type Goose = Goose of int
type Cardinal = Cardinal of int
type Mallard = Mallard of int
type Bird = Goose | Cardinal | Mallard
let x = Goose 7
This code fails. Goose in the Bird DU shadows Goose as a type and turns it into an Atom. This shadowing happens silently and at least to me is surprising.
type Goose = Goose of int
type Cardinal = Cardinal of int
type Mallard = Mallard of int
type Bird = Goose of Goose | Cardinal of Cardinal | Mallard of Mallard
let x = Goose 7
The type shadowing here still means I can't move forward, because there's no way to make a Goose....
type Goose = Goose of int
type Cardinal = Cardinal of int
type Mallard = Mallard of int
type Bird = Goose' of Goose | Cardinal' of Cardinal | Mallard' of Mallard
let x = Goose' (Goose 7)
This works. This kind of situation happens where someone created a single case DU, and it gets consumed by someone who can't muck with the original DU for fear of breaking existing code.
type Goose = Goose of int
type Cardinal = Cardinal of int
type Mallard = Mallard of int
type Bird = Goose of Goose | Cardinal of Cardinal | Mallard of Mallard
let x = Goose 7
The type shadowing here still means I can't move forward, because there's no way to make a Goose....
You can make an instance of the Goose
type by specifying the type as well as the case:
let x = Goose.Goose 7 // This works
When wrapped in a module, you can still access everything, but there is weirdness:
module Birds =
type Goose = Goose of int
type Cardinal = Cardinal of int
type Mallard = Mallard of int
type Bird = Goose of Goose | Cardinal of Cardinal | Mallard of Mallard
open Birds
If you specify Birds.Goose
this gives you the Goose
case of the type Goose
, and
you can't specify it as Birds.Goose.Goose
i.e. using the format Module.Type.Case
:
let gooseA = Goose.Goose 7 // Type.Case
let gooseB = Birds.Goose 7 // Module.Case
//let gooseC = Birds.Goose.Goose 7 // Module.Type.Case // <-- Doesn't work
Conversely, you must use that form if you wish to fully specify the Goose
case of the Bird
type i.e.
you must specify it as Birds.Bird.Goose
:
let gooseBird1 = Goose gooseA // Case
//let gooseBird2 = Birds.Goose gooseA // Module.Case // <-- Doesn't work
let gooseBird2 = Birds.Bird.Goose gooseA // Module.Type.Case
Would this help this problem?
type Goose = Goose of int type Cardinal = Cardinal of int type Mallard = Mallard of int type Bird = Goose | Cardinal | Mallard let x = Goose 7
This code fails. Goose in the Bird DU shadows Goose as a type and turns it into an Atom. This shadowing happens silently and at least to me is surprising.
type Goose = Goose of int type Cardinal = Cardinal of int type Mallard = Mallard of int type Bird = Goose of Goose | Cardinal of Cardinal | Mallard of Mallard let x = Goose 7
The type shadowing here still means I can't move forward, because there's no way to make a Goose....
type Goose = Goose of int type Cardinal = Cardinal of int type Mallard = Mallard of int type Bird = Goose' of Goose | Cardinal' of Cardinal | Mallard' of Mallard let x = Goose' (Goose 7)
This works. This kind of situation happens where someone created a single case DU, and it gets consumed by someone who can't muck with the original DU for fear of breaking existing code.
This code would be particulary useful for reusing existing cases and avoiding to nest it.
@dsyme,although the discussion has been dormant a bit in this topic, I think this is still a very valuable addition to the language.
Perhaps we could move this forward and mark it approved in principle, so that we can start working it out into an RFC? I'd be willing to put in the effort for that.
Some additional motivation: https://www.reddit.com/r/programming/comments/e8wfsr/discriminated_unions_in_c_an_unexceptional_love/fagb0hv/
such types are erased to
object
This makes sense if there are value types in the set of types, but if there are reference types only, it makes more sense to me to erase to the most-derived common base class. For instance, given the following:
type A() = class end
type B() = inherit A()
type C() = inherit B()
type D() = inherit A()
I'd expect (B | C)
to be exactly equivalent to a declaration of type B
, and (B | C | D)
to erase to A
(but not include A
). However, in this case, I'd also expect all the members of A
to be available on this value without needing to cast or pattern match.
introducing such a union value would need to be either explicit (e.g. using some new operator like
Typed
) or type-directed or both
This Typed
operator feels a little cumbersome to me. Out of curiosity, why not allow these new anonymous union types to be inferred directly where possible? To adapt the original example:
let generateValue0 () = if Monday then 2 else "4"
This function would have the inferred type unit -> (int | string)
This code would have been illegal before, so I don't think this would be a breaking change to the language. I could see the argument being made that this would make debugging harder if you had actually intended both branches of the if
to return the same type, but you can always add a type annotation to enforce that
Going back to @voronoipotato's example, this would be amazing to be able to do:
type Goose = Goose of int
type Cardinal = Cardinal of int
type Mallard = Mallard of int
// a type abbreviation for an erased anonymous union
// (note the parenthesis to disambiguate from creating a new nominal union type)
type Bird = (Goose | Cardinal | Mallard)
// inferred to have type (Goose | Cardinal | Mallard) -> string
let talk = function
| Goose _ -> "honk"
| Cardinal _ -> "tweet"
| Mallard _ -> "quack"
Another note: in the CIL metadata, we might be able to enforce type safety and still expose something reasonable to other languages by emitting something more or less equivalent to this pseudo C#:
// Overloads for other .NET languages, ignored by F#
//. We'd only need to emit these for public symbols
public static string talk (Goose arg) => talk ((object)arg);
public static string talk (Cardinal arg) => talk ((object)arg);
public static string talk (Mallard arg) => talk ((object)arg);
// This is the one F# would call
// I believe adding the `modreq` to the argument type would prevent this overload
// from being used by other .NET languages
public static string talk (
[ErasedUnion (new[] { typeof (Goose), typeof (Cardinal), typeof (Mallard) })]
object modreq([FSharp.Core]ErasedUnion) arg)
{
// actual implementation here
}
Was going to add a suggestion but found this thread.
Currently working with Akka and Akkling. It would be great if we are able to restrict the type of Object
we consume using purely types.
Another use case is when working with actor frameworks, you would want 2 parents actors in a hierarchy be able to handle messages of the same type, but still be able to handle other messages which are of non-intersecting types, when the child actor sends them.
module Protocol =
type Person = ...
type Animal = ...
type AllMessages = (Person | Animal | string | int)
module Parent1 =
type Msg = (Person | string)
module Parent2 =
type Msg = (Person | Animal)
module Child =
open Protocol
type Msg = (string | int)
type ParentMsg = Person
let child (mailbox: Actor<Msg, ParentMsg>) (msg: ChildMsg) =
// can send a message of `Person` without being aware of the sender/parent possible types
// parent IActorRef could either be Parent1 or Parent2
mailbox.Sender().Tell(new Person ())
As F#ers, we prefer strong typing and encoding domain information as much as possible into the type system. However there is no escape if we need to use the massive amount of ecosystem, out there written in C#, as such we could have a type level _
/ignore pattern.
We could define a union such as
type Msg = (Person | string | _ )
Here, when we consume this type, we expect Person
or string
but could also be anything we don't expect or want to handle.
This is erased, and also enforces pattern matching, this would give first class support for F# working with most C# libraries using such pattern with minimal changes.
type Msg = (Person | string | _ )
let handleMsg: Msg -> unit = function
(* Compiler forces you to match on Person and string
and will warn if _ isn't handled explicitly *)
| Person { name = name } -> printfn "Name: %s" name
| str -> printfn "Received string: %s" str
| _ -> unhandled ()
let object: System.Object = somecsharpFuncReturnObject ()
object |> handleMsg // implicitly converted to Msg and all Msg types are subset of System.Object
Handling special Akka.net messages would be simple as just defining message for the handling actor without resorting to System.Object
type RootActorMsg = (Akka.Actor.ReceiveTimeout | string | _ )
I think this draft PR is an attempt at addressing this: https://github.com/dotnet/fsharp/pull/8927
@abelbraaksma I don't think this is really a feature covered here. Type unions are powerful tool in the hands of a programmer thanks to the laws they offer (at least in languages implementing them like Scala 3, Ceylon or TypeScript):
- They are associative →
(A | B) | C
is the same asA | (B | C)
- They are commutative →
A | B
is the same asB | A
- They are idempotent →
A | A
andA
are the same.
If we extend that over the concept of subtyping:
- If we have classes
Animal
andCat :> Animal
→Cat | Animal
is the same asAnimal
. - Given that F# has bottom type (which is not quite true, but let's say that
obj
could work in the actual implementation →A | obj
andobj
are the same. - Given that F# would have subtype of every other type (which is not true) →
A | Nothing
would always be Nothing.
The PR which you linked doesn't quite fit into any of these laws, it doesn't even fit into the title of this issue (as it's a syntax suggar over choice, which holds type tag and is not erased).
@Horusiath, thanks, that was very insightful. I wasn't aware of these approaches not overlapping, but now I know I should probably experiment a bit more with languages that do have this feature to understand it better.
```fsharp let generateValue0 () = if Monday then 2 else "4"
This function would have the inferred type unit -> (int | string) This code would have been illegal before, so I don't think this would be a breaking change to the language.
For me, the above inference suggestion is a deal-breaker. It turns a common error with 'if' into a nest of knock-on issues as you try and work out why you're suddenly passing an "(A | B)" where a "B" is required.
I like to think of language feature design from two perspectives: The initial coding, and then the re-visitation under the stress of refactoring. For the latter, F# has excellent provision through the interplay of all sorts of little features, and is a large part of the reason why I'm here.
2c/ I could support adding a shorthand type-declaration syntax that de-sugars to a DU (like the example @dsyme posted earlier). Issues would be: To support generating the case constructor names, could such a DU only ever comprise of named types? And what would the prefix for the case-constructors be?
let generateValue0 () = if Monday then 2 else "4"
This function would have the inferred type unit -> (int | string) This code would have been illegal before, so I don't think this would be a breaking change to the language.
For me, the above inference suggestion is a deal-breaker. It turns a common error with 'if' into a nest of knock-on issues as you try and work out why you're suddenly passing an "(A | B)" where a "B" is required.
You could potentially enforce specifying type signatures for union types. However I feel, the same could be said for simple types that currently exists. You accidentally return something string
instead of int
and you have issues in other places. You would ideally restrict the types you consume at a boundary point (perhaps, at the module level or project level)