
Union Types

Open • azjezz opened this issue 5 years ago • 12 comments

Proposal

I would like to propose adding union types to Hack, once again.

This request proposes the ability to declare a type alias composed of multiple types, usable for both function parameter and return types.

type numeric = float|int|Number|BigNumber;

Use case

I have already described how union types can be used safely for parameters, so I also want to mention a case where I found union types really useful and where they provide a better developer experience (DX).

The HTTP controller/handler case:

Forcing the handler to return a Response instance is not a good DX.

When the response is just a redirect, the developer is forced to generate a Uri instance from the route generator, create a response, convert the Uri to a string and add it to the response header, and create an empty body stream to set as the response body. Ideally, they could just return a Uri instance from the generator and let the application kernel take care of the rest. This, however, would require Nuxed to set the return type to mixed, which is not ideal since mixed is too wide (i.e. I don't want the handler to return a resource or an object that I don't know how to deal with).

Using union types, the return type would be:

namespace Nuxed\Http;

use namespace Nuxed\{Util, Filesystem};

type Response = 
  Message\Response | // a response
  Message\Uri | // redirect
  Message\Stream | // body, treat it as a plain text response
  Server\IHandler |  // another handler to handle the request
  string | // plain text
  \XHPRoot | // html
  Util\Stringable | // plain text
  Filesystem\Node | // download
  KeyedContainer<string, mixed>; // json objects

With the type above, the application kernel would be able to generate the response itself, and developers would not be forced to construct a response object and set the headers for common responses such as HTML, downloads, and JSON (default headers can be changed in the config files).
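The kernel behaviour described above can be sketched in TypeScript. TypeScript already supports union types and serves here only as an analogue, since Hack has no denotable unions. `HttpResponse`, `Uri` and `toResponse` are hypothetical stand-ins, not Nuxed's actual API, and only four of the nine union members are modelled.

```typescript
// Hypothetical stand-ins for the Nuxed message types in the alias above.
class HttpResponse {
  constructor(
    public status: number,
    public headers: Record<string, string>,
    public body: string,
  ) {}
}

class Uri {
  constructor(public href: string) {}
}

// A trimmed-down version of the proposed `Response` union (4 of 9 members).
type HandlerResult =
  | HttpResponse                // a response, passed through unchanged
  | Uri                         // redirect
  | string                      // plain text
  | { [key: string]: unknown }; // JSON object

// The "kernel": narrow the union member by member and build a response.
function toResponse(result: HandlerResult): HttpResponse {
  if (result instanceof HttpResponse) {
    return result;
  }
  if (result instanceof Uri) {
    // Redirect: Location header and an empty body.
    return new HttpResponse(302, { Location: result.href }, "");
  }
  if (typeof result === "string") {
    return new HttpResponse(200, { "Content-Type": "text/plain" }, result);
  }
  // Only the JSON object member is left at this point.
  return new HttpResponse(
    200,
    { "Content-Type": "application/json" },
    JSON.stringify(result),
  );
}
```

Each check narrows the union by one member, so a handler can return `new Uri("/login")` directly and the kernel turns it into a 302 with a `Location` header.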

This is another use case for union types that would not be solved by method overloading.

Syntax

Since Hack already has type aliases, and using union types in declaration headers can get messy, I suggest that union types only be declarable through type aliases, as shown above.

type arraykey = string|int;
type num = int|float;

type Json = num|string|KeyedContainer<string, mixed>|Container<mixed>|null|bool;
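As a rough TypeScript analogue, the `Json` alias can be written as a union with a runtime guard. TypeScript has native union types, though Hack's `KeyedContainer`/`Container` have no exact equivalent there. The Hack alias above types nested values as `mixed`; this sketch instead makes the alias recursive, so the guard has to walk nested values. `isJson` is a hypothetical helper, not part of any library.

```typescript
// A JSON value as a recursive union (the Hack alias above is not recursive).
type Json = number | string | boolean | null | Json[] | { [key: string]: Json };

// Runtime guard: checks the value against every member of the union.
function isJson(value: unknown): value is Json {
  if (value === null) return true;
  switch (typeof value) {
    case "number":
      return Number.isFinite(value); // JSON has no NaN or Infinity
    case "string":
    case "boolean":
      return true;
    case "object":
      if (Array.isArray(value)) return value.every(isJson);
      return Object.values(value as Record<string, unknown>).every(isJson);
    default:
      return false; // undefined, functions, symbols, bigint
  }
}
```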

Errors, limitations, covariance/contravariance

<<__Sealed(B::class, C::class)>>
interface A {}

final class B implements A {}
abstract class C implements A {}
class CImpl extends C {}

class D {}

type foo = A|B; // type check error, A already includes B

class E {
  const type MyUnion = A|D;
  const type RetUnion = B|C;
  
  public function foo(this::MyUnion $union): void {}

  public function bar(C $c): void {}
  
  public function baz(): this::RetUnion { return new B(); }
  public function qux(): this::RetUnion { return $this->baz(); }
  public function lex(): this::RetUnion { return $this->baz(); }
  public function herp(): this::RetUnion { return new B(); }
}

class F extends E {
  // type error, doesn't accept D anymore, still accepts all A implementations ( note, A is sealed to only B and C )
  public function foo(this::RetUnion $un): void {}

  public function bar(this::MyUnion $c): void {} // okay, still accepts C

  // okay: this would fail in PHP, but not in Hack, since A is sealed to only B and C,
  // so A can only be B or C here (unless A is not sealed, or allows other
  // classes to implement it).
  // note: C is abstract, so a concrete subclass of C is returned.
  public function baz(): A { return new CImpl(); }
  // okay
  public function qux(): B { return new B(); }
  public function lex(): C {
    // the three bodies below are alternatives, not one sequence:
    // type error, parent returns B|C, this function should return only C
    return parent::lex();
    // okay
    return parent::lex() as C;
    // okay
    $c = parent::lex();
    return $c is C ? $c : new CImpl();
  }
  // type error, D is not part of B|C
  public function herp(): D { return new D(); }
}

class G extends E { 
  // okay, still accepts A and D
  public function foo(mixed $un): void {}
}

class H extends E {
  // okay, A and D are not null
  public function foo(nonnull $un): void {}
}

class Herp extends E {
  // error, doesn't accept D anymore
  public function foo(A $un): void {}
}

class Berp extends E {
  // error, doesn't accept A implementations anymore 
  public function foo(D $un): void {}
}

class Foo extends E {
  const type FooType = B|C|D;
 
  // okay, accepts D and all A implementations ( note : A is sealed to only B and C )
  public function foo(this::FooType $un): void {}
}
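The override rules in the example above amount to set containment once each union is expanded into the concrete classes that inhabit it. A parameter type may only widen (contravariance), and a return type may only narrow (covariance).

The TypeScript sketch below models that under a closed-world assumption, hard-coding the hierarchy from the example: A is sealed to B and C, and CImpl is the only concrete subclass of C. `inhabitants`, `validParamOverride` and `validReturnOverride` are hypothetical names, and this is not how the Hack typechecker is implemented.

```typescript
// Hypothetical closed-world model: each type name maps to the concrete
// classes that inhabit it. Sealing A is what makes its set finite.
const inhabitants: Record<string, string[]> = {
  A: ["B", "CImpl"], // A is sealed to B and C; CImpl is C's only concrete subclass
  B: ["B"],
  C: ["CImpl"],
  CImpl: ["CImpl"],
  D: ["D"],
};

function expand(union: string[]): Set<string> {
  const out = new Set<string>();
  for (const member of union) {
    for (const cls of inhabitants[member] ?? []) out.add(cls);
  }
  return out;
}

function isSubset(a: Set<string>, b: Set<string>): boolean {
  return Array.from(a).every((x) => b.has(x));
}

// Parameters are contravariant: the child must accept everything the parent accepts.
function validParamOverride(parent: string[], child: string[]): boolean {
  return isSubset(expand(parent), expand(child));
}

// Returns are covariant: the child may only return what the parent may return.
function validReturnOverride(parent: string[], child: string[]): boolean {
  return isSubset(expand(child), expand(parent));
}
```

With `MyUnion = A|D` and `RetUnion = B|C`, `validParamOverride(["A", "D"], ["B", "C"])` is false, matching the error on `F::foo`. `validReturnOverride(["B", "C"], ["A"])` is true, matching `F::baz`.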

References:

  • https://wiki.php.net/rfc/union_types
  • https://en.wikipedia.org/wiki/Union_type
  • https://github.com/facebook/hhvm/issues/7131
  • https://crystal-lang.org/reference/syntax_and_semantics/union_types.html
  • https://www.typescriptlang.org/docs/handbook/advanced-types.html#union-types
  • https://doc.rust-lang.org/reference/items/unions.html

azjezz • Sep 08 '19 21:09

Recent-ish feedback on union types from @dlreeves:

  • What if someone makes a union of 100, 1000 or even 10000 types? How can we prevent this since it would absolutely take forever to type check
  • How would these types be enforced in the runtime? Should they be?
  • How would this interplay with features like reified generics? How will we represent this type at runtime in things like type structures?
  • Should these unions be tagged vs. untagged? Which actually solves the issue?
  • How would this work with a potential pattern matching feature? Would it support exhaustive checks on pattern matching and are there design decisions we should think about here?
  • Would their presence push people to make worse design decisions? Instead of introducing an interface it would be easier to use a union type, but this is worse for our dev tools and our runtime.
  • What about intersection types? We don't have those represented in our type system yet but we are exploring that. How would they interact? Would introducing unions now make it harder to add other useful types in the future?

It is a frequently requested feature, not impossible, but definitely not as straightforward as it may seem.

fredemmott • Sep 09 '19 15:09

Thanks for the reply, @fredemmott. I am fully aware that adding unions is not straightforward, and we should take time to discuss the points you made as well as other possibilities and opinions.


What if someone makes a union of 100, 1000 or even 10000 types? How can we prevent this since it would absolutely take forever to type check

What if someone makes a function with 100, 1000, or even 10000 generic type parameters? I believe this is the same case: you wouldn't expect type checking to be blazing fast if you used a union of 1000 types :)

HHVM can also do some optimizations here.

For example:

type nonobject = null|string|num|bool|resource|dict<arraykey, mixed>|vec<mixed>| ... all other non-object types;

Instead of checking whether $x is one of the types in the union, HHVM could simply throw if it's an object.
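One way to read this optimization: a runtime check can store whichever list is shorter, the accepted kinds or the excluded kinds. The TypeScript sketch below assumes a simplified, hypothetical set of runtime kinds (`Kind`), not HHVM's real ones. It shows that both encodings of `nonobject` accept exactly the same values.

```typescript
// Hypothetical, simplified runtime kinds (not HHVM's actual set).
type Kind = "null" | "bool" | "int" | "float" | "string" | "vec" | "dict" | "object";

// A union check stores either its members or the kinds it excludes.
type UnionCheck =
  | { mode: "anyOf"; kinds: Kind[] }
  | { mode: "noneOf"; kinds: Kind[] };

function accepts(check: UnionCheck, kind: Kind): boolean {
  const listed = check.kinds.indexOf(kind) !== -1;
  return check.mode === "anyOf" ? listed : !listed;
}

// Enumerating every member...
const nonobjectExplicit: UnionCheck = {
  mode: "anyOf",
  kinds: ["null", "bool", "int", "float", "string", "vec", "dict"],
};
// ...versus a single test against the one excluded kind.
const nonobject: UnionCheck = { mode: "noneOf", kinds: ["object"] };
```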


How would these types be enforced in the runtime? Should they be?

Preferably :) but typechecker-only enforcement would be a great start :)


How would this interplay with features like reified generics? How will we represent this type at runtime in things like type structures?

I don't fully understand what you mean here; Hack already has built-in unions (such as arraykey, mixed, nonnull, num) that work perfectly with reified generics.

type my_array_key = string|int;

class Foo<reify T as my_array_key> { }
// should behave the same as the existing
class Bar<reify T as arraykey> { } 

Should these unions be tagged vs. untagged? Which actually solves the issue?

Personally, I would prefer tagged (discriminated) unions like those in TypeScript, but this is up for discussion.
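TypeScript's unions are themselves untagged; the tagged style is a convention in which every member carries a literal discriminant field. The TypeScript sketch below contrasts the two styles. It also shows the `never`-based exhaustiveness check that tagged unions allow, which bears on the pattern-matching question that follows. The types are illustrative only.

```typescript
// Untagged: members are told apart by inspecting the value itself.
type Untagged = string | number;

function describeUntagged(x: Untagged): string {
  return typeof x === "string" ? `string of length ${x.length}` : `number ${x}`;
}

// Tagged (discriminated): every member carries a literal `kind` field.
type Shape =
  | { kind: "circle"; radius: number }
  | { kind: "square"; side: number };

function area(shape: Shape): number {
  switch (shape.kind) {
    case "circle":
      return Math.PI * shape.radius ** 2;
    case "square":
      return shape.side ** 2;
    default: {
      // Exhaustiveness check: adding a member to Shape without handling it
      // above turns this assignment into a compile-time error.
      const unreachable: never = shape;
      return unreachable;
    }
  }
}
```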

How would this work with a potential pattern matching feature? Would it support exhaustive checks on pattern matching and are there design decisions we should think about here?

The only thing I can think of is type_structure, where kind is an int:

namespace Foo;

type Bar = nonnull;

var_dump(type_structure(Bar::class));

// result :
array(2) {
  ["kind"]=>
  int(23)
  ["alias"]=>
  string(7) "Foo\Bar"
}

For union types, I believe kind should be int|vec<int>: either a single type kind, or a vec of type kinds if it's a union.

namespace Foo;

type Bar = string|resource;

var_dump(type_structure(Bar::class));

// result :
array(2) {
  ["kind"]=> vec(2) {
    [0] => int(4)
    [1] => int(5)
  }
  ["alias"]=>
  string(7) "Foo\Bar"
}

This should not introduce any BC breaks, so it's safe. Built-in unions should still keep their own kind instead of vec<otherkinds>.
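Code that reads `kind` for the new union aliases would have to accept both shapes. Below is a minimal TypeScript sketch of such a consumer. It assumes a hypothetical `TypeStructure` interface and reuses the kind ids from the example above (4 for string, 5 for resource).

```typescript
// Hypothetical union-aware type structure: `kind` is one kind id or a list of them.
interface TypeStructure {
  kind: number | number[];
  alias?: string;
}

// Normalise both forms to a list so consumers can handle them uniformly.
function kindsOf(ts: TypeStructure): number[] {
  return typeof ts.kind === "number" ? [ts.kind] : ts.kind;
}

function isUnionStructure(ts: TypeStructure): boolean {
  return Array.isArray(ts.kind);
}
```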

As for reflection, I would suggest adding isUnion(): bool and getUnionTypes(): vec<TypeReflection> to TypeReflection, and not bothering much more with it since the API is going to change in the future.
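The two proposed reflection methods could look like the TypeScript sketch below. `TypeReflection` here is a hypothetical stand-in, not HHVM's reflection API. Treating a non-empty member list as a union, and returning an empty list for non-union types, are assumptions.

```typescript
// Hypothetical reflection object for a type, with the two proposed methods.
class TypeReflection {
  constructor(
    public readonly name: string,
    private readonly members: TypeReflection[] = [],
  ) {}

  // A non-empty member list marks a union (assumption).
  isUnion(): boolean {
    return this.members.length > 0;
  }

  // Direct members of the union; empty for non-union types (assumption).
  getUnionTypes(): TypeReflection[] {
    return this.members.slice();
  }
}
```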


Would their presence push people to make worse design decisions? Instead of introducing an interface it would be easier to use a union type, but this is worse for our dev tools and our runtime.

People will make bad design decisions regardless of whether union types exist; they can already use mixed everywhere and $x as Foo whenever they like. The documentation should encourage using interfaces where possible, but I don't think people will jump on unions and drop interfaces, especially in libraries and frameworks, since you can't have a union of all the possible implementations of a component.


What about intersection types? We don't have those represented in our type system yet but we are exploring that. How would they interact? Would introducing unions now make it harder to add other useful types in the future?

I don't think introducing union types would make it any harder to introduce future types in Hack, and I can't really see why it would.

As for intersection types, I believe they would work well with unions, e.g.:

namespace SomeLib {
  interface A {
    public function foo(): int;
  }
  interface B {
    public function baz(int $_): void;
  }
}
namespace SomeOtherLib {
  interface C {
    public function foo(): int;
    public function baz(int $_): void;
  }
}
namespace MyApp {
  use namespace SomeOtherLib;
  use namespace SomeLib;
  // (A and B) or C
  type Foo = (SomeLib\A & SomeLib\B) | SomeOtherLib\C;
  function baz(Foo $foo): void {
    // type safe: every member of Foo has both foo() and baz()
    $foo->baz($foo->foo());
  }
}
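TypeScript already accepts this exact shape of type, so it can serve as an analogue for the Hack example above. The interfaces mirror `A`, `B` and `C`. The member access typechecks without narrowing because every member of the union has both methods.

```typescript
interface A { foo(): number; }
interface B { baz(n: number): void; }
interface C { foo(): number; baz(n: number): void; }

// (A and B) or C
type Foo = (A & B) | C;

function baz(foo: Foo): void {
  // Both A & B and C provide foo() and baz(), so no narrowing is needed.
  foo.baz(foo.foo());
}
```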

azjezz • Sep 09 '19 21:09

Note: some cases should not be handled at runtime, such as:

interface A {}

final class B implements A {}

type foo = A|B; // type check error, A already includes B

This should be a typechecker error only, since HHVM would have to load the classes to figure it out.

azjezz • Sep 09 '19 22:09

The only thing I'll add here is this is one of our most requested features, so we will take a serious look at this at some point. I still do not feel now is the right time given some other fundamental aspects of the language we are sorting out.

dlreeves • Sep 10 '19 18:09

This is definitely something that needs careful design. The feedback from the Flow team (which does have unions) is that the design does not steer users towards good APIs. It's too easy to accept a bunch of stuff rather than a nice API. You see this in the types of some of the PHP\foo functions, where the runtime actually accepts more types than the typechecker believes.

We'd want some sort of restriction to help users write great APIs. Maybe limiting it to type aliases is an option we could look at.

We've also had some issues with the existing built-in union types. The type checker doesn't help you with code like this:

function compare_them(num $x, float $y): bool {
  return $x === $y;
}

function oops(): bool {
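  // 4 / 2 evaluates to the int 2, so this compares 2 === 2.0 and returns false,
  // yet the typechecker accepts the call because num includes int.
  return compare_them(4 / 2, 2.0);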
}
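In Hack, `4 / 2` evaluates to the int `2`, so `compare_them` ends up comparing `2 === 2.0`, which is false. Below is a rough TypeScript analogue of the same trap. It uses `number | bigint` in place of `num` and assumes an ES2020 target so that `BigInt` is available.

```typescript
// `number | bigint` plays the role of Hack's `num` (int | float).
function compareThem(x: number | bigint, y: number): boolean {
  // Allowed by the typechecker because the types overlap on `number`,
  // but strict equality never matches across the two representations.
  return x === y;
}
```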

Wilfred • Sep 11 '19 14:09

A few other clarifications:

Hack does support unions internally in the typechecker today. It just doesn't support denotable unions, so you can't write union types in code (e.g. function foo(A|B $x): C|D).

interface A {}

final class B implements A {}

type foo = A|B; // type check error, A already includes B

We consider this to be fine today: A|B simplifies to A here. We use unions when handling simple cases like this:

class MyParent {}
final class MyChild extends MyParent {}

function return_myparent(bool $b): MyParent {
  if ($b) {
    return new MyChild();
  } else {
    return new MyParent();
  }
}

Wilfred • Sep 11 '19 14:09

Update: HHVM 4.26.0

an experimental new syntax was added for union types (Cat | Dog) and intersection types (FourLegged & Mammal) – this is an early experimental prototype not meant for general use (it is entirely possible that the prototype will never make it to a final release, depending on, for example, how it affects typechecking performance in various scenarios), but if you want to experiment with it, add union_intersection_type_hints=true to your .hhconfig

https://hhvm.com/blog/2019/10/09/hhvm-4.26.0.html

azjezz • Nov 17 '19 19:11

Is it still in the experimental stage?

klesun • May 06 '21 18:05

Yes; unions are pretty unlikely to be supported, but intersection types are more promising. This is mostly due to the likelihood of widespread use of union types leading to performance problems without a clear way to address them.

fredemmott • May 06 '21 18:05

likelihood of widespread use of union types leading to performance problems without a clear way to address them.

Has any other language suffered from this? PHP introduced union types recently (8.0), and so far I haven't seen anyone misuse them that way.

People "won't" create a union of 1000 types, just like people won't create a function with 1000 arguments.

azjezz • May 06 '21 21:05

has any other language suffered from this?

This is a significant problem for Flow, which shares a lot with Hack, and is a smaller amount of code (at FB)

people "won't" create a union of 1000 types, just like people won't create a function with 1000 arguments.

A more likely example is creating a union of several interfaces, which in turn contain some unions, which could easily include a few thousand concrete types

fredemmott • May 06 '21 22:05

This is a significant problem for Flow, which shares a lot with Hack, and is a smaller amount of code (at FB)

If this affects FB's own code, I think the right thing to do is to forbid unions there, or to compute the complexity of each union with a static analysis tool and reject unions above some complexity threshold.
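The complexity limit suggested above could be approximated by expanding each union alias into the concrete types it can contain and rejecting it past a threshold. This TypeScript sketch assumes a hypothetical type graph in which each alias or interface lists its members, and any unlisted name is a concrete class.

The same model illustrates the concern above about a union of interfaces fanning out into many concrete types. `concreteTypes` and `checkUnionComplexity` are illustrative names, not an existing tool.

```typescript
// Hypothetical type graph: alias/interface name -> the names it unions over.
// Any name not present as a key is treated as a concrete class.
type TypeGraph = Record<string, string[]>;

function concreteTypes(
  graph: TypeGraph,
  name: string,
  memo: Map<string, Set<string>> = new Map(),
): Set<string> {
  const cached = memo.get(name);
  if (cached) return cached; // shared subtrees are expanded only once
  const members = graph[name];
  const result = new Set<string>();
  memo.set(name, result); // registered before recursing, so cycles terminate
  if (members === undefined) {
    result.add(name); // not in the graph: a concrete class
  } else {
    for (const m of members) {
      concreteTypes(graph, m, memo).forEach((t) => result.add(t));
    }
  }
  return result;
}

// Returns an error message when the alias expands past `limit`, else null.
function checkUnionComplexity(graph: TypeGraph, alias: string, limit: number): string | null {
  const size = concreteTypes(graph, alias).size;
  return size > limit ? `${alias} expands to ${size} concrete types (limit ${limit})` : null;
}
```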

This feature is extremely useful. Over the past few months I have worked in multiple codebases where unions were used, and unions complex enough to affect performance have never been an issue.

I don't think this feature should be removed from Hack just because people at facebook[.]com might misuse it.

A more likely example is creating a union of several interfaces, which in turn contain some unions, which could easily include a few thousand concrete types

I don't really understand what you mean by "interfaces, which in turn contain some unions", but given

interface A {}
interface B {}
interface C extends A, B {}
interface D extends C {}
interface E {}
interface F {}
interface G extends E, F {}

and the union type alias:

type Foo = (G | ( E & F )) | ( D | C | ( A & B )); 

Foo can be normalized to (E&F)|(A&B), since G is a subtype of E&F and both D and C are subtypes of A&B, and I don't really see people doing anything more complicated than this.
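This normalization can be mechanized: treat each union member as an intersection of interfaces and drop any member that is a subtype of another member. The TypeScript sketch below hard-codes the interface hierarchy from the example (`parents` is hypothetical scaffolding) and reduces `Foo` to `(E&F)|(A&B)`.

```typescript
// The interface hierarchy from the example: name -> direct parent interfaces.
const parents: Record<string, string[]> = {
  A: [], B: [], C: ["A", "B"], D: ["C"], E: [], F: [], G: ["E", "F"],
};

// Every interface `name` is a subtype of, including itself.
function ancestors(name: string): Set<string> {
  const seen = new Set<string>([name]);
  const stack = [name];
  while (stack.length > 0) {
    const current = stack.pop()!;
    for (const p of parents[current] ?? []) {
      if (!seen.has(p)) {
        seen.add(p);
        stack.push(p);
      }
    }
  }
  return seen;
}

// A union member is an intersection of interfaces, e.g. ["E", "F"] for E & F.
type Member = string[];

// m <: n when every interface in n is an ancestor of some interface in m.
function isSubtype(m: Member, n: Member): boolean {
  const up = new Set<string>();
  m.forEach((x) => ancestors(x).forEach((a) => up.add(a)));
  return n.every((x) => up.has(x));
}

// Drop members subsumed by another member; of two equivalent members keep the first.
function normalize(union: Member[]): Member[] {
  return union.filter((m, i) =>
    !union.some((n, j) => j !== i && isSubtype(m, n) && !(isSubtype(n, m) && j > i)),
  );
}
```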

azjezz • May 09 '21 20:05

Refs: Experimental feature case types

lexidor • Jun 23 '23 19:06