mlscript icon indicating copy to clipboard operation
mlscript copied to clipboard

Support for GADTs

Open Meowcolm024 opened this issue 1 year ago • 6 comments
trafficstars

Add support for GADTs

  • GADT-reasoning in pattern matching (type annotation needed)
  • Typing for type selections
  • Parsing the wildcard type ? in type arguments
  • use as keyword for upcast (same as :)
  • test cases GADT[1-6].mls in gadt directory, and AsOp.mls
  • refactored wildcard type arguments
  • correct extruded type args

Some TODOs (added by @LPTK):

  • [x] Merge branch local-gadt-wip after we figure out the new errors
  • [ ] Merge my local changes to simplify away compositions with ??X types into main branch before merging this PR

The PR has become way too big and it's difficult to keep track of what changes cause what results or regressions. Let's split it up and merge each of these things separately:

  • [ ] Typing for type selections
  • [ ] Parsing the wildcard type ? in type arguments, use as keyword for upcast (same as :)
  • [ ] Refactored wildcard type arguments, correct extruded type args
  • [ ] Rethink the way GADT reasoning is triggered: only introduce skolems when they correspond to user-annotated pattern variable type arguments (as in App[f, a]); and in this case don't try to refine these type arguments in the scrutinee.

Meowcolm024 avatar Nov 28 '23 06:11 Meowcolm024

Cool! Will try to take a look later. In the meantime, plz read this again and make sure all the changes are clean: https://github.com/hkust-taco/mlscript/blob/mlscript/CONTRIBUTING.md

LPTK avatar Nov 28 '23 09:11 LPTK

Known issue: recursion depth exceeded (infinite extrusion loop) when typing certain terms (see GADT3.mls and GADT5.mls etc) Observation: it seems that the loop follows a pattern of: a''# & S <: b'

Meowcolm024 avatar Dec 27 '23 13:12 Meowcolm024

So, we need to fix the problems in https://github.com/Meowcolm024/mlscript/tree/local-gadt-wip

Looking at one example: the error you get here is correct:

abstract class Vec[type L, type T]: Nil[T] | Cons[S, T]
class Nil[T] extends Vec[Z, T]
class Cons[L, T](val h: T, val t: Vec[L, T]) extends Vec[S[L], T]
//│ ╔══[ERROR] Type mismatch in type declaration:
//│ ║  l.18: 	class Cons[L, T](val h: T, val t: Vec[L, T]) extends Vec[S[L], T]
//│ ║        	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
//│ ╟── type `L` is not an instance of type `S`
//│ ║  l.18: 	class Cons[L, T](val h: T, val t: Vec[L, T]) extends Vec[S[L], T]
//│ ║        	           ^
//│ ╟── Note: constraint arises from class tag:
//│ ║  l.16: 	abstract class Vec[type L, type T]: Nil[T] | Cons[S, T]
//│ ╙──      	                      

Cons[L, T] is indeed not a Cons[S, T] because L is not the same type as S.

What you really want to write here is : Nil[T] | Cons[?, T]. So it seems you need to add support for wildcards.

On the other hand, I think : Nil | Cons should in principle work, but it currently does not (the map function yields an error) and we could try to find out why.

EDIT to add: For the record, the reason map doesn't work with the : Nil | Cons self-annotation is that we infer scrutinee type #Nil | #Cons & { Cons#L = ?L0..?L1; Cons#T = ?T0..?T1 } with ?T0 <: ?T1 <: ‘A | ~⊥(??T), which leads to derived constraint Vec[A, B] <: {Cons#T}, which yields an extruded ??T for the Cons#T field, from the Cons raw type part of the annotation. Note: the ⊥(??T) is really weird; feels like it should be ⊤(??T) – perhaps a pretty-printing issue — EDIT actually no, a plain old bug https://github.com/hkust-taco/mlscript/commit/abe5e62d891fb3844d96c87135fff64fa1ecc19f Note: at least in situations like these, we could maybe see that the Cons#T refinement in this case could be turned into an equivalent Vec#T refinement?

LPTK avatar Jan 31 '24 17:01 LPTK

See: https://github.com/LPTK/mlscript/commit/2f193c87bbe7d70c9da1d4cb793ee6471a091416

The goal is to allow:

abstract class Vec[type L, type T]: Nil[T] | Cons[?, T]
class Nil[T] extends Vec[Z, T]
class Cons[L, T](val h: T, val t: Vec[L, T]) extends Vec[S[L], T]

And to elaborate on the previous message, this is what happens if we use : Nil | Cons:

  • we get constraint Vec['L, 'A] <: #Nil | #Cons & { Cons#T: ..'A } for the scrutinee
  • ie #Vec & { L = 'L; T = 'A } & (#Nil | #Cons) <: #Nil | #Cons & { Cons#T: ..'A }
  • leading to constraint #Cons & { L = 'L; T = 'A } <: #Cons & { Cons#T: ..'A }
  • leading to looking up Cons#T in #Cons, which yields an extruded skolem not compatible with 'A

LPTK avatar Feb 02 '24 01:02 LPTK

So the way type selections/projections are typed is currently broken, as it falls foul of the very same problem they had in Sala. Here's an example:

class Wrap[type A]
class Test[type B]
//│ class Wrap[A] {
//│   constructor()
//│ }
//│ class Test[B] {
//│   constructor()
//│ }

fun oops(w: Wrap[Test[Int] & Test[Str]]) = 1 : w.A.B
//│ fun oops: (w: Wrap[in Test[out Int | Str] out Test[in Int | Str out nothing]]) -> nothing

oops(new Wrap)
//│ nothing
//│ res
//│     = 1

And here's a Scala 3 issue that's similar: https://github.com/lampepfl/dotty/issues/15578

The root of the issue is being able to use an abstract type that has bad bounds as though it really existed (had a valid instance), whereas we have no evidence that it does. Indeed, the only guarantee we have is that the types of values in the context (and their val fields) have good bounds. Types nested in types in the context have no such guarantees.

This means the Vector-tail example is broken. We could solve this by adding a way of doing inversion reasoning, or by some other technique.

LPTK avatar Feb 03 '24 18:02 LPTK

So regarding https://github.com/hkust-taco/mlscript/pull/195#pullrequestreview-1805379488, I minimized the regression to:

type List[out A] = Cons[A]
module Nil
class Cons[A](head: A, tail: List[A])
//│ type List[A] = Cons[A]
//│ module Nil
//│ class Cons[A](head: A, tail: List[A])

fun listConcat(xs) =
  if xs is
    Cons(x, xs2) then Cons(x, listConcat(xs2))
//│ fun listConcat: forall 'A 'A0 'A1. Cons['A & ('A0 | ~??A)] -> Cons[in 'A0 out 'A1]
//│   where
//│     'A0 :> 'A1 | ??A & 'A
//│     'A1 :> 'A0 | ??A & 'A

Previously, we didn't have these duplicated bounds.

I confirmed that they are simply due to extruding things several times. When we either cache the extrusion context or don't increase the level of patmat branches, then we get something like the original inferred type back.

Caching the extrusion context:

//│ fun listConcat: forall 'A 'A0. Cons['A & ('A0 | ~??A)] -> Cons['A0]
//│   where
//│     'A0 :> ??A & 'A

Not increasing the level of patmat branches:

//│ fun listConcat: forall 'A 'A0. Cons['A] -> Cons['A0]
//│   where
//│     'A0 :> ??A & 'A

So we should cache the extrusion context. But to do this properly, we should only cache extrusions to the same level, instead of having one single cache that's used for extrusions to any level, as is currently implemented.

LPTK avatar Feb 17 '24 14:02 LPTK

Thanks again for the great work, @Meowcolm024! These changes will be useful for future reference, even though they are already obsolete (since we are moving to a new type checker implementation). So I will close this for now as it is not worth fixing the conflicts and cleaning up the history, similarly to https://github.com/hkust-taco/mlscript/pull/219

By the way, as I told you in person some time ago, you merged some commits on my WIP branch (the "W"-named commits) which were only temporary experimentation changes. These were not intended to be made part of your PR – or at least, not as is.

LPTK avatar Jan 21 '25 03:01 LPTK