
A minimalistic proposal for adding row and column polymorphism to Dhall

Open winitzki opened this issue 1 year ago • 4 comments

  1. "Row polymorphism" means that a record type { a : Text, b : Bool } is a subtype of { a : Text}. Any function taking a record of type { a : Text } can be directly applied to an argument of type { a : Text, b : Bool } and will just ignore the field b.

For example, a function f : { a : Text } → T could be applied as f r where r = { a = "x", b = True }. In the current Dhall syntax, this would be the same as if we silently rewrote f r as f r.{a}.
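For concreteness, here is a minimal sketch of this behaviour (the concrete bodies of f and r below are illustrative; the last binding is rejected by today's typechecker and would be accepted under this proposal):

let f = λ(x : { a : Text }) → x.a
let r = { a = "x", b = True }
let accepted = f r.{ a }  -- explicit projection; accepted in today's Dhall
let proposed = f r        -- rejected today; accepted under this proposal
in  { accepted = accepted, proposed = proposed }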

I am proposing to implement that by changing the typechecking rules so that functions can be applied to records containing extraneous fields. We will not introduce any extra syntax or extra type variables to denote row polymorphism.

In https://github.com/dhall-lang/dhall-lang/blob/master/standard/type-inference.md#functions we will change:

The type system ensures that function application is well-typed, meaning that the input type that a function expects matches the inferred type of the function's argument:

by adding something like this:

"If a function's argument type is a subtype of the function's input type, the application is still well-typed."

Then we need to define a subtyping judgment T <: U and replace the judgment for the function application by:

Γ ⊢ f : ∀(x : A₀) → B₀
Γ ⊢ a₀ : A₁
A₁ <: A₀   ; This line was A₀ ≡ A₁.
↑(1, x, 0, a₀) = a₁
B₀[x ≔ a₁] = B₁
↑(-1, x, 0, B₁) = B₂
B₂ ⇥ B₃
──────────────────────
Γ ⊢ f a₀ : B₃

At the minimum, the subtyping judgment T <: U will be the same as T ≡ U, except that T may be a record type with more fields than the record type U, as long as every field of U is also present in T with an equivalent type.
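As a sketch (not proposed normative wording), the minimal judgment could consist of a reflexivity rule plus a record width rule, where every field of the smaller record type must occur in the larger one with an equivalent type:

T ≡ U
─────────
T <: U


∀ (x : T₀) ∈ ts₀….  (x : T₁) ∈ ts₁…  ∧  T₀ ≡ T₁
────────────────────────────────────────────────
{ ts₁… } <: { ts₀… }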

Later we could expand the subtyping judgment for other types.

However, the subtyping judgment will remain within the typechecker and will not have any user-facing syntax. One will not be able to declare Dhall functions with type parameters constrained to be a subtype of something.

This change will make all records automatically polymorphic. To define a function that takes a record with exactly the given fields and no other fields, one can use assertions: e.g., require an evidence argument of type r.(R) === r saying that the record r remains unchanged when projected to the record type R.
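For illustration, a sketch of such a function (exactR is a hypothetical name; the second argument is the evidence that r has exactly the fields of R, which a record with extra fields cannot provide):

let R = { a : Text, b : Bool }
let exactR = λ(r : R) → λ(_ : r.(R) === r) → r.a
let x = { a = "x", b = True }
in  exactR x (assert : x.(R) === x)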

The beta-normalization will then work without any change, as far as I understand.

  1. "Column polymorphism" means that the type < A : Text | B : Bool > is a subtype of union types such as < A : Text | B : Bool | C : Double > and of any other type that has more alternatives. A function that takes arguments of type < A : Text | B : Bool | C : Double > could be then applied directly to an argument of type < A : Text | B : Bool >.

I am proposing to implement that by allowing a union type < A : Text | B : Bool > to be in a merge expression that has at least the handler fields for A, B, and possibly other handler fields. The extra handler fields will be just ignored (currently the presence of any extra handler fields will raise a type error). We will not introduce any extra syntax or extra type variables to denote column polymorphism.

So, I'm proposing just to change the typechecking rules to allow that code to pass type-checking. The beta-normalization will then work without any change, as far as I understand.

We change here: https://github.com/dhall-lang/dhall-lang/blob/master/standard/type-inference.md#merge-expressions

A merge expression is well-typed if there is a one-to-one correspondence between the fields of the handler record and the alternatives of the union:

Γ ⊢ t : {}   Γ ⊢ u : <>   Γ ⊢ T : Type
──────────────────────────────────────
Γ ⊢ (merge t u : T) : T

And instead we write something like this:

A merge expression is well-typed if each alternative of the union has a corresponding field in the handler record. Unused handler fields are ignored.

Γ ⊢ t : { ts… }   Γ ⊢ u : <>   Γ ⊢ T : Type
───────────────────────────────────────────
Γ ⊢ (merge t u : T) : T
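For example, the following sketch (with illustrative names) would be rejected today because of the extra handler C, but would typecheck under the proposed rule, with that handler simply ignored:

let U = < A | B : Bool >
let u = U.B True
let handlers = { A = 0, B = λ(b : Bool) → 1, C = λ(t : Text) → 2 }
in  merge handlers u  -- today: type error; under the proposal: 1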
  3. I noticed that record types have a narrowing operation r.(S) where r : R and S can be a record supertype of R. For example, R = { a : Text, b : Bool } and S = { a : Text }. But union types do not have a corresponding operation.

By duality, union types should have a "widening" operation u.(V), where u : U and V can be a union supertype of U. For example, with U = < A | B > and V = < A | B | C >, u.(V) should become a value of type V; in particular, (< A | B >.A).(V) should become < A | B | C >.A.

Would it make sense to introduce such an operation?

winitzki commented on Jun 25 '24

To summarize, I am proposing that the typing judgment:

x : { a : Text, b : Bool }

should be interpreted as:

"the value x must be a record with at least the fields a : Text, b : Bool; it may have other fields as well"

The typing judgment:

u : < A | B : Bool>

should be interpreted as:

"the value u must have a union type with no alternatives other than A and B : Bool; it may also have a union type with fewer alternatives"

Then the narrowing operation x.(R) and the widening operation u.(U) become syntactic sugar: they can be implemented for any specific record types R and union types U.

let R = { a : Text, b : Bool }
let narrowR = λ(x : R) → { a = x.a, b = x.b }  -- Same as x.(R)
let U = < A | B : Bool >
let wideU = λ(u : U) → merge { A = U.A, B = λ(b : Bool) → U.B b } u -- Same as the proposed u.(U)

These functions are identity functions in today's Dhall, but if row/column polymorphism is added, these functions will actually perform narrowing and widening, because narrowR will be applicable to records with more fields than just a, b, and wideU will be applicable to unions with fewer constructors than A, B.
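For example, continuing the definitions above (a sketch that only typechecks under the proposed rules; the extra field c and the union < A > are illustrative):

let bigger = { a = "abc", b = True, c = 1 }
let smaller = < A >.A
in  { narrowed = narrowR bigger  -- { a = "abc", b = True } under the proposal
    , widened = wideU smaller    -- U.A under the proposal
    }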

Some more examples:

Row polymorphism

Any value x : R having a record type R = { a : Text, b : Bool } is always automatically considered to also have type S = { a : Text } in any typechecking judgment.

The narrowing operation x.(S) produces a record with fewer fields. The assertion x === x.(R) holds only when x has exactly the fields specified by R.

let R = { a : Text, b : Bool }
let S = { a : Text }
let x : R = { a = "abc", b = True }
let _ = assert : x.(R) === x  -- verify that `x` has exactly the fields from `R` and no others
-- assert : x.(S) === x will fail
let _ = assert : x.(S) === { a = "abc" }
let _ = x : S  -- proposal: accept this type annotation

-- proposal: the application `f x` below should be well-typed and should not truncate `x` to type `S`:
let f : S → S = λ(s : S) → s // { a = "xyz" }
let _ = assert : f x === { a = "xyz", b = True }

-- proposal: this function should truncate `s` to `S`:
let g : S → S = λ(s : S) → { a = s.a } -- This is not an identity function!
let _ = assert : g x === { a = "abc" }

In this way, we can write functions that either have narrowing behavior or do not have it.

A function that extracts the field b : Bool from any record that has (at least) that field:

let extract_b_Bool
   : { b : Bool } → Bool
   = λ(r : { b : Bool }) → r.b
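A usage sketch, continuing from that definition (the extra fields are illustrative; the application is only accepted under the proposed rules):

let big = { b = True, name = "widget", count = 3 }
in  extract_b_Bool big  -- rejected today; evaluates to True under the proposal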

Column polymorphism

Any value u : U where U = < A | B : Bool> is always automatically considered to also have type V = < A | B : Bool | C : Text > in any typechecking judgment.

The widening operation u.(V) produces a value of type V. The assertion u === u.(U) holds only when u has been constructed from U.

let U = < A | B : Bool>
let V = < A | B : Bool | C : Text >
let u : U = U.B True
let _ = assert : u.(V) === V.B True  -- proposal: widening operation
let _ = u : V  -- proposal: accept this type annotation

let _ = assert : u.(U) === u  -- verify that `u` was made from `U`'s constructors
-- assert : u.(V) === u will fail

-- proposal: this function should be well-typed and should not widen `u` to type `V`:
let f : V → V = λ(v : V) → merge { A = (U.A : V), B = λ(b : Bool) → (U.B False : V), C = λ(c : Text) → V.C "xyz" } v
let _ = assert : f u === U.B False

-- proposal: this function should widen `u` to `V`:
let g : V → V = λ(v : V) → merge { A = V.A, B = λ(b : Bool) → V.B b, C = λ(c : Text) → V.C "xyz" } v
let _ = assert : g u === V.B True

In this way, we can write functions that either have widening behavior or do not have it.

Note that U.B False : V does not transform the value into V.B False; the value remains U.B False but is accepted as having type V for typechecking purposes.
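A small sketch of this point, reusing U and V from above: the annotation affects only typechecking, while beta-normalization is unchanged.

let v = U.B False : V  -- accepted under the proposal
-- `v` still beta-normalizes to < A | B : Bool >.B False,
-- not to < A | B : Bool | C : Text >.B False
in  v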

The constructor < B : Bool >.B creates a value of type < B : Bool >, which (under the proposal) would also be accepted at any other union type that has at least that constructor:

let x = < B : Bool >.B True
let _ = x : < B : Bool | C | D | E >  -- All this is well-typed.

winitzki commented on Jul 02 '24

We may also consider extending the typing judgments for merge and if/then/else to the case when different branches may return different subtypes of the same result type.

-- This could become well-typed as `h: Bool → {a : Natural}`:
let h = λ(b : Bool) → if b then { a = 1 , b = "xyz" } else { a = 2, c = False }

-- This could become well-typed as `k : < A | B : Bool > → < C : Text | D : Bool >`:
let k = λ(u : < A | B : Bool >) → merge {
      A = < C : Text >.C "empty"
    , B = λ(x : Bool) → < D : Bool>.D x
 } u

However, I feel that this change introduces brittleness and unnecessary complexity; if we can avoid it, we should. This kind of excessive type inference causes errors in my day job in Scala, where the compiler dutifully infers the least upper bound of the branch types, which is often far from what was intended.

Instead, I am proposing to use explicit type annotations when needed. Instead of computing the GLB or LUB of subtypes, the typechecker will simply accept x : X in certain places and go on.

For example, we would then write:

-- This could become well-typed as `h: Bool → {a : Natural}`:
let T = {a : Natural}
let h = λ(b : Bool) → if b then { a = 1 , b = "xyz" } : T else { a = 2, c = False } : T

-- This could become well-typed as `k : < A | B : Bool > → < C : Text | D : Bool >`:
let V = < C : Text | D : Bool >
let k = λ(u : < A | B : Bool >) → merge {
     A = < C : Text >.C "empty" : V
   , B = λ(x : Bool) → < D : Bool>.D x : V
 } u

The user is required to put in some more type annotations where subtyping is to be expected. Perhaps this is a good thing: subtyping should not be magical, especially in complicated cases.

An example of a complicated case is a combination of row and column polymorphism with function covariance/contravariance. Consider the following types:

let A = { t : Text }
let B = { t : Text, b : Bool } -- B is a subtype of A.
let C = < C : Bool >
let D = < C : Bool | D >  -- C is a subtype of D.
let h : { a : A, d : D } → Bool = λ(r : { a : A, d : D }) → merge { C = λ(b : Bool) → b, D = True } r.d
let _ = assert : True === h { a = { t = "xyz" }, d = D.D }
-- But we can also apply h to a subtype of { a : A, d : D }:
let x : { a : B, d : C } = { a = { t = "a", b = True }, d = C.C False }
-- Should we accept `h x` as well-typed?
let result = h x
let _ = assert : result === False

Accepting h x as well-typed would mean a typing judgment that { a : B, d : C } is a subtype of { a : A, d : D } and therefore all types match in h x. This kind of judgment will need to be implemented also for function types; for example, we need to recognize that the function type { a : A, d : D } → Bool is a subtype of the function type { a : B, d : C } → Bool. We would like to avoid complicating the typechecker.
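For reference, the function-type rule that would be needed has the usual shape (contravariant in the argument, covariant in the result); this is only a sketch of its shape, not proposed wording:

A₁ <: A₀   B₀ <: B₁
─────────────────────────────────────
(∀(x : A₀) → B₀) <: (∀(x : A₁) → B₁)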

Also, this kind of typing judgment can be confusing to users if performed automatically. Some function applications will be silently accepted even though they may not make sense to the users. In this respect, it is better in my opinion if the user writes the type annotations explicitly, at places where they need to coerce a subtype to a supertype. The code above will look something like this:

let result = h { a = x.a : A, d = x.d : D } -- An identity transformation on `x` that reassigns types.

For more complicated types, the typechecker will need to be guided step by step into accepting the subtyping relation.

winitzki commented on Jul 02 '24

Hmmm. So I'm still not sure how I feel about this but I'll think out loud and try to summarize what I think is the tradeoff here.

So in the extreme case we could extend the type system to support subtyping and then include the subtyping relationship for records and unions. That's the most robust solution which handles all of the corner cases, but also the most complicated to implement and standardize.

On the other end of the spectrum we could implement this subtyping relationship just for a function applied to its arguments and in no other case. However, that makes it more brittle. For example, some code refactors that a user might expect to be safe might no longer type-check (I think).

And there is a spectrum of solutions in between those two extremes of various places we could introduce this subtyping relationship or not.

Gabriella439 commented on Oct 05 '24

On the other end of the spectrum we could implement this subtyping relationship just for a function applied to its arguments and in no other case.

I'm proposing the minimum amount of changes. No changes at all to beta-normalization; some small and localized changes to the type-checker.

However, that makes it more brittle. For example, some code refactors that user might expect to be safe might no longer type-check (I think).

What could be such a scenario with code refactors? Suppose we have a function application f x where f : { a : A, b : B } → R and x : { a : A, b : B, c : C }. Suppose the modified typechecker accepts f x now. How would a user refactor that code to make the result wrong or ill-typed?

The user could replace f by a different function g that accepts a smaller type, such as {a : A}. The application g x will still type-check, but perhaps the user didn't mean to do that and expected an error message.

The user could also replace x by an expression of type { a : A, b : B, d : D }, and f x will still type-check. Some other place in the program might depend on never applying f to a record of type { a : A, b : B, d : D }, but we can no longer check that. Alternatively, the user may expect the JSON output to contain c : C, but there is no c : C in the record any more; instead there is d : D.
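A concrete sketch of that scenario (the field types and values are placeholders; both applications typecheck only under the proposed rules):

let f = λ(r : { a : Natural, b : Bool }) → r.a
let before = f { a = 1, b = True, c = "kept" }   -- the user relied on `c` reaching the output
let after = f { a = 1, b = True, d = [ 2, 3 ] }  -- still type-checks, but `c` is gone
in  { before = before, after = after }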

However, in such scenarios I would argue that the user should have used correct type annotations. The point of type annotations is to specify the business requirements. If the user replaces f by g, the types need to be given explicitly for the function g, and so the user should be held responsible for providing the correct type annotation for g. If the user expects a record field c : C to appear in JSON, the user should specify that field in the record type. It is bad practice to write a schema that only specifies { a : A, b : B } while actually expecting the record to also have some c : C that is not enforced by the schema. The user should have specified the record type as { a : A, b : B, c : C }.

It is not possible that f x leads to a wrong type somewhere. If another function requires the type { a : A, b : B, c : C }, it will still be a type error to give it an argument of type { a : A, b : B}.

One place that looks somewhat suspicious to me is that functions of the form λ(x : { t : T}) → { t = x.t } are no longer identity functions, but instead they are functions that delete all fields from x other than t. What used to be an identity function is now the narrowing function x.{t}, while a true identity function should be written as x with t = x.t or x // { t = x.t }. The beta-normalization rules could even include a rewriting rule to replace such expressions with just x.
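To make the difference concrete (a sketch; the applications below are only accepted under the proposed rules, and the field extra is illustrative):

let narrow = λ(x : { t : Text }) → { t = x.t }      -- no longer an identity function
let keep = λ(x : { t : Text }) → x // { t = x.t }   -- still an identity function
let r = { t = "abc", extra = 1 }
in  { narrowed = narrow r  -- { t = "abc" } under the proposal
    , kept = keep r        -- { t = "abc", extra = 1 } under the proposal
    }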

So, some functions used to be identity functions but no longer are. But I don't see what code would break due to that.

Another possibility is that a user writes initially f : { a : A, b : B } → R and writes f x where x : { a : A, b : B, c : C }. Then the user revises the function f to f : { a : A, b : B, c : C } → R. The code still type-checks. But now the field c is perhaps being used incorrectly; the user forgot that x already has c : C.

In my view, those scenarios are expected; they are cases where subtyping gives the user fewer errors because it allows more freedom in refactoring.

winitzki commented on Oct 15 '24