ceylon icon indicating copy to clipboard operation
ceylon copied to clipboard

Haskell-Style Type Families

Open CeylonMigrationBot opened this issue 11 years ago • 17 comments

[@pthariensflame] See #3987 for the discussion that led to this issue.

[Migrated from ceylon/ceylon-spec#883]

CeylonMigrationBot avatar Dec 30 '13 01:12 CeylonMigrationBot

[@gavinking] So the question is: what would it look like in Ceylon syntax?

CeylonMigrationBot avatar Dec 30 '13 01:12 CeylonMigrationBot

[@pthariensflame] A simple way would be to use the existing alias and switch syntaxes, and just combine them:

alias Plus<M,N> given M satisfies Nat given N satisfies Nat => switch (M) {
    case (Nil) => N;
    case (Succ<PredM>) => Succ<Plus<PredM,N>>;
}

The alias family would then only be defined when M is a subtype of either Nil or Succ<M>; just plain Nat would be rejected at compile-time. class and interface families could work the same way, although refinement might take some thinking.

An else clause could be very easily accommodated by this syntax, and we could permit nested switches to allow multiple types to be switched over. The question then becomes, is there anything besides switches that we could reasonably allow in an alias family? Perhaps (sufficiently restricted) ifs?

Note that allowing unrestricted recursion in alias families would produce an undecidable type system, but forbidding any recursion would disallow the very useful alias families like Plus that do guaranteed-finite structural recursion over their arguments. Even GHC can't tell that Plus is safe, which is why the UndecidableInstances extension exists; with UndecidableInstances, GHC will accept pretty much any well-kinded type-level program, and just cut off recursive type family expansion when the type-level stack exceeds a certain depth (modifiable by a compiler flag). We could do something like that, or we could try to do termination checking via structural recursion á la Agda or Idris. The latter option could potentially get very complex, though, especially in the presence of subtyping.

CeylonMigrationBot avatar Dec 30 '13 02:12 CeylonMigrationBot

[@pthariensflame] The prior-mentioned syntax would allow so-called closed families; the open ones would require something a bit closer to Haskell's current method:

/** A class that efficiently represents arrays of certain types.

The list of types accepted may be expanded at any time by adding a `class instance` of `EfficientArray` for the new type. */
class family EfficientArray<A> {
    ...
}

/** An `EfficientArray` of `EfficientArray`s of `A`s, represented as a single flattened `EfficientArray` of `A`s, along with a "partitioning" array. */
class instance<A> of EfficientArray<EfficientArray<A>>(...) {
    ...
}

/** An `EfficientArray` of `Tuple`s, represented as a `Tuple` of `EfficientArray`s. */
class instance<Top,First,Rest> of EfficientArray<Tuple<Top,First,Rest>>(...) given First satisfies Top given Rest satisfies Top[] {
    ...
}

This is, by the way, the core mechanism by which Data Parallel Haskell operates.

CeylonMigrationBot avatar Dec 30 '13 02:12 CeylonMigrationBot

[@gavinking] Frankly I would love something like this and have even proposed it in the past. @RossTate has pushed back against it, I guess because it's very hard to distinguish the decidable type families from the undecidable ones. From your description above, it sounds to me like you're hinting that practical attempts to reject undecidable type families results in rejection of lots of useful and decidable ones.

CeylonMigrationBot avatar Jan 05 '14 20:01 CeylonMigrationBot

[@pthariensflame] @gavinking That sounds like a fair summary to me. If necessary, you could build a few such "undecidably decidable" type families, like Plus, directly into the language, possibly in conjunction with some easier syntax for type-level naturals and strings, and then (for now, at least) just disallow recursion in user-defined type families. You'd have to decide which ones were worth making "primitive", though.

CeylonMigrationBot avatar Jan 05 '14 20:01 CeylonMigrationBot

[@gavinking] Yeah, I mean, as a general rule you want to avoid creating special cases at the language level for things that you could generalize over. At least, you shouldn't create special cases unless you have a really good justification for doing so. But it seems to me that the limitations on decidability here are a sufficiently good reason to go ahead and contemplate the special-case-based approach..

You'd have to decide which ones were worth making "primitive", though.

Right, so that's the real question. What type-level functions are useful? We know that Plus and Pred are useful. And I guess probably also Minus? Beyond that I just don't have enough knowledge of the kind of stuff that dependent type systems deal with to be able to guess what other functions we would be interested in.

CeylonMigrationBot avatar Jan 05 '14 20:01 CeylonMigrationBot

[@pthariensflame] A disclaimer: the following recommendations can (mostly) be cherry-picked as you see fit. You don't have to include all of these "type-level types", or even most (with the exception that if you include anything, you have to include booleans, for reasons pointed out below).


For natural numbers, the standard math operators +, -, *, /, **, and % should probably be included; with these, Pred and Succ can be defined (in the standard library, if you like) as simply:

alias Succ[Nat n] => n+1;

alias Pred[Nat n] => n-1;

You could even use ++ and -- instead of the longer names Pred and Succ, and I don't think it would cause any confusion (it might even be preferable to some). Along with these, I would recommend type-level (whole) numeric literals, although I gather that you're in favor of that already.

It might be worth noting that GHC's (admittedly very recent) support for sugar over type-level naturals only includes + and *, because the others might fail for some inputs, but, since it's all at compile-time anyway, I don't think Ceylon will have any problems. Alternatively, you could simply make them type-level integers instead of type-level naturals, and then the potential problem goes away almost entirely (just 0/0, _%0, and 0**0 are left as partial states), although then you'd have to deal with negative literals as well.

For type-level booleans, all you really need are the similarly simple operations (&&, ||, ^, and the values true and false); the only thing to note here is that (perfectly logically) type-level comparison operators should return type-level booleans. If you want, you can make Comparison (along with smaller, equal, and larger) available at the type-level as well, and it would be a very simple addition to the mix indeed, but it would also not really gain much that I can see. Either way, the extreme usefulness of the standard comparison operators is, I think, reason enough to include type-level booleans, since I know you definitely want type-level naturals.

The motivation for type-level characters derives from the motivation for type-level strings, which I will talk about below. The only necessary operations on type-level characters are successor, predecessor, and the standard comparison operators, but type-level character literals would still be very nice to have.

The motivation for type-level strings is many-fold: they could be used to give names to the parameters of a Callable, they could help describe the members of a "structural" type (a.k.a. a "named tuple"), they could greatly enhance both the type-safety and the usability of the meta-model by preserving information about identifiers at compile time, etc.; all without adding a much less well-understood "name" concept to the type system. Operations on type-level strings should include Length, indexing, and the standard comparison operators, along with the definitely necessary type-level string literals.

All of these could be implemented very efficiently in the compiler, and a reified type-level value simply becomes a regular value of the appropriate type at runtime.


Beyond that, there's not much data on what's useful. I suppose what you could do is just add the above (even just some of the above) and then consider any requests for more from those actually using the feature.

CeylonMigrationBot avatar Jan 05 '14 21:01 CeylonMigrationBot

[@gavinking]

It might be worth noting that GHC's (admittedly very recent) support for sugar over type-level naturals only includes + and *, because the others might fail for some inputs, but, since it's all at compile-time anyway, I don't think Ceylon will have any problems.

Right, I might be missing some subtlety, but I don't see how it would be a big problem.

the extreme usefulness of the standard comparison operators is, I think, reason enough to include type-level booleans, since I know you definitely want type-level naturals.

OK, cool.

You didn't state it explicitly but I assume you're also proposing type-level <, >, etc. These are for type constraints, I imagine?

The motivation for type-level strings is many-fold: they could be used to give names to the parameters of a Callable, they could help describe the members of a "structural" type (a.k.a. a "named tuple"), they could greatly enhance both the type-safety and the usability of the meta-model by preserving information about identifiers at compile time

Hah! That is interesting. First time I've seen this idea. Indeed, this looks like it could potentially open a way forward on a couple of problems we had considered intractable. OTOH, it's not clear to me that these should really be type-level strings, rather than type-level "symbols" aka identifiers.

CeylonMigrationBot avatar Jan 05 '14 21:01 CeylonMigrationBot

[@pthariensflame] You could make them symbols, but the change in meaning associated with the change in name would strip all the operators off of them except for equality and inequality. That could be a good or a bad thing; I'm personally not sure where I stand on it, although making them symbols deprives type-level characters of any particular motivation (as far as I know).

As for comparisons versus constraints, one semi-general solution should actually be pretty easy: just make any given clause accept a type-level expression, possibly involving any type variables in scope, that returns a type-level boolean; once the clause becomes closed and decided, simply interpret true as "constraint fulfilled" and false as "constraint impossible".

EDIT: Actually, the "given clauses accept type-level booleans" scheme that I outlined above could also help with "conditional inheritance", if you wanted to include some kind of type-level if expression.

CeylonMigrationBot avatar Jan 05 '14 22:01 CeylonMigrationBot

[@gavinking]

As for comparisons versus constraints, one semi-general solution should actually be pretty easy: just make any given clause accept a type-level expression, possibly involving any type variables in scope, that returns a type-level boolean; once the clause becomes closed and decided, simply interpret true as "constraint fulfilled" and false as "constraint impossible".

Yes, this is what I have always had in mind. For example, I want to be able to write shit like:

[Float,Float] firstTwo(Vector<N> vector) given N satisfies Nat given N>1 => .... ;

CeylonMigrationBot avatar Jan 06 '14 11:01 CeylonMigrationBot

[@pthariensflame] Ah...

For that, you would need one of:

  • Explicit witnessing, á la Agda, Coq, or Epigram.
firstTwo : ∀ {a} {A : Set a} {n} → Vec A n → n > 1 → A × A
firstTwo [] ()
firstTwo (x ∷ []) (s≤s ())
firstTwo (x ∷ x′ ∷ xs) (s≤s (s≤s _)) = x ,′ x′
  • Generalized constraint solving, á la recent versions of GHC.
firstTwo :: forall (a :: *) (n :: Nat). (n > 1) => Vec a n -> (a, a)
firstTwo (x ::: x' ::: _) = (x, x')

The former would probably be extremely ugly in Ceylon, unless you already had a particular specialized syntax for it in mind, and the latter would further special-case the use of "primitive" expressions in constraints (as opposed to user-defined ones). This is the fundamental reason for GHC only including + and * as operators on Nat :: BOX (aside from comparisons).

The basic problem is that firstTwo requires additional information beyond merely that the type-level expression N>1 evaluates to true at compile-time: it requires (at minimum) that N have the structure Succ<Succ<X>> for some valid natural number X. Only if the compiler can automatically derive that from the constraint N>1 will your signature work as it stands. I suppose, in Ceylon specifically, you could also use some asserts and type narrowings that you know (but the compiler doesn't) will work at compile time, but that seems pretty ugly to me.

EDIT: Added code samples for each option.

CeylonMigrationBot avatar Jan 06 '14 20:01 CeylonMigrationBot

[@gavinking] @pthariensflame but, OTOH, if we do this by just hacking the special case of Nat into the language definition and typechecker, it looks tractable to me. Unless I'm missing something.

CeylonMigrationBot avatar Jan 07 '14 10:01 CeylonMigrationBot

[@RossTate] I'm gonna put aside specializations to natural numbers for now since it's a separate topic. The basic type family feature you seem to be proposing is to allow people to write partial functions from types to types based on pattern matching (and structural recursion) on the name of the input type. In its simplest form, this is decidable, but there will be limitations when the input type has a type variable. For example, you wouldn't be able to determine that Plus<A,Zero> is the same as A whenever A is a valid input to Plus. So, when you're talking about whether these ideas are practical, you also have to check that the programs you want to work will still type check with a limited equational/subtyping theory.

CeylonMigrationBot avatar Jan 07 '14 16:01 CeylonMigrationBot

[@pthariensflame] @RossTate Right, but that's mainly because Plus is defined recursively; if it's defined immediately, or if it's simply hacked in to the compiler like @gavinking and I were talking about, then that problem disappears. I think we should just have non-recursive type families only, and then put some number of "primitive" type families into the language definition.

CeylonMigrationBot avatar Jan 07 '14 19:01 CeylonMigrationBot

[@RossTate] The problem is not due to recursion. You need to show that the equality between non-recursive pattern-matching functions (from and onto an infinite set) is decidable. I have my suspicions that it isn't.

CeylonMigrationBot avatar Jan 10 '14 00:01 CeylonMigrationBot

[@pthariensflame] The thing is, it's not just an infinite set, it's an infinite fully-bounded distributive lattice with decidable partial order, and I think the presence of that additional structure, plus the fact that we're guaranteed to only be dealing with pattern matches with a finite number of finitely-large clauses, augers well for decidability of type-family equality.

CeylonMigrationBot avatar Jan 10 '14 01:01 CeylonMigrationBot

Sorry to up this 2014-ish conversation,

@pthariensflame Just some correction. It seems Haskell implementation of Plus doesn’t need UndecidableInstances. I tried in GHCi with this:

> :set -XTypeFamilies +m
> data Z
> data S a
> type family Plus a b where
>     Plus Z a = a
>     Plus (S a) b = S (Plus a b)
> type Two = S (S Z)
> :t undefined :: Plus Two Two
==> undefined :: Plus Two Two :: S (S (S (S Z)))

FTR the following irregular variant also works:

type family Plus' a b where
    Plus' Z a = a
    Plus' a Z = a
    Plus' (S a) (S b) = S (S (Plus'' a b))

And the set of restrictions making this available, but barring undecidability is listed here in GHC docs.

(Of course, equality of arbitrary type function applications is another thing, about which I know close to nothing. Surely something on it exists in GHC docs, but it seem to be quite Haskell-type-system-specific (and maybe not applicable here) this time.)

arseniiv avatar Apr 16 '16 08:04 arseniiv