unification-fd icon indicating copy to clipboard operation
unification-fd copied to clipboard

Hindley Milner example?

Open Gabriella439 opened this issue 3 years ago • 7 comments

Is it possible to implement the Hindley Milner type inference algorithm using this package's API? The reason I ask is that I made an initial attempt to do so, but ran into difficult making polymorphic types Unifiable.

To illustrate this, usually the way the types are encoded in the HM algorithm is something like:

data Polytype = Polytype [TypeVariable] Monotype

data Monotype = Variable TypeVariable | FunctionType Monotype Monotype

Now, I can make Monotype implement Unifiable by doing this:

data MonotypeF x = Variable TypeVariable | FunctionType x x
    deriving (Foldable, Functor, Generic1, Traversable, Unifiable)

type Monotype = Fix MonotypeF

… but I can't make Polytype implement Unifiable (at least, not in a useful way, as far as I can tell). Is this a limitation of the unification-fd API or am I missing something?

Gabriella439 avatar Feb 28 '21 07:02 Gabriella439

Hi @Gabriel439 , I have recently been looking into implementing Hindley-Milner via unification-fd as well. I hope to be able to work up a full example and publish it soon, since I can't seem to find any other examples floating around. Briefly, however, in the meantime, I think there is no need to make Polytype an instance of Unifiable, because Hindley-Milner algorithms only ever unify monotypes.

byorgey avatar Sep 07 '21 16:09 byorgey

Ah, sorry I missed your question so long ago. Yes, you can do Hindley–Milner and I can add a worked example a bit later today. (Just have to figure out where I've stashed it :)

As Brent says, you don't want to make Polytype an instance of Unifiable, because then that'd mean that the forall (that the Polytype data constructor represents) could show up nested arbitrarily deep in type terms, whereas for Hindley–Milner the forall only ever occurs at the top. Instead, you do something more like:

data MonotypeF r = FunctionType r r
type Monotype = UTerm MonotypeF TypeVariable
data Polytype = Polytype [TypeVariable] Monotype
instance Unifiable MonotypeF where...
instance Variable TypeVariable where...

Of course, depending on your final goals, you may not want to bother defining your own TypeVariable; you can just use IntVar or STVar instead. The only real reasons to write your own variables are (a) you have a clever new BindingMonad to improve that part of unification somehow, or (b) you want to do something unification itself doesn't care about, like keeping track of variable names the user might've typed in.

wrengr avatar Sep 08 '21 17:09 wrengr

My worked and (perhaps excessively) commented example can now be found here: https://byorgey.wordpress.com/2021/09/08/implementing-hindley-milner-with-the-unification-fd-library/ (source code at https://github.com/byorgey/hm-unification-fd). @wrengr , if you have time to look through it, I would be very glad to hear any suggestions for improvement!

byorgey avatar Sep 08 '21 17:09 byorgey

Ah, yes, I had it in the repo already :) test/experiments/putting/PuttingDM.hs. The code there is rather extensive yet undocumented; however, it follows the paper very closely, so it should all make sense if you read it alongside the paper. This weekend I'll move it to a more discoverable location in the repo, and see if I can't clean it up a bit.

wrengr avatar Sep 08 '21 21:09 wrengr

Ah, thanks! On a quick skim it looks kind of similar to what I ended up with (I have probably been subconsciously influenced by reading that paper long ago, and other things like it). The one part I'm not sure I understand is why the code for finding free type variables is so complicated. That makes me worried that I've done it incorrectly in my implementation.

byorgey avatar Sep 08 '21 22:09 byorgey

It doesn't seem so complicated to me? Though if I had to guess, I think you might be referring to the need to handle the ForAll constructor. Proper Hindley–Damas–Milner inference wouldn't have to deal with that (except at the top-level which is simpler), but it's there because the "putting" paper was setting things up for extending to higher-rank polymorphism. I really was trying to follow the paper closely when writing this code, rather than making changes along the way. If I were writing the code from scratch, I'm sure there's a lot that could be simplified; hence my comment about cleaning the code up a bit before moving it to ./examples

Fwiw, the algorithm is as follows.

  1. First zonkTypeAll will apply all the bindings for unification/meta variables, so that freeTyVars doesn't have to worry about them at all. (But see the footnote below.)
  2. Then freeTyVars does some boilerplate to run the recursive function go on each of the Type arguments. The state monad keeps track of two sets of tyvars: the first tracks all the bound tyvars, and the second accumulates all the free tyvars. Thus, freeTyVars will start go off with a pair of empty sets, and when that's finished it'll return the accumulated set of free tyvars (converted back to a list, just cuz that's what the "putting" paper does).
  3. The go function recurses through the type and handles each constructor in turn.
  • For simple structural constructors like Fun and TyCon we just recurse down each subterm in the usual manner.
  • For TyVar we get the set of bound tyvars and check whether this tyvar is in there. If this tyvar is bound, then return trivially; if this tyvar is free, then add it to the set of free tyvars.
  • Finally, for type syntax that binds tyvars (e.g., ForAll) we need to:
    1. grab the current set of bound tyvars so we can restore them when we're done,
    2. update the monad's set to include the tyvars bound by this constructor,
    3. do the recursive call,
    4. finally, remove the tyvars we added to the monad's state in the second step, by replacing that state with the original set of bound tyvars we got in the first step.

Footnote: Technically we don't need to do the zonkTypeAll step, and indeed doing it introduces an asymptotic performance penalty. What this step does is save the freeTyVars/go function from needing to worry about looking up unification variables along the way (since zonkTypeAll ensures that go will never actually encounter UVar). The performance penalty comes from this: whenever a unification variable is bound to some structure, U.applyBindingsAll will substitute that structure in place of the variable in the generated output (as expected); however, even though U.applyBindingsAll makes sure to preserve sharing when the same unification variable is used in several places, the go function has no way to observe that sharing, thus go will be forced to traverse the same structure over and over. In any case, even though I optimized mapM zonkType into zonkTypeAll, I left the latter in simply because that's how the "putting" paper does things. Personally I'd leave the unification variables for go to look up itself, that way go can observe the sharing and so avoid doing redundant work. However, this would further complicate the code since now the state monad used by go will have to also keep track of which unification variables it has already traversed.

wrengr avatar Sep 09 '21 00:09 wrengr

Ah, that makes sense, thanks for the detailed explanation!

byorgey avatar Sep 09 '21 01:09 byorgey