unification-fd
unification-fd copied to clipboard
Hindley Milner example?
Is it possible to implement the Hindley Milner type inference algorithm using this package's API? The reason I ask is that I made an initial attempt to do so, but ran into difficult making polymorphic types Unifiable
.
To illustrate this, usually the way the types are encoded in the HM algorithm is something like:
data Polytype = Polytype [TypeVariable] Monotype
data Monotype = Variable TypeVariable | FunctionType Monotype Monotype
Now, I can make Monotype
implement Unifiable
by doing this:
data MonotypeF x = Variable TypeVariable | FunctionType x x
deriving (Foldable, Functor, Generic1, Traversable, Unifiable)
type Monotype = Fix MonotypeF
… but I can't make Polytype
implement Unifiable
(at least, not in a useful way, as far as I can tell). Is this a limitation of the unification-fd
API or am I missing something?
Hi @Gabriel439 , I have recently been looking into implementing Hindley-Milner via unification-fd
as well. I hope to be able to work up a full example and publish it soon, since I can't seem to find any other examples floating around. Briefly, however, in the meantime, I think there is no need to make Polytype
an instance of Unifiable
, because Hindley-Milner algorithms only ever unify monotypes.
Ah, sorry I missed your question so long ago. Yes, you can do Hindley–Milner and I can add a worked example a bit later today. (Just have to figure out where I've stashed it :)
As Brent says, you don't want to make Polytype
an instance of Unifiable
, because then that'd mean that the forall
(that the Polytype
data constructor represents) could show up nested arbitrarily deep in type terms, whereas for Hindley–Milner the forall
only ever occurs at the top. Instead, you do something more like:
data MonotypeF r = FunctionType r r
type Monotype = UTerm MonotypeF TypeVariable
data Polytype = Polytype [TypeVariable] Monotype
instance Unifiable MonotypeF where...
instance Variable TypeVariable where...
Of course, depending on your final goals, you may not want to bother defining your own TypeVariable
; you can just use IntVar
or STVar
instead. The only real reasons to write your own variables are (a) you have a clever new BindingMonad
to improve that part of unification somehow, or (b) you want to do something unification itself doesn't care about, like keeping track of variable names the user might've typed in.
My worked and (perhaps excessively) commented example can now be found here: https://byorgey.wordpress.com/2021/09/08/implementing-hindley-milner-with-the-unification-fd-library/ (source code at https://github.com/byorgey/hm-unification-fd). @wrengr , if you have time to look through it, I would be very glad to hear any suggestions for improvement!
Ah, yes, I had it in the repo already :) test/experiments/putting/PuttingDM.hs. The code there is rather extensive yet undocumented; however, it follows the paper very closely, so it should all make sense if you read it alongside the paper. This weekend I'll move it to a more discoverable location in the repo, and see if I can't clean it up a bit.
Ah, thanks! On a quick skim it looks kind of similar to what I ended up with (I have probably been subconsciously influenced by reading that paper long ago, and other things like it). The one part I'm not sure I understand is why the code for finding free type variables is so complicated. That makes me worried that I've done it incorrectly in my implementation.
It doesn't seem so complicated to me? Though if I had to guess, I think you might be referring to the need to handle the ForAll
constructor. Proper Hindley–Damas–Milner inference wouldn't have to deal with that (except at the top-level which is simpler), but it's there because the "putting" paper was setting things up for extending to higher-rank polymorphism. I really was trying to follow the paper closely when writing this code, rather than making changes along the way. If I were writing the code from scratch, I'm sure there's a lot that could be simplified; hence my comment about cleaning the code up a bit before moving it to ./examples
Fwiw, the algorithm is as follows.
- First
zonkTypeAll
will apply all the bindings for unification/meta variables, so thatfreeTyVars
doesn't have to worry about them at all. (But see the footnote below.) - Then
freeTyVars
does some boilerplate to run the recursive functiongo
on each of theType
arguments. The state monad keeps track of two sets of tyvars: the first tracks all the bound tyvars, and the second accumulates all the free tyvars. Thus,freeTyVars
will startgo
off with a pair of empty sets, and when that's finished it'll return the accumulated set of free tyvars (converted back to a list, just cuz that's what the "putting" paper does). - The
go
function recurses through the type and handles each constructor in turn.
- For simple structural constructors like
Fun
andTyCon
we just recurse down each subterm in the usual manner. - For
TyVar
we get the set of bound tyvars and check whether this tyvar is in there. If this tyvar is bound, then return trivially; if this tyvar is free, then add it to the set of free tyvars. - Finally, for type syntax that binds tyvars (e.g.,
ForAll
) we need to:- grab the current set of bound tyvars so we can restore them when we're done,
- update the monad's set to include the tyvars bound by this constructor,
- do the recursive call,
- finally, remove the tyvars we added to the monad's state in the second step, by replacing that state with the original set of bound tyvars we got in the first step.
Footnote: Technically we don't need to do the zonkTypeAll
step, and indeed doing it introduces an asymptotic performance penalty. What this step does is save the freeTyVars
/go
function from needing to worry about looking up unification variables along the way (since zonkTypeAll
ensures that go
will never actually encounter UVar
). The performance penalty comes from this: whenever a unification variable is bound to some structure, U.applyBindingsAll
will substitute that structure in place of the variable in the generated output (as expected); however, even though U.applyBindingsAll
makes sure to preserve sharing when the same unification variable is used in several places, the go
function has no way to observe that sharing, thus go
will be forced to traverse the same structure over and over. In any case, even though I optimized mapM zonkType
into zonkTypeAll
, I left the latter in simply because that's how the "putting" paper does things. Personally I'd leave the unification variables for go
to look up itself, that way go
can observe the sharing and so avoid doing redundant work. However, this would further complicate the code since now the state monad used by go
will have to also keep track of which unification variables it has already traversed.
Ah, that makes sense, thanks for the detailed explanation!