links icon indicating copy to clipboard operation
links copied to clipboard

Use OPAM unionFind package

Open jamescheney opened this issue 3 years ago • 9 comments

Our implementation of union-find was "borrowed" from here: https://yrg.gitlab.io/homepage/static/public/mini/

Francois has recently released an opam package unionFind that provides similar functionality as an opam package. Its interface is similar, though most of the function names differ, but we could accommodate this by keeping the existing interface unionfind.mli but changing the implementation to use the library, as an experiment before making a more global change. (Wrapping the library instead of calling it directly would introduce an extra function call per unionfind API call which might not get inlined away.)

The unionFind library also provides a functor that abstracts over the kind of store used, and provides a few implementations including a transactional one. However, using this also introduces an extra layer of indirection so might not be what we want.

jamescheney avatar Dec 16 '20 13:12 jamescheney

I had a very superficial try at doing this in branch issue945.

One wrinkle I hadn't thought of is that we make Unionfind.point showable, for debugging. I modified this to use UnionFind library calls instead of low-level access to the point type.

In other ways, the UnionFind opam library seems a bit closer to what we want anyway.

Having made this change, Links compiles, but there are many errors, so it seems that there is some nontrivial difference between our modified copy of the old Unionfind and the current UnionFind OPAM library. Running with an empty prelude works OK, and by plugging in various subsets of the prelude I identified one prelude function that doesn't work correctly with the new version:

fun unzip(l) {
  switch (l) {
    case [] -> ([], [])
    case (a,b)::xs -> {
     var (c,d) = unzip(xs);
     (a :: c, b :: d)
    }
  }
}

which fails during typechecking with the error

***: Error: Links_core.TypeUtils.TypeDestructionError("Attempt to erase field from non-record type _::Any") 

Deleting parts suggests that hte problem is arising during typechecking of the (a,b) pattern, the equivalent code using fst and snd to unpack the pair works.

Typechecking of simple expressions fails, e.g. 1+1, [1], [1,2], [(1,2), (3,4)], (+) all give ***: Error: "Assert_failure core/types.ml:2168:19" - this seems to be an assertion failure during printing of the types.

So I'm leaving this as a signpost/warning that (perhaps unsurprisingly) swapping our unionfind implementation with the OPAM one is not necessarily straightforward.

jamescheney avatar Mar 25 '21 12:03 jamescheney

That's peculiar. I wonder whether our copy of the unionfind library contains some bug(s) that we inadvertently rely on.

dhil avatar Mar 26 '21 11:03 dhil

That seems possible, but it's such a simple interface/ADT that I'd be surprised, and the implementation should be completely hidden (aside from hacks that violate abstraction like using == or getting addresses fo things) I

Unfortunately the implementations are just different enough that swapping out certain functions isn't possible. I suppose we could compose both implementations (i.e. define point as a pair of Unionfind.point and UnionFind.elem) and define the operations to work on the pairs, and add checks to see when we get different / non-equivalent union find structures... that's a bit of work though

jamescheney avatar Mar 26 '21 12:03 jamescheney

Random guess: are we crucially relying on repr during type inference? This caused me a lot of headache in inferno - gory details here. I wouldn't be surprised to see something similar here.

jstolarek avatar Mar 26 '21 12:03 jstolarek

After talking with @slindley briefly about this issue, I am convinced that the culprit is that our type inference relies on the order in which things get unified. My guess is that our copy of unionFind and the opam version build the equivalence classes differently, and that is why we observe different behaviour when swapping the libraries.

dhil avatar Mar 26 '21 12:03 dhil

Yep. I haven't looked at the code at all recently, but we definitely rely on the order in which types are unified. It affects the types that are inferred, and in various places (in unify.ml, in particular) is deliberately chosen in order to try to make types, and type errors more informative.

slindley avatar Mar 26 '21 12:03 slindley

In theory, that shouldn't affect correctness, but there may be other subtleties here.

slindley avatar Mar 26 '21 12:03 slindley

OK. I was also surprised that this seemed to affect correctness. I'm inclined to close this or re-categorize as a bug - the high-level behavior shouldn't depend on internal implementation choices being made inside an ADT library...

jamescheney avatar Mar 26 '21 13:03 jamescheney

Even weirder: with the library implementation in branch issue945, the following happens (running with an empty prelude):

links> fun foo(l) {
......  foo(l)
...... };
foo = fun : (_::Any) ~> _::Any
links> [1];
[1] : [Int]
links> foo([1]);
***: Error: "Assert_failure core/types.ml:2168:19" 
links> [1];
***: Error: "Assert_failure core/types.ml:2168:19" 

That is, foo (which is just an infinite loop) typechecks as does [1]. But trying to typecheck (or at least print) the foo([1]) fails. And subsequently trying to typecheck/print [1] also fails. This doesn't happen if foo is defined as, say, the identity function.

Something somewhere is stateful that shouldn't be.

@slindley mentioned the fact that type inference does branching in various places, e.g. to optimistically try to infer a linear type for a function and only if that fails, infer an unrestricted type. That kind of thing could be violating the assumptions of the union-find library - it provides a transactional version which should be used to undo changes in that case.

jamescheney avatar Mar 26 '21 15:03 jamescheney