ocaml #12971 bis, occur_rec: don't expand already visited type nodes

Currently on trunk (and since OCaml 4.01 for some variant of the code below), the following code is accepted

let should_fail (x:'a) = 
  let sub = Seq.(cons empty x) in
  let error = (sub:'a Seq.t) in
  error

and function should_fail is typed as ('a Seq.t as 'a) Seq.t -> 'a Seq.t Seq.t without any -rectypes option in sight.

What happens is that during the unification of the 'a Seq.t constraint with the type 'a Seq.t Seq.t, the occur_rec check is called on this digraph

0300-occur-rec-step-error-l3 14-28

and concludes that the node 25 doesn't occur in the subgraph reachable from node 24. This is obviously false.

The underlying issue is that the digraph already contains cycles in the expansion of the node 24. When the direct path in occur_rec function fails, the occur_rec function falls back to expanding the node 24 (into node 79); and after cycling twice through this expansion, it concludes that it has explored all the reachable subgraph of node 24.

To fix this cycle, this PR modifies the occur_rec function to elide the expansion path when we have already visited the current constructor node (and recursive types are not allowed). In other words, it restricts the expansion alternative path to the earliest occurrence of a given constructor node in the visited path.

Mar 04 '24 13:03 Octachron

I have trouble understanding the graph.

TL;DR: why is there no occurrence of '_25 (which I think is 'a) in the expansion of 'a Seq.t?

Long version:

I would expect that trying to unify 'a Seq.t with 'a Seq.t Seq.t will recursively try to unify 'a with 'a Seq.t, and at this point call occur_check to see if 'a happens in 'a Seq.t. This suggests that the graph we are looking at is this 'a Seq.t -- coherent with your explanation that '_25, the parameter of the Seq.t, is the variable we are occur-checking against.

If I naively expand 'a Seq.t in my head, I get to unit -> 'a Seq.node, which is consistent with what we see in the graph (I'm not sure why there are two successive expansions of Seq.t, and why there are four expansion arrows from the second expansion to the function-type node, but oh well.)

But then, I would expect the argument of the Seq.node node 87 to be 'a itself, if that node really represents the type 'a Seq.node. Why would this argument (the arrow labelled 0) point to the Seq.t node 24 instead? This does not make sense to me. Remarks:

there is an occurrence of 'a Seq.t in the definition of 'a node, but I don't think there is a relation
if we had already started unifying 'a with 'a Seq.t at this point, then 'a Seq.node and 'a Seq.t Seq.node would be unified, but I don't think this is the case at this point

Mar 04 '24 15:03 gasche

_25 is not 'a, if we call it 𝛽, we have 'a = 𝛽 t t (eliding the Seq. prefix), and the graph is the final sub-unification problem from trying to unify 'a t = 𝛽 t t. In other words, the main part of the graph is the 𝛽 = 𝛽 t subgraph; and the part of the graph reachable by expansion contains the equation from the above unification that 𝛽 t = 𝛽 t t .

Mar 04 '24 18:03 Octachron

There are three variables named 'a in your first post: the one in the code snippet with the (x : 'a) annotation (let's call it the "program 'a), the one in the output inferred type (the "output 'a"), and the one in the unification problem 'a Seq.t =?= 'a Seq.t Seq.t (the "problem 'a"). My impression was that '_25 corresponds to both the "output 'a" and the "problem 'a", but not the "program 'a". Does that make sense?

the part of the graph reachable by expansion contains the equation from the above unification that 𝛽 t = 𝛽 t t .

Ah, I missed a part of the unification code:

https://github.com/ocaml/ocaml/blob/84bd073dbdf4d4fc91a5939c6a2bb4adb50ec073/typing/ctype.ml#L2787-L2790

When unifying 'b t = 'b t t, these lines will "link" 'b t with 'b t t before unifying the arguments 'b and 'b t, so this is where the "previous cycle" that you are taking about will get created, and we should already see it in the graph. Is that right? (But isn't the occur-check before this first unification supposed to prevent this? Plausible answer: the occur check does not expand abbreviations, so it would fail to detect a cycle that is hidden by an abbreviation.)

Mar 04 '24 22:03 gasche

There are visibly too many 'a in this PR description, sorry. And the node β=25 is none of those 'a. This node is introduced when inferring the type of Seq.(cons empty x):

let should_fail (x:'a) = 
  let sub : 'b t t = Seq.(cons empty x) in
  ...

(except that adding the type annotation makes the error disappear). In other words, for the program 'a with have 'a = β t t, whereas the problem and output 'a is β t. This is why the error message (in which 'a = β) has three level of nested Seq.t.

The crux of the issue is indeed that we are linking node before having finished to unify all graphs subcomponents.

Moreover, the recursive occurrence check occur_rec ... ty0 ty only check if the node ty0 (aka the full graph) appears in the graph ty. If it not the case, this means that only part of the ty0 graph might appear in ty, and we delegate checking those occurrences to the sub-unification problem.

In other words, when unifying β t t t=β t t , the graph looks like this 0114-unify-error-l3 14-28 (with β=21 in this version, sorry) and occur_rec only checks that the node 40 is not reachable from 46.

Mar 05 '24 10:03 Octachron