Gradualizer Checking functions with intersecting specs is not complete

During work on #404 it turned out that checking functions with intersecting spec clauses is not complete:

$ cat clauses_intersect.erl
-module(clauses_intersect).

-export([foo/1]).

-spec foo(bar) -> bar; (baz) -> baz.
foo(X = bar) -> X;
foo(X = baz) -> X.
$ ./bin/gradualizer --verbose clauses_intersect.erl
Checking module clauses_intersect
Spawning async task...
Checking function foo/1
Pushing {clauses_controls,true}
6:1: Checking clause :: bar
6:17: Checking that X :: bar
Task returned from function foo/1 with [{type_error,unreachable_clause,{7,1}}]
clauses_intersect.erl: The clause on line 7 at column 1 cannot be reached

Actually, now that I look at it I think check_clauses_intersect is buggy (or just incomplete? I think there's a comment somewhere in the code which suggests function intersections are not fully supported yet). Take an example like this:

-spec foo(bar) -> bar;
      foo(list()) -> baz | qux.
foo(bar) -> bar;
foo([]) -> baz;
foo([a]) -> qux;
foo([_, _ | T]) -> foo(T).

The question is: after how many fun clauses should we switch from checking against foo(bar) -> bar to foo(list()) -> baz | qux? Well, once our fun clauses exhaust bar. This is not done currently, hence the error returned when checking your example. We should probably do something more like this:

try
    {Env1, Cs1} = check_clauses_fun(Env, Ty, Clauses),
catch
    throw:{type_error,unreachable_clause,...} ->
        %% Now try to type check the fun clauses against the next spec clause.
end

All in all, though, I think this is outside the scope of this PR.

Originally posted by @erszcz in https://github.com/josefs/Gradualizer/pull/404#discussion_r851094744

Apr 27 '22 10:04 erszcz

The question is: after how many fun clauses should we switch from checking against foo(bar) -> bar to foo(list()) -> baz | qux? Well, once our fun clauses exhaust bar.

I don't agree about switching. The function clauses can be in any order, regardless of the order of spec-clauses.

This is how I imagine it to work: For each function clause, we can choose which spec-clause to check against by matching the types of the parameters (patterns and guards) against the the spec-clauses. When we do exhaustiveness-checking, the function clauses together should exhaust all of the spec-clauses.

For foo([_, _ | T]) -> foo(T) in your example, we should find out that it can't match spec foo(bar) -> baz so we try the second spec-clause foo(list()) -> baz | qux. We have [_,_|T] :: list(), so T :: list() and foo(T) :: baz | qux. We may do exhaustiveness checking for the first spec-clause foo(bar) -> baz but not for the second, i.e. we could disable exhaustiveness checking per spec-clause, if we're smart enough.

Apr 27 '22 13:04 zuiderkwast

I don't agree about switching.

I get that. My question was to be read more as "how should we match the function clause with the spec clause" and not to suggest actually going from top to bottom is the right/only way to do it. And your elaboration on how it could work is spot on :+1:

Apr 29 '22 14:04 erszcz

Great that we agree! :relieved: Perhaps the assumption that specs can't overlap makes it easier for us. This is from https://www.erlang.org/doc/reference_manual/typespec.html#specifications-for-functions:

A function specification can be overloaded. That is, it can have several types, separated by a semicolon (;):
-spec foo(T1, T2) -> T3
       ; (T4, T5) -> T6.
A current restriction, which currently results in a warning (not an error) by the compiler, is that the domains of the argument types cannot overlap. For example, the following specification results in a warning:
-spec foo(pos_integer()) -> pos_integer()
       ; (integer()) -> integer().

I don't know if the compiler actually warns about this. I guess it doesn't.

Maybe we should be careful about any(), since in Gradualizer it doesn't mean top() but any type between none() and top(). It's just not spelled out statically which type it is. This means that we can't determine if it overlaps with another type. We can perhaps assume that it doesn't, or we can forbid things like the following:

%% Let's say we haven't decided yet if this is an integer or a float
-type my_number() :: any(). % <--- WIP

-spec bar(my_number()) -> ping;
         (atom()) -> pong.
bar(N) when N > 0, N < 10 -> ping;
bar(X) -> pong.

If the occurrence any() in the example above is a number in runtime, these clauses don't overlap. (In this example we could infer it to 1..9 | float() but I don't know if that's always possible.)

Maybe it's not going to be a problem in practice. It seems unlikely that anyone will do this. We could simply forbid it if it makes life easier.

Apr 29 '22 21:04 zuiderkwast

Maybe it's not going to be a problem in practice.

Hmm, I imagine one might want to write a function like the following:

-spec classify(integer()) -> integer;
              (_) -> non_integer.
classify(I) when is_integer(I) -> integer;
classify(_) -> non_integer().

I think it's worth trying to make it possible, but let's see how it goes.

Apr 29 '22 21:04 erszcz

Right, it looks much more reasonable to have any() in a catch-all clause. If we'd have negation and intersection, we could interpret _ in the second clause as any() \ integer()... [Edit] Or we can just allow any().

Apr 29 '22 22:04 zuiderkwast

This is fixed by #461, closing it now.

Dec 16 '22 18:12 erszcz

Gradualizer Gradualizer copied to clipboard

Checking functions with intersecting specs is not complete

Gradualizer
Gradualizer copied to clipboard