Gradualizer
Gradualizer copied to clipboard
Bounded quantification
Picking up this topic from #93. I think we do already agree about these points:
- Type variables are often used where the a type could just have been used directly.
- For type variables appearing only once, bounded quantification doesn't make sense.
- For type variables appearing only in the result, bounded quantification doesn't make sense.
But I think there are other cases where bounded quantification does make sense. I don't agree about this point:
[bounded quantification] is way too strict for variables occurring more than once
How so?
Would you really expect the function -spec foo([Y]) -> Y when Y :: number() to only type check when returning one of the numbers already in the list? No adding them up or multiplying them?
Yes, I do think it makes sense that foo/1
above must return one of the elements in the list. For many of the lists functions, this is what you want. Here are some examples (with only-once used variables unfolded):
-spec lists:any(fun((T) -> boolean()), [T]) -> boolean().
-spec lists:filter(fun((T) -> boolean()), [T]) -> [T].
-spec lists:map(fun((A) -> B), [A]) -> [B].
-spec lists:sort([T]) -> [T].
These often have when T :: term()
in the spec, which I think shouldn't make a difference.
If we just unfold, we get [term()]
or term()
as the return type of many of these functions, which gives rise to a type error where e.g. integer()
is expected. Here are examples of applying these functions:
-spec integer_sort(integer()) -> integer().
integer_sort(Xs) -> lists:sort(Xs).
-spec filter_even([integer()]) -> [integer()].
filter_even(Xs) -> lists:filter(fun (X) -> X rem 2 == 0 end, Xs).
I have a feeling that many of the remaining errors reported when gradualizing the gradualizer are from these, the errors involving term()
. (Some specs are just wrong though, e.g. that of lists:foldl/3
which always returns term()
. These we can override.)
So, I suggest we go for bounded quantification where it makes sense, in a pragmatic way. What do you think?
All of these examples are truly polymorphic, i.e. there are no bounds on the type variables. What would make sense is to drop T :: term()
annotations and treat T
as a polymorphic variable.
True, dropping T :: term()
is another approach. Then, I can't see any real world examples of bounded polymorphic variables. I can only come up with more unlikely ones like this one:
%% Sorts the points by their distance to origo
-spec numeric_pair_sort([T]) -> [T] when T :: {number(), number()}.
Is truly polymorphic variables is easier to handle?
How do you implement polymorphism? Do we simply store type variable bindings within the Env which is passed around?
I don't think bounded polymorphism is much more difficult to implement, but I do think it's not what people want/expect.
To implement polymorphism we need to distinguish rigid and flexible type variables.
- Variables are rigid when you are checking the body of the polymorphic function, in which case you treat them as given types you know nothing about.
- When checking the call to a polymorphic function you generate fresh flexible variables for the variables in the type signature. These have to be assigned concrete values based on the typing constraints.
Handling rigid variables is straightforward, but I suspect we need to do some serious refactoring to deal with flexible variables. We're relying on being able to inspect types in a lot of places, so we either need a mechanism to postpone checks until we know the value of a flexible variable, or turn everything into constraints (which I suppose is one way of postponing things).
A few thoughts on handling flexible variables: I have already tried to pave the way for dealing with flexible variables.
- The
constraints
module collects constraints on both rigid and flexible variables. - It also records all the flexible variables that have been generated.
The difficult part is to solve the constraints to eliminate the flexible variables. This has to be done after we've traversed the whole function.
Solving the constraints would be simple if it wasn't for gradual types. Normally subtyping is transitive and if we have the case that T1 <: A <: T2
(where T1
and T2
are concrete types and A
is a flexible variable) then we can pick A
to be a type between T1
and T2
. (There is sometimes a choice as to what type to pick but it's a solvable problem that's not relevant to the current discussion.)
In the presence of the any()
then subtyping is not transitive anymore (because we don't want integer() <: any()
and any() <: boolean()
to imply integer() <: boolean()
). So we cannot use transitivity anymore to solve constraints. That's a real bummer. What to do?
I've been thinking that we should just go ahead and use transitivity anyway despite it not being correct. What would be the consequence of that? It would mean that we sometimes generate an error message when the code is actually well-typed. That is somewhat suboptimal but at least it is a way forward and I don't have any other suggestion how to do it. If you guys have a better idea I'm all ears.
If we make sure we don't have any constraints with any(), we can use transitivity, right?
Either we never store any constraints with any() or (equivalently I think) we convert all any() to term() within the upper bounds and to none() within the lower bounds. What do you think?
Hello old thread. OTP functions where bounded type variables could make sense are lists:keydelete/3
and friends, operating on lists of tuples. Don't we want a spec like this for it?
-spec keydelete(Key, N, [A]) -> [A] when A :: tuple().
It came up in #490. @UlfNorell @josefs WDYT?
Don't we want a spec like this for it?
-spec keydelete(Key, N, [A]) -> [A] when A :: tuple().
Which would specifically make sense as #r{} :: tuple()
holds for any record #r{}
.
Another mechanism which could "preserve" the type of list elements from input to output would be an intersection A & tuple()
, also found in literature. Given there's no Erlang spec syntax for intersections, I think this interpretation of A :: tuple()
could also be practical 🤔
A & tuple()
:+1: :+1: :+1: