dino icon indicating copy to clipboard operation
dino copied to clipboard

Type inference and method/field dispatch

Open sunjay opened this issue 6 years ago • 1 comments

The old type inference algorithm that is currently implemented in the compiler has a severe limitation: the type of the left-hand side of a method/field expression must be known at the time of constraint generation. Even if a later constraint will reveal that type eventually.

The Rust compiler seems to have a similar limitation, but it works around it for this like operators in order to make them more ergonomic. Part of this process is the compiler being able to make assumptions about the types (i.e take a "snapshot") and then backtrack later if those assumptions turn out to be false. The ena crate does seem to support this kind of snapshot functionality.

Another idea is to consider that the process of type checking can be represented as a logic program. In that case, maybe it's possible to lower the program into a set of facts and perform queries on it. Some work would have to be done to ensure that this is sound and complete. If we're making assumptions about multiple types, we need to be careful to avoid exponentially checking every possible combination of types. Tabling and other techniques from logic program solvers may come in handy.

http://smallcultfollowing.com/babysteps/blog/2018/01/31/an-on-demand-slg-solver-for-chalk/

Before a full lowering and solver implementation is started, it's probably prudent to try and hand-lower a complex case of multiple method dispatches and see what happens. We should make sure this method would support language features like literals being allowed to type check for more than one type, multiple method dispatches, fields, etc.

A potentially powerful thing about using logic programming is that it might make the Implementation of things like trait solving and where clauses trivial to express.

A major concern here is error messages. If it isn't possible to get satisfactory error messages from a pure logic programming solver, we may do fine by just borrowing the key ideas and otherwise sticking to a fully custom algorithm. (e.g. we might use tabling to prune possible search paths, but still write everything else ad-hoc.)

sunjay avatar Apr 05 '20 05:04 sunjay

Notes

Four types of constraints, resolved in the following order:

  1. is_type(<def_id>, <ty>) - signifies that the type of the def must unify with the given type (e.g. because of type ascription in parameters, return type, var decl, etc.)
    • this is resolved as we collect constraints using union find
  2. is_lit(<def_id>, <lit-kind>) - signifies that the type of the def must be one of the valid types for the given type of lit
    • each <lit-kind> has a set of types that are valid for that literal
    • multiple is_lit constraints are resolved by taking the intersection of the sets of types valid for each lit that applies to the def
    • if multiple types are remaining, there is a default that can be chosen
    • e.g. 1.2 + 3 can be {real, complex, ..}, but we choose real as the default if no other type can be determined
    • when a default for the type needs to be chosen, there is an ordering in which we check the types: int, real, complex
    • we assign the type to the first of these that is found in the set of types
    • even if a default was chosen for a type variable, we keep its set of valid types around and mark that a default was chosen
    • that way, if a future constraint discovers a different solution we can use that instead
  3. has_method(<def_id>, <method-name>, <arity>) - signifies that the type of the def must have a method with the given name that has a self parameter as its first argument
    • if a method with that name is found, we further do a check to ensure that it has <arity> + 1 arguments (+ 1 for the self param)
      • <arity> should not be used for dispatch
    • if the current type of the def isn't known at this point, the type is considered ambiguous and an error is reported
    • if no such method is found on the type, and the type was defaulted, we search the remaining types that are valid for that type variable and try to find one with the method
      • if more than one of the remaining types has the method, the type is considered ambiguous and an error is reported
  4. is_method_param(<def_id>, <self_ty_def_id>, <method-name>, <index>) - signifies that the type of the def must match the type of the <index>th argument of <method-name> on <self_ty_def_id>
    • if <self_ty_def_id> has no such method, an error will have already been reported in step 3
    • this is resolved using union find (similarly to step 1) -- it's just a delayed is_type constraint
    • if the type of the def was defaulted, this is allowed to override the default as long as the parameter type matches one of the types in the set of allowed types for that variable

If any type variable is unassigned at the end of solving each of these constraints, it is considered ambiguous.

We may need to repeat steps 3 and 4 until no constraints are solved in either of them. Step 4 will reveal types that may not be solvable in step 3 during the initial pass.

Also interesting to consider if we can avoid this or minimize it by ordering the constraints in a certain way. For example, there may be a dependency order established by which expressions are used in which other expressions. A type T that we call a method on may not be inferrable until we see it used as the argument to a method on a type U. As long as U is known, we can determine T.

sunjay avatar Apr 11 '20 16:04 sunjay