catala
catala copied to clipboard
Polymorphic operators
Right now, binary and unary operators are specialized depending on their type : +$ for money addition, -@ for date substraction, etc. This is cumbersome and ideally we would want to write + for adding money, dates, integers, decimals, etc.
Why is it like this?
The need for specialized operators stems form Catala's type system, and more precisely from the default calculs typing algorithm. It uses a classical Hindley-Milner inference and the W algorithm. In classical Hindley-Milner inference, operators either operate on a fixed type, or are completely polymorphic. This is the typing procedure for Catala operators right now :
https://github.com/CatalaLang/catala/blob/06803e41704c494389f90bfe5ad0f804c28bb6ee/src/catala/default_calculus/typing.ml#L109-L150
Why can't we make all operators completely polymorphic ? Because it does not make any sense to + two enumerations together, or * two dates.
Hindley-Milner inference with constraints
What we really want is a type system where we say : + operates only on a fixed list of types. This corresponds to Hindley-Milner inference with constraints, a notoriously hard problem. Some libraries exist to perform the typing, like https://gitlab.inria.fr/fpottier/inferno, but overall it is much more complicated from the classical inference I'm doing right now.
Because I don't want to spend too much time on Catala's type system, I prefer we stick to classical inference and hence we're stuck with the default calculus operators as they are.
A compromise
But this does not mean we can't have what we want ! Indeed, we can keep the differentiated default calculus operators but have undifferentiated surface syntax operator. We can write |01/01/2020| - |01/01/2019| and have the compiler figure out that - is actually -@ during desugaring.
This system is nice because it does not complicate the default calculus. Instead, we rely on a conservative desugaring pass that would sit somewhere in Desugared or Scopelang and that would specialize the operators with the information provided by a conservative, high-level additional type checking pass done in Desugared or Scopelang.
If this pass does not manage to determine the type of the operator, we send an error message back to the user saying that desambiguation is needed and the user actually has to write -@.
What are your thoughts on this?
That last suggestion sounds great; I'd add that this desugaring pass would be based on a simple type propagation, so that variables and expression could be handled too, if there is a direct path to get their type.
I would go further and advise using something like x as a date - |01/01/2019| rather than x -@ |01/01/2019| in the surface syntax: from a user point of view, requiring the explicit -@ instead of - only sometimes could actually be more confusing than requiring it always, even if it improves readability in general.
(because it implies that the user understands that both are the same but they are allowed a shortcut, depending on higher order concerns on the structure of the expression)
Having generic type constraints, on the other hand, should be something we can do in a way that is clearly understood (e.g. x$ or x@ or x as a date (but that sounds like a conversion rather than a constraint, so maybe not)?)
Probably the best indicator of whether this is worth it will be the quality of the error messages we are able to print when either we are missing information, or the type is incorrect.
Now that we can run our typer on earlier passes, it may sound reasonable to call it on expressions as soon as desugaring takes place, to resolve the actual operator to insert in the AST. The simplicity of Catala (explicitely typed variables, no polymorphism) should ensure that we can uniquely resolve the operator from the type of its operands using a pre-defined table.
Using the typer this way at that stage would also have application in the resolution of struct/enum names from ambiguous field/variant names.