catala
catala copied to clipboard
Discussion for AST simplification
The summer has seen a thourough code reorganisation, with the unification of shared code and types for different passes under the shared_ast sub-library.
However, the AST structures themselves have been left unchanged, if defined in a different way (one thing at a time!) . For example, while the type for tuples and structures is now separated, but the underlying expressions still fall into the same EStruct case with an option for the structure name.
This is to open a thread for discussion on the changes that we want or not before actually implementing the changes. I'll write different aspects in different comments.
Types in expressions
Some expressions contain types. This is mandatory, since type constraints may come from the syntax and need to be passed through. This concerns:
EAbs, with the type of the function argumentsETupleAccessandEInjwith the expanded type of the underlying structure/enum as a list
For the first case, this is required from scope definitions. However, lots of functions are generated by the compiler (e.g. translation of let-ins) and use TAny, which feels a bit inconsistent.
For the second cases, this encodes a name resolution for the struct/enum that is done early, and has consequences for type infeence. In the case of enums, EnumName.t * typ list is clearly redundant (since we now assume access to the context — we keep the same assumptions on the consistency of the enum constructor name ordering) and can be simplified to just the enum name. The same could be true for named enums, but not for tuples which I'll discuss in another comment.
Proposition #1
Getting back to the case of EAbs, would it be cleaner to remove the type from the construct but add a new ECoerce of expr * ty case to pass the type information from the scopes ?
Structs vs tuples
We now have:
- for ScopeLang:
| EStruct of StructName.t * expr StructFieldMap.t | EStructAcess of expr * StructFieldName.t * StructName.t - for Dcalc
| ETuple of expr list * StructName.t option | ETupleAccess of expr * int * StructName.t option * typ list
The change from named fields to indices in a list is a meaningful compilation step, and similar to what happens on enums (which change from EEnumInj / EMatchS to EInj / EMatch); we probably want to keep it¹. However, the options in the Dcalc case are a bit cumbersome — esp. when we consider that the typ list should not be useful when the struct name is present.
The simple envisionned solution is to duplicate the fields, like was done for the types:
| ETuple of expr list
| ETupleAccess of expr * int * typ list
| EStruct of expr list * StructName.t
| EStructAccess of expr * int * StructName.t
This doesn't feel completely satisfactory though: it adds redundance, while computationally the struct and tuple cases remain equivalent. Moreover — at the moment — tuples are only ever used for closure conversion, where the types are actually filled in with TAny placeholders !
Removing the struct name options and using tuples everywhere isn't a proper solution either because we do need those for meaningful printing and errors.
With all this in mind, I am unsure that the simple solution would be a clear improvement. But is there a clearly better solution than the current statu quo ? We could just remove the typ list but that would prevent later re-typing, so it should be filled with correct information instead. Or could we rely exclusively on structs, but have closure conversion add many definitions of custom structs to the environment ?
If we don't intend to expand the use of tuples, it'd also be possible to add them, but only to lcalc ast and not to dcalc…
What do you think ?
¹ Although some backends (e.g. OCaml), to generate more readable code, roll back from indexed tuples to records using the extra information about original structs…
Use inline records
I think we have consensus on this one, but it's mostly cosmetic and not very high priority (a bit tedious to write too). Replace e.g.
| EDefault of expr list * expr * expr
by
| EDefault of { exceptions: expr list; just: expr; cons: expr }
This avoids confusion between field parameters of the same type, softly enforces consistent naming of these across pattern-matchings throughout, and allows more concise matching of a subset of parameters.
Structs vs tuples
Following proposition #1, encoding structs as ECoerce (ETuple (_), TStruct struct_name) and ETupleAccess (ECoerce (_, TStruct struct_name), n) could be a reasonable solution, depending on wether the struct information really has no implication on the computation itself.
This would be at the cost of ① more convoluted match patterns (this is fine) and ② maybe some implicit structural invariants on the AST (which I am wary of)
Thanks @Altgr! I'll give my opinions below.
Types in expressions
You can get right of th types in ETupleAccess and EInj, they're an implementation artefact that was never a good idea. It's better to fetch the types in the context of type declarations. Concerning EAbs I would keep the type here because for closure conversion always having the types of argument is crucial. Sure when you build them you makr TAny at first but then type checking replaces those with the correct type.
Structs vs tuples
Let's get rid of tuples ; I'll just introduce new structs on demand for closure conversion.
Use inline records
Good idea, let's do it
@AltGr did you implement this? If so we can close the issue can't we?
Nope, not yet!