catala icon indicating copy to clipboard operation
catala copied to clipboard

Discussion for AST simplification

Open AltGr opened this issue 3 years ago • 7 comments
trafficstars

The summer has seen a thourough code reorganisation, with the unification of shared code and types for different passes under the shared_ast sub-library.

However, the AST structures themselves have been left unchanged, if defined in a different way (one thing at a time!) . For example, while the type for tuples and structures is now separated, but the underlying expressions still fall into the same EStruct case with an option for the structure name.

This is to open a thread for discussion on the changes that we want or not before actually implementing the changes. I'll write different aspects in different comments.

AltGr avatar Sep 12 '22 10:09 AltGr

Types in expressions

Some expressions contain types. This is mandatory, since type constraints may come from the syntax and need to be passed through. This concerns:

  • EAbs, with the type of the function arguments
  • ETupleAccessand EInj with the expanded type of the underlying structure/enum as a list

For the first case, this is required from scope definitions. However, lots of functions are generated by the compiler (e.g. translation of let-ins) and use TAny, which feels a bit inconsistent.

For the second cases, this encodes a name resolution for the struct/enum that is done early, and has consequences for type infeence. In the case of enums, EnumName.t * typ list is clearly redundant (since we now assume access to the context — we keep the same assumptions on the consistency of the enum constructor name ordering) and can be simplified to just the enum name. The same could be true for named enums, but not for tuples which I'll discuss in another comment.

Proposition #1

Getting back to the case of EAbs, would it be cleaner to remove the type from the construct but add a new ECoerce of expr * ty case to pass the type information from the scopes ?

AltGr avatar Sep 12 '22 11:09 AltGr

Structs vs tuples

We now have:

  • for ScopeLang:
    | EStruct of StructName.t * expr StructFieldMap.t
    | EStructAcess of expr * StructFieldName.t * StructName.t
    
  • for Dcalc
    | ETuple of expr list * StructName.t option
    | ETupleAccess of expr * int * StructName.t option * typ list
    
    

The change from named fields to indices in a list is a meaningful compilation step, and similar to what happens on enums (which change from EEnumInj / EMatchS to EInj / EMatch); we probably want to keep it¹. However, the options in the Dcalc case are a bit cumbersome — esp. when we consider that the typ list should not be useful when the struct name is present.

The simple envisionned solution is to duplicate the fields, like was done for the types:

| ETuple of expr list
| ETupleAccess of expr * int * typ list
| EStruct of expr list * StructName.t
| EStructAccess of expr * int * StructName.t

This doesn't feel completely satisfactory though: it adds redundance, while computationally the struct and tuple cases remain equivalent. Moreover — at the moment — tuples are only ever used for closure conversion, where the types are actually filled in with TAny placeholders !

Removing the struct name options and using tuples everywhere isn't a proper solution either because we do need those for meaningful printing and errors.

With all this in mind, I am unsure that the simple solution would be a clear improvement. But is there a clearly better solution than the current statu quo ? We could just remove the typ list but that would prevent later re-typing, so it should be filled with correct information instead. Or could we rely exclusively on structs, but have closure conversion add many definitions of custom structs to the environment ?

If we don't intend to expand the use of tuples, it'd also be possible to add them, but only to lcalc ast and not to dcalc…

What do you think ?

¹ Although some backends (e.g. OCaml), to generate more readable code, roll back from indexed tuples to records using the extra information about original structs…

AltGr avatar Sep 12 '22 12:09 AltGr

Use inline records

I think we have consensus on this one, but it's mostly cosmetic and not very high priority (a bit tedious to write too). Replace e.g.

| EDefault of expr list * expr * expr

by

| EDefault of { exceptions: expr list; just: expr; cons: expr }

This avoids confusion between field parameters of the same type, softly enforces consistent naming of these across pattern-matchings throughout, and allows more concise matching of a subset of parameters.

AltGr avatar Sep 12 '22 12:09 AltGr

Structs vs tuples

Following proposition #1, encoding structs as ECoerce (ETuple (_), TStruct struct_name) and ETupleAccess (ECoerce (_, TStruct struct_name), n) could be a reasonable solution, depending on wether the struct information really has no implication on the computation itself.

This would be at the cost of ① more convoluted match patterns (this is fine) and ② maybe some implicit structural invariants on the AST (which I am wary of)

AltGr avatar Sep 12 '22 12:09 AltGr

Thanks @Altgr! I'll give my opinions below.

Types in expressions

You can get right of th types in ETupleAccess and EInj, they're an implementation artefact that was never a good idea. It's better to fetch the types in the context of type declarations. Concerning EAbs I would keep the type here because for closure conversion always having the types of argument is crucial. Sure when you build them you makr TAny at first but then type checking replaces those with the correct type.

Structs vs tuples

Let's get rid of tuples ; I'll just introduce new structs on demand for closure conversion.

Use inline records

Good idea, let's do it

denismerigoux avatar Sep 13 '22 09:09 denismerigoux

@AltGr did you implement this? If so we can close the issue can't we?

denismerigoux avatar Oct 03 '22 13:10 denismerigoux

Nope, not yet!

AltGr avatar Oct 03 '22 13:10 AltGr