
WIP: turn macro output into untyped AST

Open zerbina opened this issue 1 year ago • 3 comments

This PR implements a transformation pass that turns the untrusted AST output by a macro into untyped AST. In this context, "untyped" currently means that the resulting AST:

  • has no type information
  • has no node flags
  • apart from nkSym, nkOpenSymChoice, and nkClosedSymChoice, contains only node kinds that the parser can produce
  • only contains syntax that is valid according to the grammar

The translation tries to approximate constructs present in typed AST with their untyped AST counterparts where possible. For example, a Conv (Type "int") (Sym "x") is turned into Call (Sym "int") (Sym "x").
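The translation can be illustrated with a toy model. The Python sketch below is not the compiler's code: the `Node` class, its fields, and the `sanitize` function are hypothetical stand-ins. Only the Conv-to-Call rewrite and the stripping of type information and node flags are taken from the description above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    # Hypothetical stand-in for a compiler AST node.
    kind: str                       # e.g. "Conv", "Call", "Sym", "Type"
    name: str = ""                  # payload for leaf nodes
    children: list = field(default_factory=list)
    typ: Optional[str] = None       # type information attached by sem
    flags: frozenset = frozenset()  # node flags attached by sem


def sanitize(n: Node) -> Node:
    """Return a copy of `n` without type information or node flags."""
    kids = [sanitize(c) for c in n.children]
    if n.kind == "Conv":
        # Conv (Type T) (operand)  ->  Call (Sym T) (operand)
        target, operand = kids
        return Node("Call", children=[Node("Sym", target.name), operand])
    # Default case in this toy model: keep kind and payload,
    # drop type and flags by not copying them.
    return Node(n.kind, n.name, kids)
```

Calling `sanitize` on a node built as `Node("Conv", children=[Node("Type", "int"), Node("Sym", "x", typ="float")])` yields a `Call` node whose two children are both `Sym` nodes and carry no type. A real pass would need one such rule per typed-only node kind.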

For each input symbol node, the pass validates that the symbol is in scope (reachable from the current scope). If it is, the symbol is kept in the output AST; otherwise, it is turned into an identifier node, so that semantic analysis can decide what to do with it.

The overall goal is to move syntax and grammar checking out of the semantic analysis procedures by preventing ill-formed AST from ever reaching them. This should simplify sem a bit, as it can then focus on semantic analysis and no longer has to validate the structure of the AST as well (separation of concerns). Ill-formed AST errors can now also be encoded as nkError nodes, and the checkSonsLen procedure becomes obsolete.
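As a rough illustration of encoding structural errors in the tree instead of checking child counts inside sem, consider the Python sketch below. It is a toy model under stated assumptions: `Node`, `EXPECTED_LEN`, and `check_shape` are hypothetical names, and the arity table is illustrative rather than taken from the compiler.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for a compiler AST node.
    kind: str
    children: list = field(default_factory=list)


# Illustrative arity table (an assumption, not the compiler's data).
EXPECTED_LEN = {"Asgn": 2, "Conv": 2}


def check_shape(n: Node) -> Node:
    """Wrap nodes with an unexpected child count in an Error node."""
    kids = [check_shape(c) for c in n.children]
    expected = EXPECTED_LEN.get(n.kind)
    if expected is not None and len(kids) != expected:
        # The ill-formed node is kept as the Error node's child, so a
        # later stage can report it without re-validating the shape.
        return Node("Error", [Node(n.kind, kids)])
    return Node(n.kind, kids)
```

With the check done once up front, later stages can assume that every non-error node has the expected shape.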

Removing the syntax/grammar checks from sem is not done as part of this PR and should happen as a follow-up.

Details

For the symbol reachability validation, the sanitizer first checks whether the symbol is present in the current scope (i.e. the one where the macro is expanded) or one of the scopes enclosing it. If it is not, the sanitizer tests whether the symbol is part of the top-level scope of the module it belongs to. If that's also not the case, the symbol is treated as not reachable. Since instantiated generics are not added to the symbol table, their symbols are never treated as reachable, and are thus turned into identifiers by the sanitizer.
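The two-step lookup can be sketched as follows. This is a minimal Python model, not the actual implementation: `Scope`, `Module`, `Sym`, `is_reachable`, and `sanitize_sym` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Scope:
    # Hypothetical scope: a set of symbols plus a link to the enclosing scope.
    symbols: set = field(default_factory=set)
    parent: Optional["Scope"] = None


@dataclass(eq=False)
class Module:
    top_level: Scope


@dataclass(eq=False)
class Sym:
    name: str
    owner: Module  # the module the symbol belongs to


def is_reachable(sym: Sym, expansion_scope: Scope) -> bool:
    # Step 1: the scope where the macro is expanded and all enclosing scopes.
    scope = expansion_scope
    while scope is not None:
        if sym in scope.symbols:
            return True
        scope = scope.parent
    # Step 2: the top-level scope of the module the symbol belongs to.
    return sym in sym.owner.top_level.symbols


def sanitize_sym(sym: Sym, expansion_scope: Scope):
    # Reachable symbols are kept; everything else becomes an identifier.
    if is_reachable(sym, expansion_scope):
        return ("Sym", sym)
    return ("Ident", sym.name)
```

In this model, a symbol that was never registered in any scope (the situation described for instantiated generics) fails both steps and comes back as an `("Ident", name)` pair.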


As part of fixing the issues that are blocking the MIR, I've looked a bit more into sem recently, and one thing that I identified as adding a significant amount of complexity is that, due to macros, sem cannot trust the AST it operates on. This inspired me to look into solutions, which led me to experiment with this sanitizer layer.

Notes for Reviewers

  • this is a prototype for the most part
  • the compiler is able to bootstrap, but a good number of constructs are not supported yet
  • gensym handling is not figured out yet

zerbina, Jan 02 '23 00:01