solang Build the AST before running semantic analysis

Currently, Solang checks the semantic correctness of the code while building the AST, i.e. it happens during a parse tree traversal. In some cases, we are not able to infer the correct semantic problem for following this approach. Issue #779 is an example for this. We can improve the semantic analysis by building an AST from a parse tree and then traversing.

Advantages

More realistic error messages.
We have more context to make inferences about the semantic information.
We can better handle a parse tree containing error nodes.

Caveats

I couldn't think of any yet 🤔

Challenges

This involves a tremendous amount of work.

Jul 15 '22 18:07 LucasSte

Please, @seanyoung and @xermicus, join the discussion!

Jul 15 '22 18:07 LucasSte

I was thinking about this last night. So, the idea is to have some data structure which you then afterwards "decorate" with resolved symbols and so on.

Currently the ast is a simple datastructure. This would have to become a lot more complicated with lots of Option fields which are None by default
A function call can be call to a builtin, a cast, call to member, etc. So one node will have to have multiple "decorated forms". This is way more complicated than the ast data structure we have right now
I thought that traditionally the parser produces a tree which is then decorated. So the ast is really the parse tree with extra attributes (i.e. decorations). If we go down this road then the parse tree would have to be suitable for this
The touted advantages of a new structure is that we can do multiple passes over the ast. If we want, we add more passes now, do we really need a new data structure for that?

Currently I think that this issue needs is a suggestion of what that data structure would look like. I suspect that the complexity is just not worth it.

Jul 16 '22 07:07 seanyoung

I think that ideally we would be able to walk the AST multiple times. It would bring improvements beyond better error handling/messages:

Semantic checks could be split up (e.g. symbol table building, type checking, ...)
Each semantic check could yield some dedicated output (e.g. a symbol table) that could be used throughout the following processing.
This would mean that the AST never changes
Some advanced checks that would be very easy with multiple AST walks are difficult to realize otherwise.

Overall I think such a change would be very beneficial in the long run.

So, the idea is to have some data structure which you then afterwards "decorate" with resolved symbols and so on

I agree that this would introduce a lot of complexity, though I'm not sure whether this would be the ideal approach? I think that the visitor pattern is a very common approach to solve the complexity issue, and is a popular choice for our case.

My opinion here is that the effort (and I agree to that this would need a lot of work) could be worth in the long run.

Jul 18 '22 14:07 xermicus

To get a better estimate on whether the effort would be worth it or, I think we should track any issue that would benefit from a refactor here (like #779). Additionally we could perform a quick assessment of the current code base to get an idea how much room for improvement there is.

Jul 18 '22 16:07 xermicus