oxc icon indicating copy to clipboard operation
oxc copied to clipboard

check symbols and scopes after transformation

Open Boshen opened this issue 1 year ago • 3 comments

cc @overlookmotel how exactly do we want to check symbols and scopes?

Boshen avatar Aug 07 '24 09:08 Boshen

Process

I suggest the following process:

  1. Run parser on source to produce AST (AST v1).
  2. Run semantic on AST.
  3. Run transformer on AST.
  4. Store ScopeTree and SymbolTable from after transform (Semantic v1).
  5. Clone AST (making AST v2).
  6. ~~Traverse AST v2 and set all scope_id, symbol_id and reference_id fields to None. i.e. AST v2 is now returned to virgin state like it would have been if it had come fresh out of the parser.~~
  7. Run semantic on AST v2. Store the new ScopeTree and SymbolTable (Semantic v2).
  8. Compare scope_id, symbol_id and reference_id fields between AST 1 and AST 2. They should match.
  9. Compare ScopeTree + SymbolTable v1 and v2. They should also match.

i.e. AST v2 and ScopeTree + SymbolTable v2 are how they should be. Make sure that state of both AST and Semantic after transform (v1) matches that.

The complication

The tricky part is what constitutes "matches".

For example if this is the input:

if (x) enum Foo {}
function f() {}

The output of transformer is:

if (x) {}
function f() {}

The scope IDs are:

Before transform:

// Scope ID 0
if (x) enum Foo { /* Scope ID 1 */ }
function f() { /* Scope ID 2 */ }

After transform:

// Scope ID 0
if (x) { /* Scope ID 3 */ } // <-- newly created scope
function f() { /* Scope ID 2 */ }

vs fresh semantic run on post-transform AST:

// Scope ID 0
if (x) { /* Scope ID 1 */ } // <-- numbered 1 as it's 2nd scope found in visitation order
function f() { /* Scope ID 2 */ }

Scope IDs of the if {} block are different in the 2 versions, because in the post-transform version, that scope was newly created in transformer.

But we don't want to throw an error on this because they are equivalent.

Likely same kind of thing will happen with SymbolIds and ReferenceIds.

Suggested solution

Traverse AST v1 and AST v2 in tandem and:

  1. Check that every node which has a ScopeId in AST v1 also has one in AST v2, and vice versa.
  2. Build a hash map mapping from v1 ScopeId to v2 ScopeId. So, in example above v1 Scope ID 3 -> v2 Scope ID 1.

When comparing v1 ScopeTree to v2 ScopeTree, use the mapping to translate IDs before checking the scope trees match.

Ditto for SymbolIds and ReferenceIds.

Conclusion

So this is all a bit of a pain! But, if we can build this test infra, it will give us extremely good test coverage and we can be very confident that transformer is keeping scopes/symbols in sync correctly.

Side note: I'd imagine running these checks as part of transformer conformance, so we get instant feedback on PRs if they foul up scopes.

overlookmotel avatar Aug 08 '24 19:08 overlookmotel

I would be happy to take this is on if you like, Boshen.

overlookmotel avatar Aug 10 '24 11:08 overlookmotel

Just a note, we may need to check duplicate AstNodeId, ScopeId, and SymbolId in AST. related to https://github.com/oxc-project/oxc/issues/4809

Dunqing avatar Aug 10 '24 11:08 Dunqing

Borrowing the idea of the mangler, we can check whether symbols are the same in scope visitation order.

Boshen avatar Aug 12 '24 06:08 Boshen

Just a note, we may need to check duplicate AstNodeId, ScopeId, and SymbolId in AST. related to #4809

This feels more related to #4804

rzvxa avatar Aug 12 '24 10:08 rzvxa