ADD Tree Transformation Pipeline

Open · AlpacaMax opened this issue 2 years ago · 0 comments

Add a tree transformation stage that runs after parsing but before the plagiarism detection algorithm executes.
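One way to structure such a stage, sketched here under assumptions, is as an ordered list of tree-to-tree passes applied between parsing and detection. The `Node` class, `run_pipeline`, `count_nodes`, and the `drop_comments` example pass are hypothetical names for illustration, not part of mayat or tree-sitter.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical simplified tree node (not mayat's or tree-sitter's actual class)."""
    type: str
    children: List["Node"] = field(default_factory=list)


# A transformation pass maps a tree to a (possibly new) tree.
Transform = Callable[[Node], Node]


def run_pipeline(tree: Node, passes: List[Transform]) -> Node:
    """Apply each pass in order; the output of one pass feeds the next."""
    for transform in passes:
        tree = transform(tree)
    return tree


def count_nodes(tree: Node) -> int:
    """Count every node in the tree, including the root."""
    return 1 + sum(count_nodes(c) for c in tree.children)


def drop_comments(tree: Node) -> Node:
    """Example pass (hypothetical): remove every child whose type is 'comment'."""
    return Node(tree.type, [drop_comments(c) for c in tree.children if c.type != "comment"])


# parse -> transform -> detect: only the middle step is shown here.
tree = Node("module", [Node("comment"), Node("expression_statement", [Node("identifier")])])
result = run_pipeline(tree, [drop_comments])
print(count_nodes(result))  # 3
```

Keeping each pass a pure function makes it easy to add, reorder, or disable individual normalizations per language without touching the detector.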

This stage addresses two problems:

  1. Tree-sitter parsers generate concrete syntax trees (CSTs), which contain many redundant nodes, such as punctuation tokens, that confuse the algorithm. This stage can remove them and keep only the nodes that carry syntactic meaning.
  2. It can also unify equivalent syntax, for example by rewriting lambda functions as regular functions and for loops as while loops. This defeats obfuscation techniques that disguise copied code by switching between such equivalent forms.
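The two kinds of transformation above could look roughly like the following sketch. Everything in it is hypothetical: the `Node` class, the `PUNCTUATION` and `WRAPPERS` type sets, and the assumption that a pruned `for_statement` has exactly `[init, cond, update, body]` children are illustrations, not mayat's code or tree-sitter's real node types. It only handles C-style `for` loops; a faithful rewrite would also have to run the update before every `continue`, and lambda-to-function rewriting is omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical simplified CST node: a type, children, and optional token text."""
    type: str
    children: List["Node"] = field(default_factory=list)
    text: str = ""


# Assumed node-type sets; real grammars name these differently.
PUNCTUATION = {"(", ")", "{", "}", ";", ","}
WRAPPERS = {"parenthesized_expression", "expression_statement"}


def prune(node: Node) -> Node:
    """Drop punctuation leaves and collapse wrapper nodes left with a single child."""
    kids = [prune(c) for c in node.children if c.type not in PUNCTUATION]
    if node.type in WRAPPERS and len(kids) == 1:
        return kids[0]
    return Node(node.type, kids, node.text)


def for_to_while(node: Node) -> Node:
    """Rewrite `for (init; cond; update) body` as `{ init; while (cond) { body; update } }`.

    Assumes the for_statement has already been pruned to [init, cond, update, body].
    Does not handle `continue`, which would skip `update` after the rewrite.
    """
    kids = [for_to_while(c) for c in node.children]
    if node.type == "for_statement" and len(kids) == 4:
        init, cond, update, body = kids
        stmts = body.children if body.type == "block" else [body]
        loop_body = Node("block", stmts + [update])
        return Node("block", [init, Node("while_statement", [cond, loop_body])])
    return Node(node.type, kids, node.text)


def shape(node: Node) -> str:
    """Render the tree as a compact S-expression for inspection."""
    if not node.children:
        return node.text or node.type
    return "(" + node.type + " " + " ".join(shape(c) for c in node.children) + ")"


# Hypothetical CST for: for (i = 0; i < n; i++) { sum += i; }
cst = Node("for_statement", [
    Node("("),
    Node("assignment", [Node("identifier", text="i"), Node("number", text="0")]),
    Node(";"),
    Node("binary", [Node("identifier", text="i"), Node("identifier", text="n")]),
    Node(";"),
    Node("update", [Node("identifier", text="i")]),
    Node(")"),
    Node("block", [
        Node("{"),
        Node("expression_statement", [
            Node("augmented_assignment", [Node("identifier", text="sum"), Node("identifier", text="i")]),
            Node(";"),
        ]),
        Node("}"),
    ]),
])

print(shape(for_to_while(prune(cst))))
# (block (assignment i 0) (while_statement (binary i n) (block (augmented_assignment sum i) (update i))))
```

Running pruning before normalization keeps the rewrite rules simple, since they can match on meaningful children instead of skipping punctuation.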

Note that the parsers I used before switching to tree-sitter already did some of this automatically, so this issue essentially restores those features to mayat. Until it is resolved, I recommend using v1.0.0.

AlpacaMax · Mar 12 '23 03:03