Mayat
Add Tree Transformation Pipeline
Add a tree transformation stage that runs after parsing but before the plagiarism detection algorithm.
This stage addresses two problems:
- The tree-sitter parsers generate a concrete syntax tree (CST), which contains many redundant nodes that confuse the algorithm. This stage can remove those nodes and keep only the ones that carry syntactic meaning.
- We can also unify syntax. For example, we can rewrite lambda functions as normal functions and for loops as while loops. This defeats obfuscation techniques that swap one construct for an equivalent one.
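The two steps above could be sketched roughly as follows. This is only an illustration, not Mayat's actual code: the `Node` class, the `prune`, `unify`, and `transform` functions, and the sets of node kinds are all hypothetical, and the node kind names are assumed rather than taken from any particular tree-sitter grammar. The unification step here simply relabels node kinds, which is a crude stand-in for a real structural rewrite of a for loop into a while loop.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class Node:
    """Hypothetical simplified tree node (not the tree-sitter Node class)."""
    kind: str
    children: List["Node"] = field(default_factory=list)


# Assumed node kinds that carry no syntactic meaning (punctuation, comments).
REDUNDANT_KINDS = {"(", ")", "{", "}", ";", ",", "comment"}

# Assumed wrapper kinds that are collapsed when they hold exactly one child.
WRAPPER_KINDS = {"parenthesized_expression", "expression_statement"}

# Assumed mapping used to unify equivalent constructs under one kind.
UNIFIED_KINDS = {
    "lambda": "function_definition",
    "for_statement": "while_statement",
}


def prune(node: Node) -> Node:
    """Drop redundant children and collapse single-child wrapper nodes."""
    children = [prune(c) for c in node.children if c.kind not in REDUNDANT_KINDS]
    if node.kind in WRAPPER_KINDS and len(children) == 1:
        return children[0]
    return Node(node.kind, children)


def unify(node: Node) -> Node:
    """Relabel equivalent constructs so they compare as the same kind."""
    children = [unify(c) for c in node.children]
    return Node(UNIFIED_KINDS.get(node.kind, node.kind), children)


def transform(tree: Node, stages: Sequence[Callable[[Node], Node]] = (prune, unify)) -> Node:
    """Run each transformation stage in order on the parsed tree."""
    for stage in stages:
        tree = stage(tree)
    return tree


# Example: a lambda wrapped in an expression statement with punctuation tokens.
cst = Node("module", [
    Node("expression_statement", [
        Node("lambda", [Node("("), Node("identifier"), Node(")")]),
        Node(";"),
    ]),
])
simplified = transform(cst)
```

Keeping each transformation as a separate function that takes and returns a tree means new stages can be added to the pipeline without touching the detection algorithm itself.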
Note that the parsers I used before switching to tree-sitter did some of these transformations automatically, so this issue essentially adds those features back to Mayat. Until it is resolved, I recommend using v1.0.0.