Add a script to clean up pseudocode generated by common decompilers to improve Semgrep parsing

Open 0xdea opened this issue 2 months ago • 1 comment

Remove IDA decorators that have been shown to cause issues with the Semgrep parser:

virtual thunk to
non-virtual thunk to
vtable for
typeinfo for
guard variable for
VTT for

See also: semgrep-vs-decompiler.zip
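
A minimal sketch of how the IDA cleanup could look, in Python. It assumes the decorators appear in the exported pseudocode as plain, space-separated prefixes in front of a symbol name; if IDA wraps them in quotes or backticks in your output, the pattern needs adjusting. The function name `strip_ida_decorators` is hypothetical, not part of any existing tool.

```python
import re

# Decorators copied from the list in this issue.
IDA_DECORATORS = [
    "virtual thunk to",
    "non-virtual thunk to",
    "vtable for",
    "typeinfo for",
    "guard variable for",
    "VTT for",
]

# Assumption: each decorator is followed by at least one space or tab and
# then the symbol it decorates. The leading \b avoids matching in the
# middle of a longer identifier.
_IDA_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(d) for d in IDA_DECORATORS) + r")[ \t]+"
)


def strip_ida_decorators(text: str) -> str:
    """Return text with the listed IDA decorators removed."""
    return _IDA_RE.sub("", text)
```

Note that this is purely textual: it would also rewrite a matching phrase inside a comment or a string literal, which is probably acceptable for files that are only fed to Semgrep.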

Ghidra also emits some decorators that we want to remove from the exported pseudocode files:

__thiscall
__cdecl
__noreturn
__fastcall

[TODO: check whether they are indeed problematic and whether there are others; consider opening an issue/PR with Semgrep]
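
For the Ghidra side, a similar sketch in Python. The keyword list is the one above and is unverified, as the TODO says; `strip_ghidra_decorators` is a hypothetical name. Whole-word matching is used so that identifiers that merely contain one of these strings are left alone.

```python
import re

# Keywords copied from the list in this issue; whether each one really
# breaks Semgrep parsing is still to be checked.
GHIDRA_DECORATORS = ["__thiscall", "__cdecl", "__noreturn", "__fastcall"]

# Match each keyword as a whole word, plus any spaces or tabs after it,
# so "void __cdecl main" becomes "void main" without a double space.
_GHIDRA_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in GHIDRA_DECORATORS) + r")\b[ \t]*"
)


def strip_ghidra_decorators(text: str) -> str:
    """Return text with the listed Ghidra decorators removed."""
    return _GHIDRA_RE.sub("", text)
```

Keeping the list as data makes it easy to add or drop keywords once it is clear which ones actually cause parse failures.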

Nov 13 '25 15:11 0xdea

Also, handle try/catch/throw constructs (IDA) and possibly other C++ constructs (Ghidra, other decompilers) by changing the pseudocode file extension to .cpp where appropriate.

EDIT: try scanning the same pseudocode with both .c and .cpp extensions using my ruleset, and see what changes. Given how my rules and Semgrep (seem to) work, exporting everything as .cpp regardless of its content might be a practical solution.
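
A rough Python sketch of the extension change, covering both options: rename only files that look like C++, or rename everything. The heuristic (`try`, `catch`, `throw`, or `::` anywhere in the file) and the names `looks_like_cpp` and `retarget_extension` are assumptions for illustration, not tested against real decompiler output.

```python
import re
from pathlib import Path

# Assumed heuristic: C++ exception keywords or a scope operator.
# It can misfire on words inside comments or string literals.
_CPP_HINT_RE = re.compile(r"\b(?:try|catch|throw)\b|::")


def looks_like_cpp(text: str) -> bool:
    """Return True if the pseudocode contains obvious C++ constructs."""
    return _CPP_HINT_RE.search(text) is not None


def retarget_extension(path: Path, force_cpp: bool = False) -> Path:
    """Rename a .c file to .cpp if it looks like C++ (or if forced).

    Returns the resulting path, which is unchanged when no rename happens.
    """
    if path.suffix != ".c":
        return path
    if force_cpp or looks_like_cpp(path.read_text(errors="replace")):
        dst = path.with_suffix(".cpp")
        if not dst.exists():  # never overwrite an existing file
            return path.rename(dst)
    return path
```

Calling `retarget_extension(p, force_cpp=True)` on every exported file corresponds to the "export everything as .cpp" option from the EDIT above.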

Nov 14 '25 08:11 0xdea