add support for analysis of source code/scripted languages

Open adamstorek opened this issue 2 years ago • 2 comments

This enhancement extends capa to the analysis of potentially malicious scripts and source code. It adds a tree-sitter backend that parses source files into a lightweight AST. Features comparable to those produced by capa's existing PE/vivisect extractor are then extracted from that AST:

File-level:

  • trivial: language, file format
  • global string literals
  • global integer literals
  • namespaces
  • globally-instantiated imported classes
  • globally-called imported functions

Function-level:

  • string literals
  • integer literals
  • imported classes
  • imported functions
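To make the file-level versus function-level split concrete, here is a minimal sketch of literal extraction by scope. It is not the code from this change: it swaps in Python's standard-library `ast` module for the tree-sitter backend and parses Python source rather than the scripted languages targeted here. The names `extract_literals`, `SAMPLE`, and the `"<file>"` scope label are hypothetical.

```python
import ast
from collections import defaultdict

def extract_literals(source: str) -> dict:
    """Group string and integer literals by scope.

    Literals outside any function go under the hypothetical "<file>" scope;
    literals inside a function go under that function's name.
    """
    tree = ast.parse(source)
    features = defaultdict(lambda: {"strings": [], "integers": []})

    def visit(node, scope):
        # Entering a function definition switches the current scope.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            scope = node.name
        if isinstance(node, ast.Constant):
            if isinstance(node.value, str):
                features[scope]["strings"].append(node.value)
            # bool is a subclass of int in Python, so exclude it explicitly.
            elif isinstance(node.value, int) and not isinstance(node.value, bool):
                features[scope]["integers"].append(node.value)
        for child in ast.iter_child_nodes(node):
            visit(child, scope)

    visit(tree, "<file>")
    return dict(features)

# Illustrative input only; not taken from capa-testfiles.
SAMPLE = '''
URL = "http://example.com/payload"
PORT = 8080

def beacon():
    key = "secret"
    return 443
'''

features = extract_literals(SAMPLE)
```

Here the global `URL` and `PORT` values are recorded at file level, while `"secret"` and `443` are attributed to `beacon`. A tree-sitter version would walk the concrete syntax tree of each supported language in a similar way, but the node types differ per grammar.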

To install Tree-sitter:

  1. Pip-install Tree-sitter: pip3 install tree-sitter
  2. Install bindings:
       mkdir vendor build
       cd vendor
       git clone git@github.com:tree-sitter/tree-sitter-c-sharp.git
       git clone git@github.com:tree-sitter/tree-sitter-embedded-template.git
       git clone git@github.com:tree-sitter/tree-sitter-html.git
       git clone git@github.com:tree-sitter/tree-sitter-javascript.git
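The steps above create a `build` directory but do not say how it is used. One plausible follow-up, sketched here as an assumption rather than as part of this change, is to compile the cloned grammars into a shared library with py-tree-sitter. `Language.build_library` exists only in py-tree-sitter releases before 0.22; the output path `build/languages.so` and the helper names `grammar_dirs` and `build_languages` are hypothetical.

```python
from pathlib import Path

# Grammar repositories cloned into vendor/ by the install steps.
GRAMMARS = [
    "tree-sitter-c-sharp",
    "tree-sitter-embedded-template",
    "tree-sitter-html",
    "tree-sitter-javascript",
]

def grammar_dirs(vendor: str = "vendor") -> list:
    """Return the expected checkout path of each grammar under vendor/."""
    return [(Path(vendor) / name).as_posix() for name in GRAMMARS]

def build_languages(output: str = "build/languages.so", vendor: str = "vendor") -> str:
    """Compile all grammars into one shared library (assumed workflow)."""
    # Imported lazily so grammar_dirs() works without tree-sitter installed.
    from tree_sitter import Language

    # Available only in py-tree-sitter releases before 0.22.
    Language.build_library(output, grammar_dirs(vendor))
    return output
```

Calling `build_languages()` from the repository root would need a C compiler and the four checkouts in place; newer py-tree-sitter releases load grammars differently.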

Checklist

  • [ ] No CHANGELOG update needed
  • [ ] No new tests needed
  • [ ] No documentation update needed

adamstorek, Jul 01 '22 12:07

i think it would be worthwhile to get the tests running (and passing) in CI. this means:

  • add the example files to capa-testfiles and get those merged, and
  • update the github actions workflows to install the TS bindings (temporarily, until we have a better solution)

williballenthin, Jul 06 '22 21:07

  • add the example files to capa-testfiles and get those merged, and

Just submitted the pull request.

  • update the github actions workflows to install the TS bindings (temporarily, until we have a better solution)

On it.

adamstorek, Jul 07 '22 14:07