how to bundle TreeSitter bindings

Open · williballenthin opened this issue 2 years ago · 1 comment

Originally posted by @williballenthin in https://github.com/mandiant/capa/pull/1080#discussion_r912047439

ideally, we want to be able to install capa simply by running `pip install flare-capa` and/or fetching the standalone executable from GitHub (generated via PyInstaller). this means our dependencies should live within the Python ecosystem.

there is a supported TreeSitter library for Python; however, it doesn't include the bindings for each language we parse with TreeSitter. these bindings must be compiled into shared objects and distributed for use with the TreeSitter library.

we need to figure out how to distribute the shared object code with capa so that it "just works".
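to make the "just works" requirement concrete, here is a minimal sketch of how capa could locate a bundled shared object in both install modes. it assumes the compiled bindings ship as a single data file next to a package module (pip install) or at the root of the unpacked PyInstaller bundle; the file stem `languages` and all helper names are hypothetical, not existing capa code. the only external behavior it relies on is PyInstaller setting `sys.frozen` and `sys._MEIPASS` at runtime.

```python
import sys
from pathlib import Path


def bundled_resource_dir(package_file: str) -> Path:
    """Return the directory expected to hold bundled data files.

    Assumption: under PyInstaller the shared object is added as a data file
    at the bundle root (sys._MEIPASS); under a normal pip install it sits
    next to the module whose __file__ is passed in.
    """
    if getattr(sys, "frozen", False) and hasattr(sys, "_MEIPASS"):
        return Path(sys._MEIPASS)
    return Path(package_file).resolve().parent


def shared_object_name(stem: str, platform: str = sys.platform) -> str:
    """Map a file stem to the platform's shared library file name."""
    if platform.startswith("win"):
        return f"{stem}.dll"
    if platform == "darwin":
        return f"{stem}.dylib"
    return f"{stem}.so"


def find_language_library(package_file: str, stem: str = "languages") -> Path:
    """Locate the bundled shared object, failing loudly if it is missing.

    `stem` is a hypothetical name for the compiled bindings.
    """
    path = bundled_resource_dir(package_file) / shared_object_name(stem)
    if not path.is_file():
        raise FileNotFoundError(f"bundled TreeSitter bindings not found: {path}")
    return path
```

a module inside the package would call `find_language_library(__file__)` and hand the resulting path to the TreeSitter library. this only solves lookup; building the shared object for each platform and getting it into the wheel and the PyInstaller spec is still the open question.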

williballenthin · Jul 06 '22 19:07

one strategy:

Rust has good TreeSitter library support and can statically link language bindings. Rust also has great Python binding support via PyO3, which is how we distribute our implementation of FLIRT to all supported platforms (Windows/macOS/Linux, 32- and 64-bit).

we could build a Python package implemented as a native library via Rust+PyO3 and distributed on PyPI that embeds the TreeSitter library and all bindings.

pro:

  • distribute whls via pypi for all supported platforms
  • we have existing code for this here: https://github.com/williballenthin/lancelot/tree/master/pyflirt

con:

  • have to write code to wrap the TS APIs that we want
  • yet another github repository to maintain (though it can easily have CI/CD, too)

williballenthin · Jul 06 '22 21:07