Support saving the extracted features to disk
I suggest adding a way to save (serialize) the extracted features to disk, then load them and run the matching directly from there. This would be useful in a few cases:
- Create unit tests for the feature extractors, e.g., the binja extractor
- Separate feature extraction from the matching process, e.g., for TTD, we might want to run some C++ code to do the feature extraction, save the result, and then do the matching elsewhere
- Write the binja extractor in C++, which is more performant
We have the freeze format for that purpose, see https://github.com/mandiant/capa/tree/master/capa/features/freeze
Or did you have something else in mind?
Oh I did not see this. It looks promising!
I am curious whether it is easy to produce from a different language, e.g., C++, or is it a Python-only thing? I was considering something more universal like JSON, but I don't know how practical that is.
capa freeze file format:
| capa0000 | + zlib(utf-8(json(...)))
It should be reasonably easy to produce from other languages. You'd need to do a little digging into the Pydantic data model to see how things are structured, but the schema is defined strictly with Pydantic and is declarative.
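To illustrate the container layout described above, here is a minimal sketch of wrapping and unwrapping a JSON payload. It assumes the magic header is the eight ASCII bytes `capa0000`, as shown in the layout. The helper names `pack` and `unpack` are hypothetical, and the example payload is a placeholder, not the real freeze schema (that comes from the Pydantic models).

```python
import json
import zlib

# Assumption: the magic header is the 8 ASCII bytes shown in the layout above.
MAGIC = b"capa0000"


def pack(doc: dict) -> bytes:
    """Wrap a JSON-serializable document as: magic + zlib(utf-8(json(doc)))."""
    payload = json.dumps(doc).encode("utf-8")
    return MAGIC + zlib.compress(payload)


def unpack(buf: bytes) -> dict:
    """Reverse of pack(): check the magic, decompress, decode, parse JSON."""
    if not buf.startswith(MAGIC):
        raise ValueError("missing capa0000 magic header")
    payload = zlib.decompress(buf[len(MAGIC):])
    return json.loads(payload.decode("utf-8"))


# Placeholder document; the real structure is defined by the Pydantic models.
example = {"placeholder": True, "features": []}
blob = pack(example)
restored = unpack(blob)
```

Since this only needs JSON, UTF-8, and zlib, the same steps should carry over to C++ with any JSON library plus zlib. The part that takes the digging is making the JSON match the Pydantic data model.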