capa icon indicating copy to clipboard operation
capa copied to clipboard

binary ninja: optimize feature extraction

Open williballenthin opened this issue 1 year ago • 15 comments

During some initial profiling, I'm finding that the Binary Ninja backend is substantially slower than vivisect or IDA. This thread will enumerate all the things we discover. It might include: bugs in Binary Ninja, things we're doing wrong, workarounds, etc.

Given how good Binary Ninja's code analysis is, we'd really like to be able to use it widely. So, let's prepare the code for this.

williballenthin avatar Sep 25 '24 11:09 williballenthin

https://github.com/Vector35/binaryninja-api/issues/5951

0953cc3b77ed2974b09e3a00708f88de931d681e2d0cb64afbaf714610beabe6 (100KB or so) takes a huge amount of time to load into Binary Ninja. Maybe there's an infinite loop somewhere.

williballenthin avatar Sep 25 '24 11:09 williballenthin

To run capa against 321338196a46b600ea330fc5d98d0699, it takes 2:48. But :36 is spent just in BNGetLowLevelILForInstruction to help recover calls to functions. I have expected this to be extremely fast (less than :01). Need to triage if this is a bug in the API or we're using it incorrectly and/or there's a workaround.

We can also see that BNGetFunctionMediumLevelIL takes quite a bit of time (:13, and its called twice).

  Austin  TUI   Wall Time Profile                                                                                         
   _________   Command python -m capa.main --json --backend=binja tests/data/321338196a46b600ea330fc5d98d0699.exe_                     
   ⎝__⎠ ⎝__⎠   Python 3.12.3    PID 3946467     PID:TID 3946467:0:3946467   
               Samples 1060897  ⏲️    2'48"       Threshold 1%   
  OWN    TOTAL    %OWN   %TOTAL  FUNCTION                                                                                              
  00"    2'09"     0.0%   76.6%  ├─ _run_module_as_main (<frozen runpy>:199)                                                           │
  00"    2'09"     0.0%   76.6%  │  └─ _run_code (<frozen runpy>:88)                                                                   │
  00"    2'08"     0.0%   76.3%  │     └─ <module> (~/code/public/capa/capa/main.py:1096)                                              │
  00"     14"      0.0%    8.4%  │        ├─ main (~/code/public/capa/capa/main.py:983)                                                │
  00"     14"      0.0%    8.4%  │        │  └─ get_extractor_from_cli (~/code/public/capa/capa/main.py:830)                           │
  00"     13"      0.0%    7.7%  │        │     ├─ get_extractor (~/code/public/capa/capa/loader.py:264)                               │
  00"     13"      0.0%    7.7%  │        │     │  └─ load (~/software/binaryninja/python/binaryninja/__init__.py:414)                 │
  00"     13"      0.0%    7.6%  │        │     │     └─ BinaryView.load (~/software/binaryninja/python/binaryninja/binaryview.py:2758)│
  13"     13"      7.6%    7.6%  │        │     │        └─ BNLoadFilename (~/software/binaryninja/python/binaryninja/_binaryninjacore.│
  00"    1'39"     0.0%   58.7%  │        ├─ main (~/code/public/capa/capa/main.py:987)                                                │
  00"    1'39"     0.0%   58.7%  │        │  └─ find_capabilities (~/code/public/capa/capa/capabilities/common.py:75)                  │
  00"    1'38"     0.0%   58.3%  │        │     ├─ find_static_capabilities (~/code/public/capa/capa/capabilities/static.py:184)       │
  00"     11"      0.0%    6.3%  │        │     │  ├─ find_code_capabilities (~/code/public/capa/capa/capabilities/static.py:117)      │
  00"     10"      0.0%    6.2%  │        │     │  │  ├─ BinjaFeatureExtractor.get_basic_blocks (~/code/public/capa/capa/features/extra│
  00"     10"      0.0%    6.0%  │        │     │  │  │  ├─ Function.mlil (~/software/binaryninja/python/binaryninja/function.py:1395) │
  10"     10"      6.0%    6.0%  │        │     │  │  │  │  └─ BNGetFunctionMediumLevelIL (~/software/binaryninja/python/binaryninja/_b│
  00"     48"      0.0%   28.9%  │        │     │  ├─ find_code_capabilities (~/code/public/capa/capa/capabilities/static.py:118)      │
  00"     08"      0.1%    4.6%  │        │     │  │  ├─ find_basic_block_capabilities (~/code/public/capa/capa/capabilities/static.py:│
  00"     07"      0.0%    4.2%  │        │     │  │  │  ├─ BinjaFeatureExtractor.get_instructions (~/code/public/capa/capa/features/ex│
  00"     02"      0.1%    1.4%  │        │     │  │  │  │  ├─ BasicBlock.__iter__ (~/software/binaryninja/python/binaryninja/basicbloc│
  00"     05"      0.0%    2.8%  │        │     │  │  │  │  ├─ BasicBlock.__iter__ (~/software/binaryninja/python/binaryninja/basicbloc│
  00"     02"      0.0%    1.5%  │        │     │  │  │  │  │  ├─ CoreArchitecture.get_instruction_text (~/software/binaryninja/python/│
  00"     35"      0.1%   20.8%  │        │     │  │  ├─ find_basic_block_capabilities (~/code/public/capa/capa/capabilities/static.py:│
  01"     31"      0.5%   18.6%  │        │     │  │  │  ├─ find_instruction_capabilities (~/code/public/capa/capa/capabilities/static.│
  00"     30"      0.0%   18.1%  │        │     │  │  │  │  ├─ BinjaFeatureExtractor.extract_insn_features (~/code/public/capa/capa/fea│
  01"     30"      0.8%   18.0%  │        │     │  │  │  │  │  ├─ extract_features (~/code/public/capa/capa/features/extractors/binja/i│
  00"     02"      0.0%    1.0%  │        │     │  │  │  │  │  │  ├─ extract_insn_number_features (~/code/public/capa/capa/features/ext│
  00"     02"      0.0%    1.1%  │        │     │  │  │  │  │  │  ├─ extract_insn_bytes_features (~/code/public/capa/capa/features/extr│
  00"     02"      0.0%    1.3%  │        │     │  │  │  │  │  │  ├─ extract_insn_api_features (~/code/public/capa/capa/features/extrac│
  00"     02"      0.0%    1.2%  │        │     │  │  │  │  │  │  │  ├─ is_stub_function (~/code/public/capa/capa/features/extractors/b│
  00"     02"      0.0%    1.0%  │        │     │  │  │  │  │  │  │  │  ├─ Function.llil (~/software/binaryninja/python/binaryninja/fun│
  02"     02"      1.0%    1.0%  │        │     │  │  │  │  │  │  │  │  │  ├─ BNGetFunctionLowLevelIL (~/software/binaryninja/python/bi│
  00"     03"      0.1%    2.1%  │        │     │  │  │  ├─ find_instruction_capabilities (~/code/public/capa/capa/capabilities/static.│
  00"     03"      0.0%    2.0%  │        │     │  │  │  │  ├─ RuleSet.match (~/code/public/capa/capa/rules/__init__.py:2053)          │
  02"     02"      1.4%    1.4%  │        │     │  │  │  │  │  ├─ RuleSet._match (~/code/public/capa/capa/rules/__init__.py:1879)      │
  00"     04"      0.0%    2.4%  │        │     │  │  ├─ find_basic_block_capabilities (~/code/public/capa/capa/capabilities/static.py:│
  00"     04"      0.0%    2.4%  │        │     │  │  │  ├─ RuleSet.match (~/code/public/capa/capa/rules/__init__.py:2053)             │
  00"     37"      0.0%   21.9%  │        │     │  ├─ find_code_capabilities (~/code/public/capa/capa/capabilities/static.py:128)      │
  00"     37"      0.0%   21.9%  │        │     │  │  ├─ BinjaFeatureExtractor.extract_function_features (~/code/public/capa/capa/featu│
  00"     37"      0.0%   21.9%  │        │     │  │  │  ├─ extract_features (~/code/public/capa/capa/features/extractors/binja/functio│
>>00"     36"      0.0%   21.5%  │        │     │  │  │  │  ├─ extract_function_calls_to (~/code/public/capa/capa/features/extractors/b│
>>00"     36"      0.0%   21.4%  │        │     │  │  │  │  │  ├─ ReferenceSource.llil (~/software/binaryninja/python/binaryninja/binar│
>>00"     36"      0.0%   21.3%  │        │     │  │  │  │  │  │  ├─ Function.get_low_level_il_at (~/software/binaryninja/python/binary│
>>36"     36"     21.3%   21.3%  │        │     │  │  │  │  │  │  │  └─ BNGetLowLevelILForInstruction (~/software/binaryninja/python/bi│
  00"     14"      0.0%    8.6%  │        ├─ main (~/code/public/capa/capa/main.py:990)                                                │
  00"     14"      0.0%    8.6%  │        │  └─ compute_layout (~/code/public/capa/capa/loader.py:662)                                 │
  00"     14"      0.0%    8.4%  │        │     ├─ compute_static_layout (~/code/public/capa/capa/loader.py:631)                       │
  00"     14"      0.0%    8.3%  │        │     │  ├─ BinjaFeatureExtractor.get_basic_blocks (~/code/public/capa/capa/features/extracto│
  00"     14"      0.0%    8.2%  │        │     │  │  ├─ Function.mlil (~/software/binaryninja/python/binaryninja/function.py:1395)    │
  14"     14"      8.2%    8.2%  │        │     │  │  │  ├─ BNGetFunctionMediumLevelIL (~/software/binaryninja/python/binaryninja/_bin

edit: maybe we can cache the results of fetching the llil/mlil to save some time. Still is surprising that it takes 3x longer to fetch the llil than do the complete analysis. Maybe its Python serialization overhead?

williballenthin avatar Sep 25 '24 11:09 williballenthin

To run capa against 321338196a46b600ea330fc5d98d0699, it takes 2:48. But :36 is spent just in BNGetLowLevelILForInstruction to help recover calls to functions. I have expected this to be extremely fast (less than :01). Need to triage if this is a bug in the API or we're using it incorrectly and/or there's a workaround.

We can also see that BNGetFunctionMediumLevelIL takes quite a bit of time (:13, and its called twice).

  Austin  TUI   Wall Time Profile                                                                                         
   _________   Command python -m capa.main --json --backend=binja tests/data/321338196a46b600ea330fc5d98d0699.exe_                     
   ⎝__⎠ ⎝__⎠   Python 3.12.3    PID 3946467     PID:TID 3946467:0:3946467   
               Samples 1060897  ⏲️    2'48"       Threshold 1%   
  OWN    TOTAL    %OWN   %TOTAL  FUNCTION                                                                                              
  00"    2'09"     0.0%   76.6%  ├─ _run_module_as_main (<frozen runpy>:199)                                                           │
  00"    2'09"     0.0%   76.6%  │  └─ _run_code (<frozen runpy>:88)                                                                   │
  00"    2'08"     0.0%   76.3%  │     └─ <module> (~/code/public/capa/capa/main.py:1096)                                              │
  00"     14"      0.0%    8.4%  │        ├─ main (~/code/public/capa/capa/main.py:983)                                                │
  00"     14"      0.0%    8.4%  │        │  └─ get_extractor_from_cli (~/code/public/capa/capa/main.py:830)                           │
  00"     13"      0.0%    7.7%  │        │     ├─ get_extractor (~/code/public/capa/capa/loader.py:264)                               │
  00"     13"      0.0%    7.7%  │        │     │  └─ load (~/software/binaryninja/python/binaryninja/__init__.py:414)                 │
  00"     13"      0.0%    7.6%  │        │     │     └─ BinaryView.load (~/software/binaryninja/python/binaryninja/binaryview.py:2758)│
  13"     13"      7.6%    7.6%  │        │     │        └─ BNLoadFilename (~/software/binaryninja/python/binaryninja/_binaryninjacore.│
  00"    1'39"     0.0%   58.7%  │        ├─ main (~/code/public/capa/capa/main.py:987)                                                │
  00"    1'39"     0.0%   58.7%  │        │  └─ find_capabilities (~/code/public/capa/capa/capabilities/common.py:75)                  │
  00"    1'38"     0.0%   58.3%  │        │     ├─ find_static_capabilities (~/code/public/capa/capa/capabilities/static.py:184)       │
  00"     11"      0.0%    6.3%  │        │     │  ├─ find_code_capabilities (~/code/public/capa/capa/capabilities/static.py:117)      │
  00"     10"      0.0%    6.2%  │        │     │  │  ├─ BinjaFeatureExtractor.get_basic_blocks (~/code/public/capa/capa/features/extra│
  00"     10"      0.0%    6.0%  │        │     │  │  │  ├─ Function.mlil (~/software/binaryninja/python/binaryninja/function.py:1395) │
  10"     10"      6.0%    6.0%  │        │     │  │  │  │  └─ BNGetFunctionMediumLevelIL (~/software/binaryninja/python/binaryninja/_b│
  00"     48"      0.0%   28.9%  │        │     │  ├─ find_code_capabilities (~/code/public/capa/capa/capabilities/static.py:118)      │
  00"     08"      0.1%    4.6%  │        │     │  │  ├─ find_basic_block_capabilities (~/code/public/capa/capa/capabilities/static.py:│
  00"     07"      0.0%    4.2%  │        │     │  │  │  ├─ BinjaFeatureExtractor.get_instructions (~/code/public/capa/capa/features/ex│
  00"     02"      0.1%    1.4%  │        │     │  │  │  │  ├─ BasicBlock.__iter__ (~/software/binaryninja/python/binaryninja/basicbloc│
  00"     05"      0.0%    2.8%  │        │     │  │  │  │  ├─ BasicBlock.__iter__ (~/software/binaryninja/python/binaryninja/basicbloc│
  00"     02"      0.0%    1.5%  │        │     │  │  │  │  │  ├─ CoreArchitecture.get_instruction_text (~/software/binaryninja/python/│
  00"     35"      0.1%   20.8%  │        │     │  │  ├─ find_basic_block_capabilities (~/code/public/capa/capa/capabilities/static.py:│
  01"     31"      0.5%   18.6%  │        │     │  │  │  ├─ find_instruction_capabilities (~/code/public/capa/capa/capabilities/static.│
  00"     30"      0.0%   18.1%  │        │     │  │  │  │  ├─ BinjaFeatureExtractor.extract_insn_features (~/code/public/capa/capa/fea│
  01"     30"      0.8%   18.0%  │        │     │  │  │  │  │  ├─ extract_features (~/code/public/capa/capa/features/extractors/binja/i│
  00"     02"      0.0%    1.0%  │        │     │  │  │  │  │  │  ├─ extract_insn_number_features (~/code/public/capa/capa/features/ext│
  00"     02"      0.0%    1.1%  │        │     │  │  │  │  │  │  ├─ extract_insn_bytes_features (~/code/public/capa/capa/features/extr│
  00"     02"      0.0%    1.3%  │        │     │  │  │  │  │  │  ├─ extract_insn_api_features (~/code/public/capa/capa/features/extrac│
  00"     02"      0.0%    1.2%  │        │     │  │  │  │  │  │  │  ├─ is_stub_function (~/code/public/capa/capa/features/extractors/b│
  00"     02"      0.0%    1.0%  │        │     │  │  │  │  │  │  │  │  ├─ Function.llil (~/software/binaryninja/python/binaryninja/fun│
  02"     02"      1.0%    1.0%  │        │     │  │  │  │  │  │  │  │  │  ├─ BNGetFunctionLowLevelIL (~/software/binaryninja/python/bi│
  00"     03"      0.1%    2.1%  │        │     │  │  │  ├─ find_instruction_capabilities (~/code/public/capa/capa/capabilities/static.│
  00"     03"      0.0%    2.0%  │        │     │  │  │  │  ├─ RuleSet.match (~/code/public/capa/capa/rules/__init__.py:2053)          │
  02"     02"      1.4%    1.4%  │        │     │  │  │  │  │  ├─ RuleSet._match (~/code/public/capa/capa/rules/__init__.py:1879)      │
  00"     04"      0.0%    2.4%  │        │     │  │  ├─ find_basic_block_capabilities (~/code/public/capa/capa/capabilities/static.py:│
  00"     04"      0.0%    2.4%  │        │     │  │  │  ├─ RuleSet.match (~/code/public/capa/capa/rules/__init__.py:2053)             │
  00"     37"      0.0%   21.9%  │        │     │  ├─ find_code_capabilities (~/code/public/capa/capa/capabilities/static.py:128)      │
  00"     37"      0.0%   21.9%  │        │     │  │  ├─ BinjaFeatureExtractor.extract_function_features (~/code/public/capa/capa/featu│
  00"     37"      0.0%   21.9%  │        │     │  │  │  ├─ extract_features (~/code/public/capa/capa/features/extractors/binja/functio│
>>00"     36"      0.0%   21.5%  │        │     │  │  │  │  ├─ extract_function_calls_to (~/code/public/capa/capa/features/extractors/b│
>>00"     36"      0.0%   21.4%  │        │     │  │  │  │  │  ├─ ReferenceSource.llil (~/software/binaryninja/python/binaryninja/binar│
>>00"     36"      0.0%   21.3%  │        │     │  │  │  │  │  │  ├─ Function.get_low_level_il_at (~/software/binaryninja/python/binary│
>>36"     36"     21.3%   21.3%  │        │     │  │  │  │  │  │  │  └─ BNGetLowLevelILForInstruction (~/software/binaryninja/python/bi│
  00"     14"      0.0%    8.6%  │        ├─ main (~/code/public/capa/capa/main.py:990)                                                │
  00"     14"      0.0%    8.6%  │        │  └─ compute_layout (~/code/public/capa/capa/loader.py:662)                                 │
  00"     14"      0.0%    8.4%  │        │     ├─ compute_static_layout (~/code/public/capa/capa/loader.py:631)                       │
  00"     14"      0.0%    8.3%  │        │     │  ├─ BinjaFeatureExtractor.get_basic_blocks (~/code/public/capa/capa/features/extracto│
  00"     14"      0.0%    8.2%  │        │     │  │  ├─ Function.mlil (~/software/binaryninja/python/binaryninja/function.py:1395)    │
  14"     14"      8.2%    8.2%  │        │     │  │  │  ├─ BNGetFunctionMediumLevelIL (~/software/binaryninja/python/binaryninja/_bin

edit: maybe we can cache the results of fetching the llil/mlil to save some time. Still is surprising that it takes 3x longer to fetch the llil than do the complete analysis. Maybe its Python serialization overhead?

I opened the file in binja GUI and the analysis only took 4.3 seconds:

[Analysis] Analysis update took 4.325 seconds

My machine is probably faster then the CI box used by GitHub, still quite surprising to see such a huge difference

xusheng6 avatar Sep 26 '24 14:09 xusheng6

@xusheng6 on my test rig it took maybe 13s to load the binary. Then lots longer to extract the features (minutes). So accessing the LLIR/MLIR is taking integer multiples of the total load time 😕

Maybe 3s vs 13s comes from only having about two cores available in the test environment.

williballenthin avatar Sep 26 '24 17:09 williballenthin

@xusheng6 on my test rig it took maybe 13s to load the binary. Then lots longer to extract the features (minutes). So accessing the LLIR/MLIR is taking integer multiples of the total load time 😕

Maybe 3s vs 13s comes from only having about two cores available in the test environment.

thx for letting me know about it, it seems either I wrote the backend in a bad way, or the Python wrapping adds significant overhead to it

xusheng6 avatar Sep 26 '24 17:09 xusheng6

The profiler didn't expose any invocation counts, so I'm not yet sure if we're calling the API way too many times or if the API itself is slow. Given that it's both LLIR and MLIR, I sorta suspect the latter. But, in the few minutes I looked at the bindings, it didn't seem like all that much was happening (on the Python side).

williballenthin avatar Sep 26 '24 18:09 williballenthin

While looking into https://github.com/mandiant/capa/issues/2406, I noticed the IL of the function 0x8082d40 (the largest function in b5f0524e69b3a3cf636c7ac366ca57bf5e3a8fdc8a9f01caf196c611a7918a87.elf_) is requested multiple times. This is not expected, since I thought the IL should be requested at most once and then cached. I will check how we cache the analysis data to ensure things are working as expected

xusheng6 avatar Nov 21 '24 13:11 xusheng6

While looking into #2406, I noticed the IL of the function 0x8082d40 (the largest function in b5f0524e69b3a3cf636c7ac366ca57bf5e3a8fdc8a9f01caf196c611a7918a87.elf_) is requested multiple times. This is not expected, since I thought the IL should be requested at most once and then cached. I will check how we cache the analysis data to ensure things are working as expected

Here is how we handle the caching of the IL function. On the one hand, we cannot afford to cache all of them because that can eat all RAM easily. On the other hand, we do want to cache some of them to avoid frequently regenerating them. Here is how are actually do it: we cache the IL of 64 functions. This number is governed by a setting:

image

A function's IL cache gets discarded when another function gets added to the cache bucket and evicts it from it.

This strategy apparently works very well for the UI usage scenario, but it is NOT very good for headless usage

xusheng6 avatar Nov 22 '24 04:11 xusheng6

ah, and from within capa, we go function by function (cache friendly) but request the IL from each caller and callee of the function (cache unfriendly).

In capa, we could maybe pre-compute some analysis in a function-major direction (ie. cache friendly way). I think it's possible but the code might become less intuitive.

williballenthin avatar Nov 22 '24 04:11 williballenthin

I would suggest you not do bother that from capa. I will see if I can do anything from binja.

Apparently the caching thing is unique to binja and it does not really make sense to put the burden on your shoulders

xusheng6 avatar Nov 22 '24 04:11 xusheng6

Here is a brief recap of my recent findings:

  1. Repetitive IL retrieval caused by the binja's cache handling of the IL function. See https://github.com/mandiant/capa/issues/2402#issuecomment-2492832367 for more info. This contributes to the performance issue
  2. Failure to retrieve the IL function when it should be retrievable. This used to contribute to many of the crashes in https://github.com/mandiant/capa/issues/2406. I have created an upstream issue for it: https://github.com/Vector35/binaryninja-api/issues/6171. Note, although the crash itself is fixed by the PR https://github.com/mandiant/capa/pull/2500, it is not a perfect fix. Since it only fixes the crash with a try/except, and the IL is still irretrievable for known reasons. This means while capa can now process these samples, it will miss some features and detection. I will continue to look into it
  3. These two issues could be related. In fact, I wrote a single script to reproduce 2) by iterating over the functions but failed. It might be the case that the repetitive retrieval of the IL somehow contributed to the failure, in a way not yet understood by me

@williballenthin for awareness

xusheng6 avatar Nov 22 '24 05:11 xusheng6

Ok I finally figured out what is happening: https://github.com/Vector35/binaryninja-api/issues/6171#issuecomment-2492926828. And indeed the two issues are related and they contribute together to what I see. That is the beauty of debugging!

xusheng6 avatar Nov 22 '24 05:11 xusheng6

@xusheng6 can you think of another way to phrase the check here: https://github.com/mandiant/capa/blob/7c01712843b8d365766e0979c2583ce8b7dd3e18/capa/features/extractors/binja/function.py#L27

Currently, we go:

for f in functions:
  for x in f.caller_sites:
    if x.il.operation in {...}:
      ...

and this is cache-unfriendly, due to the order that we access the function llil members.

Does Binja have another way that we could ask for the call graph?

Worst case, we could do something like:

for f in functions:
  for op in f.llil:
    if op.operation is "call":
      calls_to[op.address].add(f)
      calls_from[f].add(op.address)

And build the call graph ourselves in a single pass up front, which would be cache friendly (but require a complete pass through all the IL).

williballenthin avatar Nov 25 '24 20:11 williballenthin

also, @xusheng6 can we use LLIL instead of MLIL to recover basic blocks, such as here (https://github.com/mandiant/capa/blob/20909c1d9519b12ea0237a944e46691766bd7838/capa/features/extractors/binja/extractor.py#L58) so that we're using LLIL everywhere?

williballenthin avatar Nov 26 '24 14:11 williballenthin

also, @xusheng6 can we use LLIL instead of MLIL to recover basic blocks, such as here (

https://github.com/mandiant/capa/blob/20909c1d9519b12ea0237a944e46691766bd7838/capa/features/extractors/binja/extractor.py#L58 ) so that we're using LLIL everywhere?

I am using MLIL because the stack string detection is done with ease at MLIL. It is not used for basic block recovery

xusheng6 avatar Nov 26 '24 14:11 xusheng6