Regression: crash in compute_static_layout when using binja extractor
I got this crash today while analyzing 2f7f5fb5de175e770d7eae87666f9831.elf_. Note: https://github.com/mandiant/capa/pull/2732 must be applied first, otherwise you will hit a different crash before reaching compute_static_layout.
Traceback (most recent call last):
  File "/Users/xusheng/capa-env/bin/capa", line 7, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/xusheng/capa/capa/main.py", line 1042, in main
    meta.analysis.layout = capa.loader.compute_layout(rules, extractor, capabilities.matches)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xusheng/capa/capa/loader.py", line 675, in compute_layout
    return compute_static_layout(rules, extractor, capabilities)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xusheng/capa/capa/loader.py", line 653, in compute_static_layout
    assert addr in functions_by_bb
           ^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
This issue has been known for a long time. It was first reported in https://github.com/mandiant/capa/issues/2406#issuecomment-2490171179. I then tracked down the root cause in https://github.com/mandiant/capa/issues/2516, which was fixed by https://github.com/mandiant/capa/pull/2523. This also led to the creation of a binja issue: https://github.com/Vector35/binaryninja-api/issues/6222
It is unclear to me what is causing the regression.
@mr-tz can you briefly tell me what layout is computed?
This data links basic blocks to the functions in which they're found, so that we can map between the two later on.
Seems like there's a matched BB that was not previously discovered?
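To make the mapping and the failing check concrete, here is a minimal sketch. It is a toy model, not capa's actual code: the helper names `build_functions_by_bb` and `unmapped_matches`, the input shape, and all addresses are assumptions made for illustration. Only the name `functions_by_bb` and the membership check come from the traceback above.

```python
# Toy model only: helper names, input shape, and addresses are hypothetical.
def build_functions_by_bb(bbs_by_function):
    """Invert {function address: [bb addresses]} into {bb address: function address}."""
    functions_by_bb = {}
    for function_addr, bb_addrs in bbs_by_function.items():
        for bb_addr in bb_addrs:
            functions_by_bb[bb_addr] = function_addr
    return functions_by_bb


def unmapped_matches(matched_bb_addrs, functions_by_bb):
    """Return matched basic block addresses that have no owning function in the mapping."""
    return sorted(addr for addr in matched_bb_addrs if addr not in functions_by_bb)


# Basic blocks seen when the layout is computed.
bbs_by_function = {0x1000: [0x1000, 0x1010], 0x2000: [0x2000]}
functions_by_bb = build_functions_by_bb(bbs_by_function)

# Basic blocks that produced rule matches earlier; 0x2020 was never enumerated above.
matched = {0x1010, 0x2020}
missing = unmapped_matches(matched, functions_by_bb)
print([hex(addr) for addr in missing])  # prints ['0x2020']
```

In this model, any address in `missing` is one for which `assert addr in functions_by_bb` would fail. Collecting the misses instead of asserting on the first one could make it easier to see which blocks went missing.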
It is actually a very delicate bug. I analyzed it previously and put some notes here: https://github.com/mandiant/capa/issues/2406#issuecomment-2498173743. However, I am not sure whether the fix has actually regressed or something else is happening.
The observed behavior is that during feature extraction we discover all of the basic blocks correctly, so the feature shows up in the capabilities. However, when we enumerate the basic blocks again, the state of the analysis has changed, and this causes some issues (which I do not understand yet) that eventually lead to fewer basic blocks being discovered in the function.
We are approaching a new release, so I cannot guarantee that I will be able to get this fixed soon, though I will definitely try.
Please feel free to ping me if this goes inactive for a while.
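One way to check this hypothesis is to enumerate the basic blocks twice and diff the results. The sketch below assumes a hypothetical `ToyExtractor` that returns a different block list on each call. It stands in for the changing analysis state and does not use the real capa or Binary Ninja APIs.

```python
# Hypothetical stand-in: each call to get_basic_blocks() returns the next canned pass.
class ToyExtractor:
    def __init__(self, passes):
        self._passes = iter(passes)

    def get_basic_blocks(self):
        return next(self._passes)


def lost_between_passes(first_pass, second_pass):
    """Return addresses seen in the first enumeration but absent from the second."""
    return sorted(set(first_pass) - set(second_pass))


extractor = ToyExtractor([
    [0x1000, 0x1010, 0x1020],  # pass 1: feature extraction sees three blocks
    [0x1000, 0x1010],          # pass 2: layout computation sees only two
])
first = extractor.get_basic_blocks()
second = extractor.get_basic_blocks()
lost = lost_between_passes(first, second)
print([hex(addr) for addr in lost])  # prints ['0x1020']
```

If the real extractor shows a non-empty difference for the same function, that would suggest the enumeration itself is unstable rather than the matching logic.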