Control flow obfuscation causing basic block to be missed
Description
This issue may be for vivisect, but it appears that some control flow obfuscation is (I assume) causing a basic block to be missed. I've posted up my notes below for troubleshooting.
Steps to Reproduce
- Run Capa with the following rule, against the attached binary
329b3ddbf1c00b7767f0ec39b90eb9f4f8bd98ace60e2f6b6fbfb9adf25e3ef9.zip (Password: infected)
rule:
  meta:
    name: foo
    namespace: bar
    scope: basic block
  features:
    - and:
      - api: SystemParametersInfoW
Expected behavior:
Rule should trigger on the basic block located at 00406A84
Actual behavior: The rule does not trigger on that basic block.
Versions
Built from master
Additional Information
This is where results are inconsistent across tools.
According to Cutter:
Cutter will see the basic block and the call to SystemParametersInfoW; cross referencing back shows the function starts at 0x004061d1.
Searching for the import:
Checking xrefs
According to Ghidra:
Ghidra sees the import, but no xrefs.
Browsing to the function, it's easy to see where the disassembler becomes confused and parsing stops.
According to BinaryNinja:
BinaryNinja sees the beginning of the function as 0x004061d1. Disassembly will eventually fall to the basic block that contains the API call.
The basic block:
i'll triage this, maybe on monday or tuesday. unfortunately, we're at the whim of the underlying analysis engine, so unless there's a quick fix, we may just have to accept it.
yesterday, i noticed a bunch of FNs due to viv not finding functions, too.
getting a ghidra plugin working is pretty high on my priority list (especially adding py3 ;-) ), and i know @psifertex is actively working on BN. so, we'll soon have a pretty comprehensive set of backends from which you can pick your favorite.
Yup! Seeing this maybe I'll prioritize the backend as opposed to the UI though the true CLI version of capa will only work via a commercial headless license. Using capa as a library in the BN Python repl would still work in personal, but I'm still wrapping my head around the current capa architecture a bit.
RE: Py3 and Ghidra, those are mutually exclusive though, no? I think jython still has no python3 support.
Thanks for the comments, looking forward to more analysis backends.
In Ghidra, Radare, and BinaryNinja the analysis level can be adjusted. Running aggressive function finders may result in some bad analysis, but it may also find functions that were not disassembled on the initial pass.
BinaryNinja does some neat stuff with opaque predicate removal and simplifying control flow in their lifting. If BN uses a lifted form of assembly, the only downside would be changes in mnemonics within signatures. (This is probably a topic for another thread)
Does vivisect support varying levels of analysis? If so, is this something that could be configured from Capa?
yeah, vivisect has a configurable set of analysis passes; however, i believe they're all enabled by default, so there's not a dial that we could turn further.
but, we could provide our own "aggressive function finder" when we initialize viv and maybe find more code. i think a function prologue scan would probably help, for instance.
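To make the prologue-scan idea concrete, here is a minimal sketch. It does not use vivisect's API at all: it scans a raw code buffer for the common 32-bit x86 prologue `push ebp; mov ebp, esp` (bytes `55 8B EC`) and reports addresses that are not already known functions. The function name `scan_prologues`, the toy buffer, and the base address are all assumptions made up for illustration.

```python
from typing import List, Set

# Hypothetical sketch: a raw byte scan, not vivisect's analysis-pass API.
PROLOGUE = bytes.fromhex("558bec")  # push ebp; mov ebp, esp (32-bit x86)

def scan_prologues(code: bytes, base_va: int, known: Set[int]) -> List[int]:
    """Return candidate function start addresses that are not already known."""
    candidates = []
    offset = code.find(PROLOGUE)
    while offset != -1:
        va = base_va + offset
        if va not in known:
            candidates.append(va)
        # Continue searching one byte past the current match.
        offset = code.find(PROLOGUE, offset + 1)
    return candidates

# Toy buffer: nop, prologue, ret, int3, prologue, xor eax,eax, ret.
code = bytes.fromhex("90" "558bec" "c3" "cc" "558bec" "33c0" "c3")
# Pretend the first prologue was already found by the normal analysis.
found = scan_prologues(code, 0x401000, known={0x401001})
print([hex(va) for va in found])
```

A scan like this only produces candidates: the byte pattern can also occur inside data or in the middle of an instruction, so each hit would still need to be handed to the disassembler and checked before being treated as a function.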
in the sample originally provided by @re-fox, the function in question contains a good deal of anti-disassembly. for example:
(this is consistent with the Ghidra screenshots above).
IDA is not able to tie the basic block back to a function, so I don't think the IDA Pro plugin would work either (we currently go, for each function: find capabilities). i'm not sure how we'd want to handle this case.
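A toy model of why the "for each function" walk misses this block, assuming the backend never attaches the block to a function. The dictionaries below are invented stand-ins for analysis results, not capa's or vivisect's real data structures; 0x4061D1 and 0x406A84 are the addresses from this issue, and 0x4061E0 is a made-up block.

```python
from typing import Dict, List, Set

# Toy model only: NOT capa's or vivisect's real data structures.
WANTED = "api:SystemParametersInfoW"

# Basic blocks the backend managed to attach to each function.
# 0x4061D1 is the function start from this issue; 0x4061E0 is hypothetical.
function_blocks: Dict[int, List[int]] = {
    0x4061D1: [0x4061D1, 0x4061E0],
}

# Features per basic block, as if every block had been disassembled.
# 0x406A84 is the block containing the API call.
block_features: Dict[int, Set[str]] = {
    0x4061D1: set(),
    0x4061E0: set(),
    0x406A84: {WANTED},
}

def match_per_function(wanted: str) -> List[int]:
    """Walk functions, then their blocks, like a per-function extraction loop."""
    hits = []
    for blocks in function_blocks.values():
        for bb in blocks:
            if wanted in block_features[bb]:
                hits.append(bb)
    return hits

def orphan_blocks() -> List[int]:
    """Blocks that exist but are not owned by any known function."""
    owned = {bb for blocks in function_blocks.values() for bb in blocks}
    return sorted(bb for bb in block_features if bb not in owned)

per_function_hits = match_per_function(WANTED)
orphans = orphan_blocks()
print(per_function_hits)
print([hex(bb) for bb in orphans])
```

In this model the per-function walk finds nothing, while the block with the API call shows up only in the orphan list. Handling the case would mean either recovering the block's owning function or also visiting blocks that no function claims, which in turn depends on the backend exposing them.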
originally, i did not read this issue closely enough, and assumed viv was missing an obvious function pointer or something. but, that's not the case here.
Closing this as we cannot do much based on the analysis issues in various backends here.