Add symbol name features for dynamically resolved function pointers
Summary
I want the api feature to match, during static analysis, on calls through function pointers that are resolved dynamically via GetProcAddress or dlsym.
Motivation
While working on https://github.com/mandiant/capa-rules/pull/1046, I ran into a limitation where I couldn't precisely examine calls to NtFsControlFile because that function is commonly resolved dynamically at runtime via GetProcAddress.
(examples from 7d333c9b11b06ef0982b61bfc062631bb6cf9d12d0d4f2cf1b807a25ddf62fbc)
IDA has special handling for auto-renaming pointers in .data when they are set via GetProcAddress. It would be very useful to be able to have a similar capability in capa so we can precisely inspect calls through static capa executions to dynamically resolved functions.
Additionally, it would be an added bonus to have a way to explicitly look for (or not for) dynamically resolved functions. Maybe something like:
# only match on NtFsControlFile calls going to .data
- api: NtFsControlFile
- runtime_resolved: true
# only match on calls to CreateProcessA going to .idata
- api: CreateProcessA
- runtime_resolved: false
# if not specified, do both
- api: CloseHandle
This feature could be supported on Linux as well by looking at dlsym calls.
Describe alternatives you've considered
The workaround I did in https://github.com/mandiant/capa-rules/pull/1046 was to just inspect how the arguments are set up for the NtFsControlFile calls we are interested in, and then look for a call. I'm not sure how accurate/precise this will be in practice though:
features:
  - and:
    - os: windows
    - or:
      - number: 0x11003c
      - number: 0x110038
      - number: 0x119ff8
    - instruction:
      - mnemonic: xor
    - instruction:
      - mnemonic: call
    - not:
      - characteristic: nzxor
Additional context
n/a
@mike-hunhoff Sir, I am interested in taking up this issue. Can you assign it to me?
@zdwg42 can you give me a sample PE that does dynamic loading like this?
@Jinsakai-25 https://github.com/mandiant/capa-testfiles/blob/master/7d333c9b11b06ef0982b61bfc062631bb6cf9d12d0d4f2cf1b807a25ddf62fbc.exe_
@zdwg42 I hope these test files are not live malware?
many of them are; handle them with appropriate care
@zdwg42 can you describe how you were analyzing the PE in capa and which commands you used? I don't think capa extracts the API by itself; it uses vivisect, or IDA can also be used.
it did not extract the API name, that's the point of this feature request - i would like it to
you analysed the NtFsControlFile dynamic call in Ghidra, right?
I think the runtime_resolved: true rule you proposed could be implemented with a vivisect script that searches for calls to GetProcAddress or LoadLibrary and inspects the arguments that are pushed. @williballenthin @mike-hunhoff What do you think?
> you analysed the NtFsControlFile dynamic call in Ghidra, right?
those are IDA screenshots
capa using the vivisect backend has some limited special handling of runtime resolved APIs when they are called via register. we walk backwards from indirect calls and do a very naive register data flow tracking thing. there's nothing that analyzes global data read/writes for API pointers.
the easy case seems reasonable to do: for all functions, inspect the global data writes, and if a location is written to in exactly one place, and it's an API pointer, then reads to the location could propagate the API name.
beyond that, i think a more powerful analysis engine is probably needed, though maybe in limited cases we could add special handling.
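the easy case above could be sketched roughly like this. it's a toy model, not capa or vivisect code: `Insn`, its fields, and `propagate_api_names` are hypothetical names, and it assumes an earlier pass has already worked out which stores write an API pointer (for example, the return value of GetProcAddress).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Insn:
    """Hypothetical, simplified instruction record (not a capa/vivisect type)."""
    va: int                    # address of the instruction
    kind: str                  # "store", "call_mem", or "other"
    target: int                # global address written to or called through
    api: Optional[str] = None  # for "store": name of the API pointer stored, if known


def propagate_api_names(insns):
    """Map indirect call sites to API names for single-writer globals."""
    # Collect every write to each global location.
    writes = defaultdict(list)
    for insn in insns:
        if insn.kind == "store":
            writes[insn.target].append(insn)

    # Keep only locations written in exactly one place with a known API pointer.
    resolved = {}
    for addr, stores in writes.items():
        if len(stores) == 1 and stores[0].api is not None:
            resolved[addr] = stores[0].api

    # Calls that read through a resolved location inherit the API name.
    return {
        insn.va: resolved[insn.target]
        for insn in insns
        if insn.kind == "call_mem" and insn.target in resolved
    }


# Made-up addresses for illustration only.
trace = [
    Insn(0x401000, "store", 0x404000, "NtFsControlFile"),
    Insn(0x401010, "call_mem", 0x404000),
    Insn(0x401020, "store", 0x404008, "CreateFileW"),
    Insn(0x401030, "store", 0x404008, "CloseHandle"),
    Insn(0x401040, "call_mem", 0x404008),
]
calls = propagate_api_names(trace)
```

here only the call at 0x401010 gets a name. the global at 0x404008 is written in two places, so the sketch refuses to guess, which is the conservative behavior described above.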
> Additionally, it would be an added bonus to have a way to explicitly look for (or not for) dynamically resolved functions.
i believe i understand the request though i'm not immediately recognizing the practical value. would you be able to express something that is really valuable and otherwise hard to encode?
my concern is that this would require extending the rule format quite a bit, so i'd want to consider this only if it would be widely used. if you can give some compelling examples i'd be happy to be otherwise convinced.
i have no immediate use case for that, i just thought if there was an easy way to expose that datapoint then it could be interesting for rule authors.
if it's hard to add then probably just ignore that bit
we could perhaps add a new characteristic feature and emit it at the same address at instruction scope, so you could do:
instruction:
  - api: CreateFile
  - characteristic: runtime resolved
which is actually not far off what you proposed!
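as a rough illustration of that idea, assuming the extractor already knows whether a call goes through the import table: emit the api feature as usual, and add the characteristic at the same address when it doesn't. the helper names and the tuple representation are hypothetical, and "runtime resolved" is the proposed name from this thread, not an existing capa characteristic.

```python
def extract_call_features(va, api_name, via_import_table):
    """Yield (feature, address) pairs for one resolved call site (hypothetical helper)."""
    yield ("api", api_name), va
    if not via_import_table:
        # Proposed characteristic from this thread; an assumption, not an existing capa feature.
        yield ("characteristic", "runtime resolved"), va


def instruction_matches(features, va, wanted):
    """True when every wanted feature was emitted at the same instruction address."""
    present = {feature for feature, addr in features if addr == va}
    return all(w in present for w in wanted)


# Made-up call sites: one through a .data pointer, one through the import table.
features = list(extract_call_features(0x401010, "NtFsControlFile", via_import_table=False))
features += list(extract_call_features(0x401050, "CreateProcessA", via_import_table=True))

wanted = [("api", "NtFsControlFile"), ("characteristic", "runtime resolved")]
dynamic_hit = instruction_matches(features, 0x401010, wanted)
```

because both features share one address, an instruction-scope rule can require them together, and a rule that only lists the api feature keeps matching both kinds of call.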
@williballenthin Sir, can you assign me this work? Please describe the necessary changes in detail, along with any prerequisite knowledge I need.
@Jinsakai-25 there's a lot of context in this thread already. i won't restate it again. if you have specific questions, happy to discuss here.