
Add symbol name features for dynamically resolved function pointers

Open zdwg42 opened this issue 7 months ago • 16 comments

Summary

I want capa's static analysis to emit api features for calls made through function pointers that are resolved at runtime, via GetProcAddress or dlsym.

Motivation

While working on https://github.com/mandiant/capa-rules/pull/1046, I ran into a limitation where I couldn't precisely examine calls to NtFsControlFile, because that function is commonly resolved at runtime via GetProcAddress.

(examples from 7d333c9b11b06ef0982b61bfc062631bb6cf9d12d0d4f2cf1b807a25ddf62fbc)

Image

Image

IDA has special handling for auto-renaming pointers in .data when they are set via GetProcAddress. It would be very useful to be able to have a similar capability in capa so we can precisely inspect calls through static capa executions to dynamically resolved functions.

Additionally, it would be an added bonus to have a way to explicitly look for (or not for) dynamically resolved functions. Maybe something like:

# only match on NtFsControlFile calls going to .data
- api: NtFsControlFile
  - runtime_resolved: true

# only match on calls to CreateProcessA going to .idata
- api: CreateProcessA
  - runtime_resolved: false

# if not specified, do both
- api: CloseHandle

This feature could be supported on Linux as well by looking at dlsym calls.

Describe alternatives you've considered

The workaround I did in https://github.com/mandiant/capa-rules/pull/1046 was to just inspect how the arguments are set up for the NtFsControlFile calls we are interested in, and then look for a call. I'm not sure how accurate/precise this will be in practice though:

  features:
    - and:
      - os: windows
      - or:
        - number: 0x11003c
        - number: 0x110038
        - number: 0x119ff8
      - instruction:
        - mnemonic: xor
      - instruction:
        - mnemonic: call
      - not:
        - characteristic: nzxor

Additional context

n/a

zdwg42 avatar May 13 '25 13:05 zdwg42

@mike-hunhoff I am interested in taking up this issue. Can you assign it to me?

Jinsakai-25 avatar Sep 24 '25 16:09 Jinsakai-25

@zdwg42 can you give me a sample PE that does dynamic loading like this?

Jinsakai-25 avatar Sep 24 '25 17:09 Jinsakai-25

@Jinsakai-25 https://github.com/mandiant/capa-testfiles/blob/master/7d333c9b11b06ef0982b61bfc062631bb6cf9d12d0d4f2cf1b807a25ddf62fbc.exe_

zdwg42 avatar Sep 24 '25 17:09 zdwg42

@zdwg42 I hope these test files are not live malware?

Jinsakai-25 avatar Sep 24 '25 17:09 Jinsakai-25

many of them are, so handle them with appropriate care

zdwg42 avatar Sep 24 '25 17:09 zdwg42

@zdwg42 can you describe how you were analyzing the PE in capa, and which commands you used? I don't think capa extracts the API by itself; it uses vivisect, or IDA can also be used.

Jinsakai-25 avatar Sep 24 '25 19:09 Jinsakai-25

it did not extract the API name, that's the point of this featreq - i would like it to

zdwg42 avatar Sep 24 '25 19:09 zdwg42

you analysed the NtFsControlFile dynamic call in ghidra, right?

Jinsakai-25 avatar Sep 24 '25 19:09 Jinsakai-25

I think the rule you proposed (runtime_resolved: true) can be implemented with a vivisect script that searches for calls to GetProcAddress or LoadLibrary and inspects the parameter that is pushed. @williballenthin @mike-hunhoff What do you think?

Jinsakai-25 avatar Sep 24 '25 20:09 Jinsakai-25

> you analysed the NtFsControlFile dynamic call in ghidra, right?

those are IDA screenshots

zdwg42 avatar Sep 24 '25 20:09 zdwg42

capa using the vivisect backend has some limited special handling of runtime resolved APIs when they are called via register. we walk backwards from indirect calls and do a very naive register data flow tracking thing. there's nothing that analyzes global data read/writes for API pointers.
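
The backward walk described above might look roughly like the following. This is a toy illustration only, not capa's or vivisect's actual code: the tuple-based instruction format, the `import:` prefix, and the `resolve_register_call` name are all assumptions made up for this example.

```python
def resolve_register_call(insns, call_index, max_steps=16):
    """Walk backwards from an indirect `call reg` and try to name its target.

    Each instruction is a hypothetical (mnemonic, dst, src) tuple, where dst is
    the register the instruction writes (or None) and src is a register name or
    a string like "import:CreateFileA" for a load from an import slot.
    """
    mnemonic, wanted, _ = insns[call_index]
    if mnemonic != "call":
        return None

    start = max(0, call_index - max_steps)
    for mnemonic, dst, src in reversed(insns[start:call_index]):
        if dst != wanted:
            continue  # this instruction does not write the tracked register
        if mnemonic != "mov":
            return None  # register clobbered by something we don't model
        if src.startswith("import:"):
            return src[len("import:"):]  # reached the API pointer load
        wanted = src  # plain register copy: keep tracking the source
    return None


# Hypothetical instruction stream: esi gets the import, is copied to edi, then called.
example = [
    ("mov", "esi", "import:CreateFileA"),
    ("push", None, "eax"),
    ("mov", "edi", "esi"),
    ("call", "edi", None),
]
api_name = resolve_register_call(example, 3)
```

Because the walk only follows register-to-register copies within a short window, it never sees pointers that are stored to and later loaded from global data, which is the gap this issue is about.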

the easy case seems reasonable to do: for all functions, inspect the global data writes, and if a location is written to in exactly one place, and it's an API pointer, then reads to the location could propagate the API name.
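
A minimal sketch of that easy case, assuming a simplified instruction record rather than any real capa or vivisect type. The `Insn` class, its `kind` values (`store_api`, `store_other`, `call_mem`), and the `propagate_api_names` helper are hypothetical names for illustration, and the addresses are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Insn:
    """Hypothetical, simplified instruction record (not a capa type)."""
    va: int                     # address of the instruction
    kind: str                   # "store_api", "store_other", or "call_mem"
    target: int                 # global address written to or called through
    api: Optional[str] = None   # for "store_api": name of the resolved API


def propagate_api_names(insns):
    """Return {call address: API name} for calls through single-writer globals."""
    # Pass 1: collect every write to each global location.
    writes = defaultdict(list)
    for insn in insns:
        if insn.kind in ("store_api", "store_other"):
            writes[insn.target].append(insn)

    # A global is trusted only if it is written in exactly one place
    # and that one write stores a resolved API pointer.
    resolved = {
        target: writers[0].api
        for target, writers in writes.items()
        if len(writers) == 1 and writers[0].kind == "store_api"
    }

    # Pass 2: propagate the API name to calls that read a trusted global.
    return {
        insn.va: resolved[insn.target]
        for insn in insns
        if insn.kind == "call_mem" and insn.target in resolved
    }


# Hypothetical program: 0x404000 has one writer, 0x404008 has two.
program = [
    Insn(0x401000, "store_api", 0x404000, "NtFsControlFile"),
    Insn(0x401010, "store_api", 0x404008, "CloseHandle"),
    Insn(0x401020, "store_other", 0x404008),
    Insn(0x402000, "call_mem", 0x404000),
    Insn(0x402010, "call_mem", 0x404008),
]
names = propagate_api_names(program)
```

The exactly-one-writer rule is deliberately conservative: a global that is written twice is skipped entirely, since deciding which write reaches a given call would need real data flow analysis.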

beyond that, i think a more powerful analysis engine is probably needed, though maybe in limited cases we could add special handling.

williballenthin avatar Sep 24 '25 20:09 williballenthin

> Additionally, it would be an added bonus to have a way to explicitly look for (or not for) dynamically resolved functions.

i believe i understand the request though i'm not immediately recognizing the practical value. would you be able to express something that is really valuable and otherwise hard to encode?

my concern is that this would require extending the rule format quite a bit, so i'd want to consider this only if it would be widely used. if you can give some compelling examples i'd be happy to be otherwise convinced.

williballenthin avatar Sep 24 '25 20:09 williballenthin

i have no immediate use case for that, i just thought if there was an easy way to expose that datapoint then it could be interesting for rule authors.

if it's hard to add then probably just ignore that bit

zdwg42 avatar Sep 24 '25 20:09 zdwg42

we could perhaps add a new characteristic feature and emit it at the same address at instruction scope, so you could do:

instruction:
  - api: CreateFile
  - characteristic: runtime resolved

which is actually not far off what you proposed!
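
To illustrate why emitting both features at the same address is enough, here is a rough sketch using plain tuples instead of capa's real feature classes. The `runtime resolved` characteristic is only the proposal from this thread, and `emit_features` and `match_instruction` are hypothetical helpers, not capa APIs.

```python
from collections import defaultdict


def emit_features(calls):
    """Yield (address, feature) pairs for each call site.

    `calls` is an iterable of (va, api_name, runtime_resolved) tuples, a
    made-up input format for this sketch.
    """
    for va, api_name, runtime_resolved in calls:
        yield va, ("api", api_name)
        if runtime_resolved:
            # Proposed characteristic, emitted at the same address as the api feature.
            yield va, ("characteristic", "runtime resolved")


def match_instruction(features, required):
    """Return addresses where every required feature occurs on one instruction."""
    by_va = defaultdict(set)
    for va, feature in features:
        by_va[va].add(feature)
    return sorted(va for va, found in by_va.items() if required <= found)


# Hypothetical call sites: one resolved at runtime, one through the import table.
calls = [
    (0x402000, "CreateFile", True),
    (0x402010, "CreateFile", False),
]
required = {("api", "CreateFile"), ("characteristic", "runtime resolved")}
matches = match_instruction(emit_features(calls), required)
```

Since both features share an address, the existing instruction scope can combine them with no new rule syntax. Matching only import-table calls would then be a matter of wrapping the characteristic in `not:`.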

williballenthin avatar Sep 24 '25 21:09 williballenthin

@williballenthin can you assign this work to me? Please describe the necessary changes in detail, along with any prerequisite knowledge I need.

Jinsakai-25 avatar Sep 25 '25 04:09 Jinsakai-25

@Jinsakai-25 there's a lot of context in this thread already. i won't restate it again. if you have specific questions, happy to discuss here.

williballenthin avatar Sep 25 '25 09:09 williballenthin