capa dotnet: add support for basic blocks

Add support to calculate IL basic blocks, enabling basic block scope and loop detection for .NET files.

Feb 24 '23 21:02 mike-hunhoff

@mr-tz / @williballenthin before I dig in here any thoughts as to why tests would only be failing on Python 3.7?

Feb 27 '23 16:02 mike-hunhoff

Nope, but looks like something with pydantic...

Feb 27 '23 18:02 mr-tz

https://github.com/mandiant/capa/blob/master/capa/features/freeze/init.py#L49

this line recently changed but i have no idea why it would fail on 3.7. i'll try to reproduce locally in a 3.7 virtualenv and see whats going on.

Feb 27 '23 18:02 ghost

Most recent push improved how we emit basic block matches. Below we see basic block matches where the method token + offset is used for both the matched basic block and matched instruction in the basic block. This allows a user to quickly identify the exact basic block and instruction matched in a given method. Screen Shot 2023-02-27 at 12 08 13 PM

Feb 27 '23 19:02 mike-hunhoff

@williballenthin looks like tests are passing after my latest commit.....very weird the tests were only failing on Python 3.7. I noticed there was a bug manually which led to the fix but I'd rather this have been picked up by our testing suite.

Feb 27 '23 19:02 mike-hunhoff

definitely agree. I'll still try to reproduce and add tests or more assertions to better catch the scenario.

Feb 27 '23 20:02 ghost

I like the rendering in the screenshot you shared - lots of detail and the important stuff is highlighted.

Do you think we should also try to include the function name, when possible? Using tokens is very precise and I don't think this should go away; however, I wonder if including the function name might give more context (especially when the names aren't obfuscated).

Feb 27 '23 20:02 ghost

I like the rendering in the screenshot you shared - lots of detail and the important stuff is highlighted. Do you think we should also try to include the function name, when possible? Using tokens is very precise and I don't think this should go away; however, I wonder if including the function name might give more context (especially when the names aren't obfuscated).

I'm honestly torn here. I think displaying un-obfuscated method names is valuable, however, my work has seen few samples where this is the case. More frequently, I see large, obfuscated names that would be a pain to handle (display) correctly.

We could take a route simliar to de4dot by attempting to identify and rename obfuscated method names but I see that potentially confusing users who pivot from capa to a tool like dnSpy.

Reading your original message I see you say "when possible" which could be a valid path forward, e.g., we only display method names when we can identify the name has not been obfuscated at all (or past an established threshold).

Feb 28 '23 22:02 mike-hunhoff

Nice work. A few comments and maybe a larger question if this should maybe life in dncil or dnfile?

@mr-tz This came to mind about halfway through writing the code haha. I think the basic block logic likely belongs in dncil. Once we've addressed any issues here I'll move the basic block logic to dncil and update this PR to use the updated dncil APIs.

Feb 28 '23 22:02 mike-hunhoff

Nice work. A few comments and maybe a larger question if this should maybe life in dncil or dnfile?

@mr-tz This came to mind about halfway through writing the code haha. I think the basic block logic likely belongs in dncil. Once we've addressed any issues here I'll move the basic block logic to dncil and update this PR to use the updated dncil APIs.

@mr-tz @williballenthin see https://github.com/mandiant/dncil/pull/55

Mar 01 '23 00:03 mike-hunhoff

I'm honestly torn here. I think displaying un-obfuscated method names is valuable, however, my work has seen few samples where this is the case. More frequently, I see large, obfuscated names that would be a pain to handle (display) correctly.

We could take a route simliar to de4dot by attempting to identify and rename obfuscated method names but I see that potentially confusing users who pivot from capa to a tool like dnSpy.

Reading your original message I see you say "when possible" which could be a valid path forward, e.g., we only display method names when we can identify the name has not been obfuscated at all (or past an established threshold).

I agree with everything you say here. I'd only recommend including the name when its useful and helpful. I do not think we should do any renaming or show any obfuscated names. So, if its reasonably easy to determine if a name is non-obfuscated, then I think we should maybe do that.

How to determine if non-obfuscated? I guess ASCII only, something around entropy, not mixing casing or numbers too weirdly, ...?

We should probably move this into a separate feature request.

Mar 01 '23 12:03 williballenthin

@mike-hunhoff, do we want to keep this around here or close it?

Mar 22 '24 10:03 mr-tz

@mike-hunhoff, do we want to keep this around here or close it?

We can close - I'll link the corresponding dncil PR back to this for documentation purposes.

Mar 22 '24 15:03 mike-hunhoff