parse method header and sections

Open malwarefrank opened this issue 3 years ago • 7 comments

Parse the Method data (pointed to by RVA, see mdtable.MethodDefRow), as much as is needed to perform data-agnostic computation over the bytecode (cryptographic and fuzzy hashes, entropy, value distributions, etc).

See ECMA-335 6th Edition, Section II.25.4 Common Intermediate Language physical layout
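As a rough illustration of the kind of parsing this issue asks for, here is a minimal sketch that reads only the tiny or fat method header described in II.25.4 and returns the raw CIL bytes, then computes a hash and byte entropy over them. It is not dnfile code: `read_method_code` and `shannon_entropy` are hypothetical helper names, the caller is assumed to have already translated the MethodDef RVA to a file offset, and extra data sections (exception handlers) are ignored.

```python
import hashlib
import math
import struct
from collections import Counter


def read_method_code(data: bytes, offset: int) -> bytes:
    """Return the raw CIL bytes of the method body whose header starts at `offset`.

    `offset` is a file offset (assumed already converted from the RVA).
    """
    first = data[offset]
    fmt = first & 0x3
    if fmt == 0x2:
        # Tiny header: a single byte, upper six bits hold the code size.
        size = first >> 2
        start = offset + 1
    elif fmt == 0x3:
        # Fat header: 12 bits of flags + 4 bits of header size (in 4-byte units),
        # then MaxStack (2 bytes), CodeSize (4 bytes), LocalVarSigTok (4 bytes).
        flags_and_size, _max_stack, size, _local_sig = struct.unpack_from(
            "<HHII", data, offset
        )
        start = offset + (flags_and_size >> 12) * 4
    else:
        raise ValueError("unrecognized method header format")
    code = data[start:start + size]
    if len(code) != size:
        raise ValueError("method body is truncated")
    return code


def shannon_entropy(buf: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not buf:
        return 0.0
    total = len(buf)
    return -sum((n / total) * math.log2(n / total) for n in Counter(buf).values())


# Hand-built tiny-header method body: code size 2, then `nop; ret`.
tiny_method = bytes([(2 << 2) | 0x2, 0x00, 0x2A])
code = read_method_code(tiny_method, 0)
digest = hashlib.sha256(code).hexdigest()
entropy = shannon_entropy(code)
```

Nothing here needs the instructions to be decoded, which is the point of "data-agnostic" computation: the header alone tells you where the bytecode starts and how long it is.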

malwarefrank avatar Jan 09 '22 03:01 malwarefrank

@malwarefrank I've been working on a Python library that parses method body sections and CIL instructions using RVAs recovered by dnfile. Is there any interest in adding this level of method body parsing directly to dnfile? You mention parsing the sections but not the CIL instructions.

mike-hunhoff avatar Feb 09 '22 16:02 mike-hunhoff

Sorry for the delayed response. I would love to see what you are working on. I am uncertain whether bytecode disassembly should be separate from dnfile or a part of it.

I would not want to succumb to scope creep too much before milestone 1.0, but thinking through some of those details may help to inform the API before I lock out breaking changes.

malwarefrank avatar Feb 21 '22 03:02 malwarefrank

Apologies for the delayed response. I've released the work I've been doing on CIL disassembly here: https://github.com/mandiant/dncil. The library supports parsing method body headers, instructions, and exception handlers. There is an example of using dnfile and dncil together here: https://github.com/mandiant/dncil/blob/main/scripts/print_cil_from_dn_file.py.

One option to consider is dnfile including dncil as a dependency to keep core functionalities isolated and easier to maintain. dnfile could leverage dncil to parse method bodies and we could make disassembly a configurable option. I think dnfile including dncil as a dependency makes the most sense as both projects have progressed.

I'd be happy to contribute code to make dnfile and dncil work together but understand if this is outside the scope of your vision for dnfile.

mike-hunhoff avatar Mar 30 '22 21:03 mike-hunhoff

Thanks. I started looking at the dncil code and realized that there have been some helpful changes since dnfile v0.8.0, including a user_strings shortcut. I tagged master and pushed a new version to PyPI.

I will look at the dncil code more and think through how best to integrate.

malwarefrank avatar Mar 31 '22 02:03 malwarefrank

I like that dncil focuses on parsing the method bodies. I still want to parse the MetadataTable MethodDef rows into a list of objects in dnfile and make them accessible via a shortcut, maybe something like:

import dnfile

pe = dnfile.dnPE("filename.exe")
if pe and pe.net:
    # pe.net.methods is the proposed shortcut, not an existing attribute
    for method in pe.net.methods:
        pass  # do something with each method

I think I can do that without replicating any of the dncil method body parsing. I should be able to use or cherry-pick from @williballenthin's work on #37 for method signature and param signature parsing for some, if not most, of it. Then I can just import and call dncil for the method body parsing.

malwarefrank avatar Apr 23 '22 03:04 malwarefrank

This sounds great. Please reach out if you have any questions or issues w/ dncil.

mike-hunhoff avatar May 02 '22 16:05 mike-hunhoff

Parsing methods is complicated! </rant>

The MethodDef row defines a parameter list, but then the associated method signature also defines parameters. I am working on this in the methods branch, but suffice it to say it will be a while longer.
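To make the first half of that concrete, here is a small sketch of how the ParamList column is usually resolved, as I understand ECMA-335 II.22.26: each MethodDef row stores the 1-based index of its first Param row, and its run extends up to the next method's ParamList (or the end of the Param table). `param_ranges` is a hypothetical helper, not part of dnfile, and the inputs are plain integers rather than real table rows.

```python
from typing import List, Tuple


def param_ranges(param_list_starts: List[int], param_table_len: int) -> List[Tuple[int, int]]:
    """Return (first, end_exclusive) 1-based Param row ranges, one per method.

    `param_list_starts` holds the ParamList value of each MethodDef row in
    table order; `param_table_len` is the number of rows in the Param table.
    """
    ranges = []
    for i, start in enumerate(param_list_starts):
        if i + 1 < len(param_list_starts):
            # The run ends where the next method's run begins.
            end = param_list_starts[i + 1]
        else:
            # The last method owns everything up to the end of the table.
            end = param_table_len + 1
        ranges.append((start, end))
    return ranges


# Three methods over a four-row Param table: the first method has no Param
# rows, the second owns rows 1-2, the third owns rows 3-4.
ranges = param_ranges([1, 1, 3], 4)
```

The number of Param rows in a run does not have to match the parameter count in the method signature (Param rows are optional, and a row with sequence 0 describes the return value), which is part of what makes reconciling the two views awkward.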

malwarefrank avatar Aug 28 '22 03:08 malwarefrank