
PE: import/export names are not decoded to strings

Open williballenthin opened this issue 5 years ago • 3 comments

As discussed offline, miasm.loader.pe does not decode bytes into strings for import and export entries, such as pe.executable.DirImport.impdesc[0].dlldescname.name.

williballenthin • Jan 10 '20 17:01

If you'd like me to take a stab at fixing this, happy to try.

  1. Is this an appropriate place for the fix: c.gets(raw, off) -> c.gets(raw, off).decode('ascii')? I'm happy to dig into this further and learn how things work, but I don't want to duplicate effort.
  2. Do you have loader tests? Are you open to adding some PE files, e.g. from ReactOS, to demonstrate issues and fixes?

williballenthin • Jan 10 '20 17:01

The accessors in miasm.jitter.loader.pe return decoded strings. So maybe it's reasonable that miasm.loader.pe deals with raw data (bytes) and that higher-level interfaces provide more Pythonic data types.

If so, you may close this issue. I can add documentation to miasm.loader.pe explaining this, if you'd like.

williballenthin • Jan 10 '20 18:01

Your findings on the code are good (regarding the c.gets location). I have to double-check, but I think the export/import structures in Windows are "ascii" (I mean not utf8/utf16/codepage encoded). As there should not be special characters, we may simply transform the bytes to a Python str (we already assume this for the imported function names, if I remember correctly). So we may fix loader.pe as you propose, exposing native Python str to the user (and converting them back to bytes for the PE). But I am afraid the decoding approach you propose may fail:

>>> a = b"\xc3"
>>> a.decode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

But since we know the raw bytes are not encoded, we can extract the native Python str with:

>>> chr(0xc3)
'Ã'
>>> a = chr(0xc3)
>>> ord(a)
195
>>> hex(ord(a))
'0xc3'
>>> 

(Don't hesitate if you have a counter example here)
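To make the comparison above concrete, here is a minimal Python 3 sketch (not miasm code) that checks the per-byte chr() mapping against the standard 'latin-1' codec over every possible byte value. The variable names are only illustrative. It relies on the fact that 'latin-1' maps byte N to code point N, so it never raises, whereas 'ascii' rejects anything above 0x7f.

```python
# Every possible byte value, 0x00 through 0xff.
raw = bytes(range(256))

# Per-byte mapping, as in the chr() session above.
# (Python 3: iterating over bytes yields integers.)
via_chr = "".join(chr(b) for b in raw)

# 'latin-1' maps byte N to code point N, so decoding cannot fail.
via_latin1 = raw.decode("latin-1")

# Both approaches give the same str, and the conversion is reversible.
assert via_chr == via_latin1
assert via_latin1.encode("latin-1") == raw

# 'ascii' fails as soon as a byte above 0x7f appears.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

If this holds, the chr() approach loses no information: the original bytes can always be recovered from the resulting str.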

And maybe one day a piece of malware will have a strange import or export directory.

So, a few weeks ago, we implemented this: https://github.com/cea-sec/miasm/blob/master/miasm/core/utils.py#L90

def force_str(value):
    if isinstance(value, str):
        return value
    elif isinstance(value, bytes):
        out = ""
        for i in range(len(value)):
            # For Python2/Python3 compatibility
            c = ord(value[i:i+1])
            out += chr(c)
        value = out
    else:
        raise ValueError("Unsupported type")
    return value

If you are OK with this, you can use it to fix the c.gets call. By the way, the opposite function (str to bytes without encoding) doesn't exist...
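For illustration only, the missing opposite function could look something like the sketch below. The name force_bytes is hypothetical and does not exist in miasm; the sketch assumes Python 3 and simply mirrors force_str, mapping each character back to the byte with the same ordinal and rejecting characters that do not fit in one byte.

```python
def force_bytes(value):
    """Hypothetical inverse of force_str: str -> bytes, one byte per character."""
    if isinstance(value, bytes):
        return value
    elif isinstance(value, str):
        out = bytearray()
        for ch in value:
            code = ord(ch)
            # Only code points 0x00..0xff can be stored in a single byte.
            if code > 0xFF:
                raise ValueError("Character out of byte range: %r" % ch)
            out.append(code)
        return bytes(out)
    raise ValueError("Unsupported type")


# Round trip for a typical import name and for a non-ASCII byte.
dll_name = force_bytes("KERNEL32.dll")
odd_byte = force_bytes(chr(0xc3))
```

With such a helper, a str produced by force_str could be written back into the PE as the same bytes it came from.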

For the regression tests: some time ago we had malware samples among the regression tests and we were "blacklisted" by Travis... So we decided this: "simple and light" tests are included in the main repository. "Heavier" tests (typically big PE or ELF files) may go in another repository, https://github.com/cea-sec/miasm-extended-tests, which is run during the Travis callback. This way, we limit the size of the main repository.

Don't hesitate to tell me if I am not clear here!

serpilliere • Jan 11 '20 15:01