miasm
PE: import/export names are not decoded to strings
As discussed offline, miasm.loader.pe does not decode bytes into strings for import and export entries, such as pe.executable.DirImport.impdesc[0].dlldescname.name.
If you'd like me to take a stab at fixing this, happy to try.
- is this an appropriate place to fix? c.gets(raw, off) -> c.gets(raw, off).decode('ascii')? Happy to dig into this further and learn how things work, but don't want to duplicate effort.
- do you have loader tests? are you open to adding some PE files, from e.g. ReactOS, to demonstrate issues and fixes?
The accessors in miasm.jitter.loader.pe return decoded strings. So, maybe it's reasonable that miasm.loader.pe deals with raw data (bytes) and that higher-level interfaces provide more Pythonic data types.
If so, you may close this issue. I can add documentation to miasm.loader.pe explaining this, if you'd like.
Your findings on the code are good (regarding the c.gets location).
I have to double check, but I think the export/import structures in Windows are "ascii" (I mean not utf8/utf16/codepage encoded). So, as there should not be special characters, we may simply transform bytes to Python str (we already assume this for the imported function names, if I remember correctly).
So we may fix loader.pe as you are proposing, to expose native Python str to the user (and encode them back to bytes for the PE).
But I am afraid the decoding way you propose may fail:
>>> a = b"\xc3"
>>> a.decode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
But, as we know the raw bytes are not encoded, we may extract the native python str with:
>>> chr(0xc3)
'Ã'
>>> a = chr(0xc3)
>>> ord(a)
195
>>> hex(ord(a))
'0xc3'
>>>
(Don't hesitate if you have a counter example here)
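For what it's worth, the chr()-per-byte mapping above should be equivalent to decoding with the latin-1 codec, which maps every byte value 0..255 to the code point with the same number and so never raises. Below is a minimal sketch checking this under Python 3. The helper name bytes_to_str_per_byte is made up for illustration and is not part of miasm.

```python
def bytes_to_str_per_byte(raw):
    """Hypothetical helper: map each byte to the code point of the same value."""
    return "".join(chr(b) for b in bytearray(raw))


# Every possible byte value, 0x00 through 0xff.
all_bytes = bytes(bytearray(range(256)))

# The per-byte mapping and the latin-1 codec agree on every byte value.
same_as_latin1 = bytes_to_str_per_byte(all_bytes) == all_bytes.decode("latin-1")

# The mapping is lossless: encoding back gives the original bytes.
round_trip_ok = all_bytes.decode("latin-1").encode("latin-1") == all_bytes

# 'ascii' rejects any byte at or above 0x80, as in the traceback above.
try:
    b"\xc3".decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

If this holds for the cases you care about, raw.decode("latin-1") might be a shorter way to write the same conversion on Python 3.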
And maybe one day a malware sample will have a strange import or export directory.
So some weeks ago, we implemented this: https://github.com/cea-sec/miasm/blob/master/miasm/core/utils.py#L90
def force_str(value):
    if isinstance(value, str):
        return value
    elif isinstance(value, bytes):
        out = ""
        for i in range(len(value)):
            # For Python2/Python3 compatibility
            c = ord(value[i:i+1])
            out += chr(c)
        value = out
    else:
        raise ValueError("Unsupported type")
    return value
If you are OK with this, you can use it to fix the c.gets call.
By the way, the opposite function (string to bytes without decoding) doesn't exist...
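One possible shape for that missing opposite function, as a sketch only: force_bytes is a hypothetical name that does not exist in miasm, the code assumes Python 3, and raising on code points above 0xff is my assumption about the desired behavior. It mirrors force_str by mapping each character back to the byte with the same value.

```python
def force_bytes(value):
    """Hypothetical inverse of force_str (not part of miasm, Python 3 only)."""
    if isinstance(value, bytes):
        return value
    elif isinstance(value, str):
        out = bytearray()
        for char in value:
            code = ord(char)
            if code > 0xFF:
                # Assumption: such a character cannot have come from raw PE bytes.
                raise ValueError("Character out of byte range: %r" % char)
            out.append(code)
        return bytes(out)
    else:
        raise ValueError("Unsupported type")


# Usage: plain names and names holding a non-ASCII byte both convert back.
name = force_bytes("kernel32.dll")
odd = force_bytes("\xc3")
```

With this, a str produced by force_str from raw PE bytes would convert back to the same bytes when the PE is rebuilt.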
For the regression tests: some time ago we had malware samples among the regression tests and we were "blacklisted" by Travis. So we decided this: "simple and light" tests are included in the main repository. "Heavier" tests (typically big PE or ELF files) may be placed in another repository, https://github.com/cea-sec/miasm-extended-tests, which is run during the Travis callback. This way, we limit the size of the main repository.
Don't hesitate to tell me if I am not clear here!