yara-python
yara-python copied to clipboard
Modules with non utf8 values in dictionaries can lead to scan aborting exception
trafficstars
When using a modules_callback during a match, the module values are converted to Python. However, the conversion of the "dictionary" type is buggy: it uses PyDict_SetItemString with the dictionary key as the key. However, this function expects a utf-8 string, and the dictionary key is not guaranteed to be utf-8.
This can happen with the pe module and the version_info dictionary: keys come from the version info of the binary and are not guaranteed at all to be utf-8.
For example, by taking the mtxex.dll from yara tests, and simply changing a byte from the version info to 0xFF, I get this result:
import yara
rules = yara.compile(source="""
import "pe"
rule a { condition: true }
""")
def cb(_):
pass
rules.match("mtxex.dll", modules_callback=cb)
gives:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 8: invalid start byte
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
SystemError: <function cb at 0x7f21c3722340> returned a result with an exception set
The ideal fix would be to use bytestrings as keys instead of strings, but that would be a breaking change