yara-python

Modules with non-UTF-8 values in dictionaries can lead to a scan-aborting exception

Open • vthib opened this issue 7 months ago • 0 comments

When a modules_callback is passed to match, the module values are converted to Python objects. However, the conversion of the "dictionary" type is buggy: it calls PyDict_SetItemString with the dictionary key as the key. That function expects a UTF-8-encoded C string (it decodes the key with PyUnicode_FromString), and the dictionary key is not guaranteed to be UTF-8.
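The failure can be reproduced without YARA by calling the same C-API function from Python. Below is a minimal, CPython-only sketch using ctypes.pythonapi, which holds the GIL and raises any pending Python exception after each call; the keys and values are made up for illustration:

```python
import ctypes

# ctypes.pythonapi is a PyDLL: after each call, a Python exception set by the
# C function (as PyDict_SetItemString does on failure) is raised in Python.
PyDict_SetItemString = ctypes.pythonapi.PyDict_SetItemString
PyDict_SetItemString.argtypes = [ctypes.py_object, ctypes.c_char_p, ctypes.py_object]
PyDict_SetItemString.restype = ctypes.c_int

d = {}
rejected = None

# A valid UTF-8 key is decoded into a str and the entry is stored.
PyDict_SetItemString(d, b"CompanyName", "ok")

# A key containing 0xFF is not valid UTF-8, so decoding it fails.
try:
    PyDict_SetItemString(d, b"Company\xffName", "never stored")
except UnicodeDecodeError as exc:
    rejected = exc
    print("rejected:", exc.reason, "at position", exc.start)

print(d)
```

Because the key arrives as raw bytes, the only way the function can build a str from it is a strict UTF-8 decode, so any byte sequence that is not valid UTF-8 makes the call fail instead of storing the entry.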

This can happen with the pe module and its version_info dictionary: the keys come from the version info of the scanned binary and are not guaranteed to be UTF-8 at all.

For example, taking mtxex.dll from the YARA tests and changing a single byte in its version info to 0xFF, I get this result:

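import yara

# The rule itself is irrelevant: importing "pe" makes the pe module parse
# the file and populate its data, including the version_info dictionary.
rules = yara.compile(source="""
import "pe"
rule a { condition: true }
""")

# Any modules_callback triggers the conversion of module data to Python;
# this one does nothing.
def cb(_):
    pass

# mtxex.dll here is the test file with one version-info byte changed to 0xFF.
rules.match("mtxex.dll", modules_callback=cb)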

gives:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 8: invalid start byte

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: <function cb at 0x7f21c3722340> returned a result with an exception set
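The issue does not say which byte was changed. A minimal sketch of that reproduction step follows. It assumes the key is stored as UTF-16LE (as strings in a PE VS_VERSIONINFO resource are), and that overwriting the low byte of one of its characters is enough to yield a non-UTF-8 key on the YARA side. The patch_version_key helper, the "CompanyName" key and the character index are hypothetical choices:

```python
def patch_version_key(data: bytes, key: str, index: int, new_byte: int = 0xFF) -> bytes:
    """Return a copy of `data` with one byte of a UTF-16LE-encoded key replaced.

    Hypothetical helper: it overwrites the low byte of the code unit at
    key[index], which is the byte holding the character for ASCII keys.
    """
    needle = key.encode("utf-16-le")
    offset = data.find(needle)
    if offset < 0:
        raise ValueError(f"{key!r} not found as UTF-16LE in the input")
    if not 0 <= index < len(key):
        raise IndexError("index out of range for key")
    patched = bytearray(data)
    # In little-endian UTF-16 the low byte comes first in each 2-byte unit.
    patched[offset + 2 * index] = new_byte
    return bytes(patched)


# In-memory demo; on a real file one would read mtxex.dll in binary mode,
# patch the bytes and write them to a new file.
sample = b"\x01\x00" + "CompanyName".encode("utf-16-le") + b"\x00\x00"
patched = patch_version_key(sample, "CompanyName", 3)
print(patched.hex())
```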

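The ideal fix would be to use bytes objects as keys instead of strings, but that would be a breaking change: callbacks that currently look keys up with str (for example "CompanyName") would no longer find them.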
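To make the trade-off concrete, here is a pure-Python sketch of the two options. It is not yara-python's code (the real conversion happens in C), and the keys_as_str / keys_as_bytes helpers and the sample data are hypothetical:

```python
# Hypothetical raw (key, value) pairs as a module might expose them; the
# second key contains 0xFF and is therefore not valid UTF-8.
RAW_VERSION_INFO = [
    (b"CompanyName", "ACME"),
    (b"Company\xffName", "patched"),
]


def keys_as_str(items):
    """Current behaviour: decode every key as strict UTF-8 (may raise)."""
    return {k.decode("utf-8"): v for k, v in items}


def keys_as_bytes(items):
    """Proposed fix: keep keys as bytes, so building the dict never fails."""
    return dict(items)


str_error = None
try:
    keys_as_str(RAW_VERSION_INFO)
except UnicodeDecodeError as exc:
    str_error = exc
    print("str keys:", exc.reason)

info = keys_as_bytes(RAW_VERSION_INFO)
print(info[b"Company\xffName"])  # every key is now reachable...
print("CompanyName" in info)     # ...but str lookups no longer match
```

With bytes keys, every entry survives the conversion, but existing code written as info["CompanyName"] would start raising KeyError, which is why the change would break callers.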

vthib, Mar 25 '25 21:03