yara-python

Unicode filepath support

Open jbremer opened this issue 7 years ago • 2 comments

I'd like to request the ability for yara.compile() as well as rules.match() to support Unicode filepaths. For now, a dirty workaround seems to be creating a temporary symbolic link (containing only ASCII characters) that points to the Unicode filepath. But that naturally doesn't work under Windows, so no luck there unfortunately. AFAIK changing the s format to u in the Python argument parsing and fopen to _wfopen (or something along those lines) would go a long way. Thanks!

Example situation where support for this would help @ https://github.com/cuckoosandbox/cuckoo/issues/1573

jbremer avatar Jun 23 '17 14:06 jbremer

I did some testing (on Linux) in regard to this, and made the following observations:

  • the problem only affects Python 2; Python 3 is not affected at all (regardless of the yara-python version)
  • it also works in Python 2 if you encode the Unicode string as UTF-8 and pass the resulting bytes to the yara.compile() function.

Testing script:

#!/usr/bin/env python2
import yara

# u'早上好.yar', built from escaped UTF-8 bytes so the script source stays ASCII
filename = '\xe6\x97\xa9\xe4\xb8\x8a\xe5\xa5\xbd'.decode('utf-8') + '.yar'

data = '''
rule AlwaysTrue {
    condition:
        true
}

rule AlwaysFalse {
    condition:
        false
}
'''.lstrip().encode('utf-8')

with open(filename, 'wb') as f:
    f.write(data)

print('Filename: %s' % filename)
# Python 2 workaround: pass the path as UTF-8 bytes, not as a unicode object
rules = yara.compile(filename.encode('utf-8'))

for match in rules.match(__file__):
    print('Rule %r matched' % match.rule)

Output:

$ ./test.py
Filename: 早上好.yar
Rule 'AlwaysTrue' matched

snemes avatar Mar 05 '21 10:03 snemes

P.S.: This still doesn't work on Windows, though (with neither Python 2 nor Python 3).

snemes avatar Mar 05 '21 11:03 snemes
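One possible way to sidestep the path handling altogether, sketched here for Python 3 and not tested anywhere in this thread, is to let Python open the files itself and hand only their contents to yara-python, through the source= argument of yara.compile() and the data= argument of rules.match(). The helper names below (read_text, read_bytes, compile_and_match) are made up for illustration. Whether this actually avoids the Windows problem is an assumption, and it comes with trade-offs: the whole target file is read into memory, and any include directives in the rules will not be resolved relative to the rule file's own location.

```python
import io


def read_text(path):
    """Read a rule file as text; Python handles the Unicode path itself."""
    with io.open(path, 'r', encoding='utf-8') as f:
        return f.read()


def read_bytes(path):
    """Read the file to be scanned as raw bytes."""
    with io.open(path, 'rb') as f:
        return f.read()


def compile_and_match(rules_path, target_path):
    """Hypothetical helper: compile rules from their source text and scan
    in-memory data, so that no file path is ever passed to yara-python."""
    import yara  # imported here so the read helpers work without yara installed

    rules = yara.compile(source=read_text(rules_path))
    return rules.match(data=read_bytes(target_path))
```

Calling compile_and_match(u'早上好.yar', some_target_path) should then return the same kind of match list as rules.match(some_target_path) would, assuming the rule file is UTF-8 encoded.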