yara-python
Unicode filepath support
I'd like to request the ability for `yara.compile()` as well as `rules.match()` to support Unicode filepaths. For now, a dirty workaround would seem to be creating a temporary symbolic link (containing only ASCII characters) that points to the Unicode filepath. But that naturally doesn't work under Windows, so no luck there unfortunately.
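The symlink workaround mentioned above could be sketched roughly as follows. This is only an illustration for POSIX systems: the helper name `ascii_symlink` and the link name `rules.yar` are made up for this example, and it assumes the system temporary directory itself has an ASCII-only path.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def ascii_symlink(target, suffix=".yar"):
    """Yield an ASCII-only symlink path pointing at `target` (hypothetical helper).

    The link lives in a fresh temporary directory that is removed on exit;
    the target file itself is left untouched.
    """
    tmpdir = tempfile.mkdtemp(prefix="yara_link_")
    link = os.path.join(tmpdir, "rules" + suffix)
    try:
        # Link to the absolute path so the link resolves from any directory.
        os.symlink(os.path.abspath(target), link)
        yield link
    finally:
        # Removes the directory and the link, not the file the link points to.
        shutil.rmtree(tmpdir, ignore_errors=True)
```

The yielded path could then be handed to `yara.compile()` inside the `with` block, e.g. `with ascii_symlink(path) as link: rules = yara.compile(link)`.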
AFAIK changing `s` to `u` in the Python argument parsing and `fopen` to `_wfopen` on Windows (or something along those lines) will go a long way. Thanks!
Example situation where support for this would help @ https://github.com/cuckoosandbox/cuckoo/issues/1573
I did some testing (on Linux) in regard to this, and made the following observations:
- this problem only affects Python 2, but does not affect Python 3 at all (regardless of the yara-python version)
- it works in Python 2 as well if you encode the Unicode string as UTF-8 and pass the UTF-8 bytes to the `yara.compile()` function.
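The two observations above could be folded into a small version-aware helper. This is a minimal sketch, not part of yara-python: the name `yara_path_arg` is hypothetical, and UTF-8 is assumed as the filesystem encoding, as in the Linux test below.

```python
import sys


def yara_path_arg(path, encoding="utf-8"):
    """Return `path` in the form the observations above suggest works (hypothetical helper).

    On Python 2, Unicode strings are encoded to bytes; on Python 3, and for
    values that are already bytes, the path is returned unchanged.
    """
    if sys.version_info[0] == 2 and not isinstance(path, bytes):
        return path.encode(encoding)
    return path
```

A call such as `yara.compile(yara_path_arg(filename))` would then behave the same way under both interpreters on Linux.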
Testing script:
#!/usr/bin/env python2
import yara

filename = '\xe6\x97\xa9\xe4\xb8\x8a\xe5\xa5\xbd'.decode('utf-8') + '.yar'
data = '''
rule AlwaysTrue {
    condition:
        true
}
rule AlwaysFalse {
    condition:
        false
}
'''.lstrip().encode('utf-8')
with open(filename, 'wb') as f:
    f.write(data)
print('Filename: %s' % filename)
rules = yara.compile(filename.encode('utf-8'))
for match in rules.match(__file__):
    print('Rule %r matched' % match.rule)
Output:
$ ./test.py
Filename: 早上好.yar
Rule 'AlwaysTrue' matched
P.S.: This still doesn't work on Windows though (with neither Python 2 nor Python 3).
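One possible way around the Windows limitation, not tested in this thread, is to let Python open the files and hand yara-python their contents through the `source=` argument of `yara.compile()` and the `data=` argument of `rules.match()`, so that no path ever reaches the C code. The sketch below assumes Python 3; the helper names are made up for this example, and rule files that rely on `include` with relative paths would likely need extra handling.

```python
import io


def read_text(path, encoding="utf-8"):
    """Read a rules file as text; Python itself handles the Unicode path."""
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


def read_bytes(path):
    """Read the file to be scanned as raw bytes."""
    with io.open(path, "rb") as f:
        return f.read()


def compile_and_match(rules_path, target_path):
    """Compile rules from source text and scan in-memory data (hypothetical helper)."""
    import yara  # imported lazily so the readers above work without yara-python

    rules = yara.compile(source=read_text(rules_path))
    return rules.match(data=read_bytes(target_path))
```

The trade-off is that the whole target file is read into memory, which may be unsuitable for very large samples.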