Using a YARA regex rule to scan for Chinese characters gives wrong matches
How can I use a YARA regex rule to scan for Chinese characters? Why does the following rule produce the wrong matches?
Describe the bug
My YARA rule:
rule AsianCharacter : general
{
strings:
$chinese = /[\u8fd9]/
condition:
$chinese
}
Match result:
0x1cd:$chinese: u
0x1d2:$chinese: f
0x1dd:$chinese: 8
Expected behavior
Expected match result:
0x1cd:$chinese: 这
Note: the Unicode code point of "这" is U+8FD9 (\u8fd9).
Another example: I want to scan for Chinese characters with a YARA regex rule like the one below:
rule AsianCharacter : general
{
strings:
$chinese = /[\u4e00-\u9fa5]/
condition:
$chinese
}
Problem: it does not match Chinese characters.
YARA does not have Unicode handling in strings, and the \u escape syntax does not exist in its regular expressions. What you wrote is effectively [u8fd9], a character class matching any one of those five ASCII bytes.
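To see why the reported matches are single ASCII letters, here is a small illustration that uses Python's re module as a stand-in for YARA's regex engine. This is an assumption: the two engines are not identical, but both treat [u8fd9] as a class of five single bytes. The sample data is made up for illustration.

```python
import re

# Per the explanation above, YARA reads /[\u8fd9]/ as the class [u8fd9]:
# any single byte that is one of u, 8, f, d or 9.
as_compiled = re.compile(rb"[u8fd9]")

# Hypothetical sample data: ASCII text followed by "这" in UTF-8 (E8 BF 99).
data = "utf-8 text: 这".encode("utf-8")

# Offset and matched byte for every hit, similar in spirit to YARA's output.
matches = [(m.start(), m.group()) for m in as_compiled.finditer(data)]
print(matches)  # [(0, b'u'), (2, b'f'), (4, b'8')]
```

The three bytes of "这" never match, because none of them is in the class; only the ASCII letters and digits that happen to be u, 8, f, d or 9 do.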
If you want to search for a non-ASCII character, you need to search for the bytes of its encoding in the files you scan. For UTF-8 files, that means something like this:
rule AsianCharacter : general
{
strings:
$chinese = /\xe8\xbf\x99/
condition:
$chinese
}
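Rather than working out the escape sequence by hand, it can be generated. The following is a minimal Python sketch; the helper name yara_utf8_regex is hypothetical and not part of YARA or its tooling.

```python
def yara_utf8_regex(text: str) -> str:
    """Return a YARA regex literal matching the UTF-8 bytes of `text`."""
    # Encode to UTF-8, then write each byte as a \xNN escape so the rule
    # file itself stays pure ASCII.
    escaped = "".join(f"\\x{byte:02x}" for byte in text.encode("utf-8"))
    return f"/{escaped}/"


if __name__ == "__main__":
    # "这" is U+8FD9; UTF-8 encodes it as the three bytes E8 BF 99.
    print(yara_utf8_regex("这"))  # prints /\xe8\xbf\x99/
```

For "这" this reproduces the pattern used in the rule above; the same helper works for a multi-character string, producing one escape per byte.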
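For UTF-16, the byte order matters. In UTF-16LE (little-endian, the byte order of Windows "wide" strings) "这" is stored as the bytes D9 8F, so the pattern would be /\xd9\x8f/; in UTF-16BE it would be /\x8f\xd9/.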
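To produce both byte orders (or to compare them with what is actually in a file), a small hypothetical Python helper can print the bytes as a YARA hex string, YARA's { ... } syntax for fixed byte sequences, which for a fixed sequence is equivalent to a regex made of \x escapes.

```python
def yara_hex_string(text: str, encoding: str) -> str:
    """Format the bytes of `text` in `encoding` as a YARA hex string."""
    # e.g. b"\xd9\x8f" -> "{ D9 8F }"
    return "{ " + " ".join(f"{b:02X}" for b in text.encode(encoding)) + " }"


# "utf-16-le" and "utf-16-be" are Python codec names; neither adds a BOM.
for codec in ("utf-16-le", "utf-16-be"):
    print(codec, yara_hex_string("这", codec))
# utf-16-le { D9 8F }
# utf-16-be { 8F D9 }
```

Python's plain "utf-16" codec is deliberately avoided here, because it prepends a byte order mark that would end up in the pattern.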
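Note that because the pattern has to be written for one specific encoding, you cannot write a code-point range like [\u4e00-\u9fa5] as in your second example. At most, the range can be rewritten as the set of byte sequences it produces in that encoding, which for UTF-8 is no longer a single character class.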
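For UTF-8, a code-point range can be translated mechanically into an alternation of byte-level character classes, the same technique some regex engines use internally to compile Unicode classes into byte automata. The sketch below is a hypothetical helper, not a YARA feature. It assumes the range lies within U+0800..U+FFFF (three-byte UTF-8 sequences) and contains no surrogates, and it assumes YARA's regex syntax accepts \xNN escapes inside character classes plus ( | ) alternation; verify the generated rule against your YARA version.

```python
import re

CONT_MIN, CONT_MAX = 0x80, 0xBF  # every UTF-8 continuation byte is in this range


def _split(lo: bytes, hi: bytes) -> list[list[tuple[int, int]]]:
    """Cover the interval [lo, hi] of equal-length byte sequences with
    sequences of per-byte (min, max) ranges. Non-leading bytes are assumed
    to be continuation bytes (0x80..0xBF)."""
    if len(lo) == 1:
        return [[(lo[0], hi[0])]]
    if lo[0] == hi[0]:
        # Same leading byte: fix it and recurse on the remaining bytes.
        return [[(lo[0], lo[0])] + rest for rest in _split(lo[1:], hi[1:])]
    n = len(lo) - 1
    min_tail, max_tail = bytes([CONT_MIN] * n), bytes([CONT_MAX] * n)
    out = []
    first, last = lo[0], hi[0]
    if lo[1:] != min_tail:
        # The first leading byte is only partly covered.
        out += [[(first, first)] + rest for rest in _split(lo[1:], max_tail)]
        first += 1
    hi_partial = hi[1:] != max_tail
    full_last = last - 1 if hi_partial else last
    if first <= full_last:
        # Leading bytes whose tails may be any continuation bytes.
        out.append([(first, full_last)] + [(CONT_MIN, CONT_MAX)] * n)
    if hi_partial:
        # The last leading byte is only partly covered.
        out += [[(last, last)] + rest for rest in _split(min_tail, hi[1:])]
    return out


def utf8_range_regex(start: int, end: int) -> str:
    """Regex (without slashes) matching the UTF-8 bytes of any code point in
    [start, end]. Hypothetical helper: U+0800..U+FFFF only, no surrogates."""
    if not 0x800 <= start <= end <= 0xFFFF:
        raise ValueError("this sketch only handles U+0800..U+FFFF")
    if start <= 0xDFFF and end >= 0xD800:
        raise ValueError("range must not include surrogates U+D800..U+DFFF")

    def fmt(a: int, b: int) -> str:
        # A single byte escape, or a class of bytes a..b
        return f"\\x{a:02x}" if a == b else f"[\\x{a:02x}-\\x{b:02x}]"

    lo, hi = chr(start).encode("utf-8"), chr(end).encode("utf-8")
    alternatives = ["".join(fmt(a, b) for a, b in seq) for seq in _split(lo, hi)]
    return "(" + "|".join(alternatives) + ")"


if __name__ == "__main__":
    pattern = utf8_range_regex(0x4E00, 0x9FA5)
    print(f"$chinese = /{pattern}/")
    # Sanity check with Python's re on bytes (same \xNN escape syntax).
    rx = re.compile(pattern.encode("ascii"))
    assert rx.fullmatch("这".encode("utf-8"))
```

Run as a script, this prints $chinese = /(\xe4[\xb8-\xbf][\x80-\xbf]|[\xe5-\xe8][\x80-\xbf][\x80-\xbf]|\xe9[\x80-\xbd][\x80-\xbf]|\xe9\xbe[\x80-\xa5])/. It needs four alternatives because U+4E00 (E4 B8 80) starts partway through lead byte E4 and U+9FA5 (E9 BE A5) ends partway through lead byte E9.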
Thanks. Is there any way to use YARA to match Chinese characters in general? That is, can a range of Unicode code points be written in a YARA regex the way it can in other regex dialects, e.g. [\u4e00-\u9fa5]?
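One possible answer for UTF-16LE data (the byte order of Windows "wide" strings): each BMP code unit is stored as its low byte followed by its high byte, so a code-point range can be split by high byte into a few byte-level classes. This is a hypothetical Python sketch, limited to the BMP without surrogates, and it assumes YARA accepts \xNN escapes inside character classes. One caveat: a byte-level pattern cannot enforce 2-byte alignment, so it can also match across two adjacent code units (for example, the UTF-16LE bytes of plain ASCII "AN", 41 00 4E 00, contain 00 4E at offset 1), which makes such a rule prone to false positives.

```python
import re


def _byte(b: int) -> str:
    # One byte as a regex escape, e.g. 0x9f -> \x9f
    return f"\\x{b:02x}"


def _cls(a: int, b: int) -> str:
    # A single byte, or a character class covering bytes a..b
    return _byte(a) if a == b else f"[{_byte(a)}-{_byte(b)}]"


def utf16le_range_regex(start: int, end: int) -> str:
    """Regex (without slashes) for one UTF-16LE code unit in [start, end].

    Hypothetical helper: BMP only, no surrogates. Each code unit is stored
    as its low byte followed by its high byte.
    """
    if not 0 <= start <= end <= 0xFFFF:
        raise ValueError("this sketch only handles U+0000..U+FFFF")
    if start <= 0xDFFF and end >= 0xD800:
        raise ValueError("range must not include surrogates U+D800..U+DFFF")
    first_hi, last_hi = start >> 8, end >> 8
    if first_hi == last_hi:
        # Whole range shares one high byte: only the low byte varies.
        return "(" + _cls(start & 0xFF, end & 0xFF) + _byte(first_hi) + ")"
    parts = []
    mid_lo, mid_hi = first_hi, last_hi
    if start & 0xFF != 0x00:
        # The first high byte is only partly covered.
        parts.append(_cls(start & 0xFF, 0xFF) + _byte(first_hi))
        mid_lo += 1
    tail = None
    if end & 0xFF != 0xFF:
        # The last high byte is only partly covered.
        tail = _cls(0x00, end & 0xFF) + _byte(last_hi)
        mid_hi -= 1
    if mid_lo <= mid_hi:
        # High bytes for which every low byte is in range.
        parts.append(_cls(0x00, 0xFF) + _cls(mid_lo, mid_hi))
    if tail is not None:
        parts.append(tail)
    return "(" + "|".join(parts) + ")"


if __name__ == "__main__":
    pattern = utf16le_range_regex(0x4E00, 0x9FA5)
    print(f"$chinese_utf16le = /{pattern}/")
    # Sanity check with Python's re on bytes (same \xNN escape syntax).
    rx = re.compile(pattern.encode("ascii"))
    assert rx.fullmatch("这".encode("utf-16-le"))
```

For U+4E00..U+9FA5 this prints $chinese_utf16le = /([\x00-\xff][\x4e-\x9e]|[\x00-\xa5]\x9f)/: any low byte with high bytes 4E to 9E, plus low bytes up to A5 with high byte 9F.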