Using a yara regex rule to scan for Chinese characters gives wrong matches

How can I use a yara regex rule to scan for Chinese characters? What is the reason for the incorrect match shown below?

Describe the bug

My yara rule:

rule AsianCharacter : general
{
  strings:
    $chinese = /[\u8fd9]/
  condition:
    $chinese
}

Match result:

0x1cd:$chinese: u
0x1d2:$chinese: f
0x1dd:$chinese: 8

Expected behavior

Expected match result:

0x1cd:$chinese: 这

Note: the Unicode code point of "这" is U+8FD9 (written \u8fd9 in many regex engines).

Aug 17 '23 10:08 hanggao481

Another example: I want to scan for Chinese characters with a regex yara rule, as below:

rule AsianCharacter : general
{
  strings:
    $chinese = /[\u4e00-\u9fa5]/
  condition:
    $chinese
}

Problem: it does not match any Chinese characters.

Aug 18 '23 02:08 hanggao481

Yara does not have Unicode handling in strings, and the \u syntax does not exist. What you wrote is actually read as [u8fd9], a character class that matches any one of the five ASCII bytes u, 8, f, d, 9. That is why your results show u, f and 8.
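To make that concrete, here is a small sketch in Python rather than yara (only to illustrate the byte-level behaviour; the sample buffer is made up, and Python's `re` is standing in for yara's regex engine here). The class is written the way it ends up being read, as the five literal characters:

```python
import re

# Hypothetical sample buffer: UTF-8 text containing "这" plus some ASCII.
data = "url: 这 (u8fd9)".encode("utf-8")

# Without a \u escape, the class is just the five ASCII characters u, 8, f, d, 9.
literal_class = re.compile(rb"[u8fd9]")

# Collect every single-byte match as text.
matches = [m.group().decode("ascii") for m in literal_class.finditer(data)]
print(matches)

# None of the three UTF-8 bytes of "这" (e8 bf 99) is in the class.
chinese_bytes_hit = any(bytes([b]) in b"u8fd9" for b in "这".encode("utf-8"))
print(chinese_bytes_hit)
```

Every hit is one of the ASCII characters, and the bytes of the Chinese character itself are never matched.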

If you want to search for a non-ASCII character, you need to search for the bytes of its encoding as it appears in the files you scan. For UTF-8 files, that would mean something like this:

rule AsianCharacter : general
{
  strings:
    $chinese = /\xe8\xbf\x99/
  condition:
    $chinese
}

For UTF-16, the byte order depends on endianness: /\xd9\x8f/ for UTF-16LE (little-endian), or /\x8f\xd9/ for UTF-16BE (big-endian).
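If it helps, the byte sequences can be derived rather than worked out by hand. This is a minimal sketch in Python; `yara_hex_regex` is a hypothetical helper name, not part of yara or yara-python, and it only formats the encoded bytes as \xNN escapes:

```python
def yara_hex_regex(text: str, encoding: str) -> str:
    """Format `text`, encoded with `encoding`, as a regex made of \\xNN escapes."""
    raw = text.encode(encoding)
    return "/" + "".join(f"\\x{b:02x}" for b in raw) + "/"

# Print the pattern for each encoding the scanned files might use.
for enc in ("utf-8", "utf-16-le", "utf-16-be"):
    print(enc, yara_hex_regex("这", enc))
# utf-8     /\xe8\xbf\x99/
# utf-16-le /\xd9\x8f/
# utf-16-be /\x8f\xd9/
```

Note that the UTF-16 bytes depend on byte order, so the pattern has to match the encoding actually used in the scanned files.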

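Note that because the pattern has to be written in terms of encoded bytes, you cannot use a code-point range like [\u4e00-\u9fa5] from your second example directly; any range would have to be expressed over the bytes of one specific encoding.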
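For a fixed encoding, a code-point range can be split by hand into byte ranges. The following is a sketch for U+4E00..U+9FA5 in UTF-8 only, checked here with Python's `re` on bytes. The assumption (not tested against yara itself) is that the same pattern text, wrapped in slashes, is accepted as a yara regex, since it only uses \xNN escapes, character classes, grouping and alternation:

```python
import re

# Byte-level equivalent of the code-point range U+4E00..U+9FA5 in UTF-8.
CJK_UTF8 = (
    rb"(\xe4[\xb8-\xbf][\x80-\xbf]"          # U+4E00..U+4FFF
    rb"|[\xe5-\xe8][\x80-\xbf][\x80-\xbf]"   # U+5000..U+8FFF
    rb"|\xe9[\x80-\xbd][\x80-\xbf]"          # U+9000..U+9F7F
    rb"|\xe9\xbe[\x80-\xa5])"                # U+9F80..U+9FA5
)
pattern = re.compile(CJK_UTF8)

def in_range(ch: str) -> bool:
    """True if the UTF-8 encoding of `ch` is matched in full by the pattern."""
    return pattern.fullmatch(ch.encode("utf-8")) is not None

# Pattern text in the form it would take inside a rule (assumption: untested in yara).
print("/" + CJK_UTF8.decode("ascii") + "/")
print(in_range("这"), in_range("a"))
```

On arbitrary binary data such a pattern can also hit byte sequences that merely look like UTF-8, and UTF-16 files would need a separate pattern.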

Aug 20 '23 10:08 vthib

Thanks. Is there any way to use yara to match Chinese characters in general? That is, can a range of Unicode code points be written in a yara regex as in an ordinary regex engine, e.g. [\u4e00-\u9fa5]?

Sep 04 '23 01:09 gaohang
