Using a yara regex rule to scan Chinese characters: error

Open hanggao481 opened this issue 1 year ago • 3 comments

How can I use a yara regex rule to scan for Chinese characters? What is the reason for the incorrect matches below?

Describe the bug

My yara rule:

rule AsianCharacter : general
{
  strings:
    $chinese = /[\u8fd9]/
  condition:
    $chinese
}

Match result:

0x1cd:$chinese: u
0x1d2:$chinese: f
0x1dd:$chinese: 8

Expected behavior

Expected match result:

0x1cd:$chinese: 这

Note: the Unicode code point of "这" is U+8FD9 (\u8fd9).

hanggao481, Aug 17 '23 10:08

Another example: I want to scan for Chinese characters with a regex yara rule like the one below:

rule AsianCharacter : general
{
  strings:
    $chinese = /[\u4e00-\u9fa5]/
  condition:
    $chinese
}

Problem: it does not match Chinese characters.

hanggao481, Aug 18 '23 02:08

Yara does not have Unicode handling in strings, and the \u syntax does not exist. What you wrote is actually [u8fd9], a character class that matches any one of those five ASCII bytes (u, 8, f, d, 9), which is why your matches are "u", "f" and "8".

If you want to search for a non-ASCII character, you need to search for the bytes that encode it in the files you scan. For UTF-8 files, that would mean something like this:

rule AsianCharacter : general
{
  strings:
    $chinese = /\xe8\xbf\x99/
  condition:
    $chinese
}

For UTF-16 encoding, I guess something like /\x8f\xd9/ for big-endian data, or /\xd9\x8f/ for little-endian data.
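
To avoid working out the bytes by hand, a small helper can derive them. This is only a sketch in Python, not part of yara: the function name `yara_hex_regex` is made up for illustration, and it simply prints the `\xNN` escapes for a given text and encoding.

```python
def yara_hex_regex(text: str, encoding: str) -> str:
    """Return a /\\xNN.../ regex literal for `text` encoded in `encoding`.

    Hypothetical helper for illustration; it only formats bytes and
    does not call yara.
    """
    encoded = text.encode(encoding)
    # Write every byte as a two-digit hex escape.
    return "/" + "".join(f"\\x{byte:02x}" for byte in encoded) + "/"


if __name__ == "__main__":
    for enc in ("utf-8", "utf-16-le", "utf-16-be"):
        print(enc, yara_hex_regex("这", enc))
    # utf-8 /\xe8\xbf\x99/
    # utf-16-le /\xd9\x8f/
    # utf-16-be /\x8f\xd9/
```

The explicit `-le` and `-be` codec names are used because Python's plain `utf-16` codec prepends a byte order mark, which would end up in the pattern.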

Note that because the pattern has to be written as bytes in a given encoding, you cannot use code point ranges like the one in your second example.
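
A code point range can still be spelled out by hand as byte alternatives for one fixed encoding. The sketch below does this for U+4E00..U+9FA5 in UTF-8 and checks the result with Python's `re` module on bytes. The names `CJK_UTF8` and `in_cjk_range` are made up here, and Python is only a stand-in for checking the byte pattern: whether yara's regex engine accepts exactly the same escapes, classes and grouping is an assumption to verify against the yara documentation.

```python
import re

# In UTF-8, U+4E00 encodes as E4 B8 80 and U+9FA5 as E9 BE A5.
# The range is split wherever a leading byte changes the allowed
# values of the bytes that follow it.
CJK_UTF8 = (
    rb"(\xe4[\xb8-\xbf][\x80-\xbf]"         # U+4E00..U+4FFF
    rb"|[\xe5-\xe8][\x80-\xbf][\x80-\xbf]"  # U+5000..U+8FFF
    rb"|\xe9[\x80-\xbd][\x80-\xbf]"         # U+9000..U+9F7F
    rb"|\xe9\xbe[\x80-\xa5])"               # U+9F80..U+9FA5
)


def in_cjk_range(ch: str) -> bool:
    """True if the UTF-8 bytes of `ch` match the byte-level pattern."""
    return re.fullmatch(CJK_UTF8, ch.encode("utf-8")) is not None


if __name__ == "__main__":
    print(in_cjk_range("这"))  # True
    print(in_cjk_range("a"))   # False
```

A pattern like this only applies to UTF-8 data; UTF-16 files would need a separate byte pattern for each byte order.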

vthib, Aug 20 '23 10:08

Thanks. Is there any way to use yara to match Chinese characters in general? That is, can a range of Unicode code points be written in a yara regex as in an ordinary regex, e.g. [\u4e00-\u9fa5]?

gaohang, Sep 04 '23 01:09