yara wide strings match only UTF16-LE

Describe the bug

The example string "Borland" from https://yara.readthedocs.io/en/v4.2.3/writingrules.html#wide-character-strings is shown there encoded as B\x00o\x00r\x00l\x00a\x00n\x00d\x00, but that is only the LE version of UTF16. The BE version is \x00B\x00o\x00r\x00l\x00a\x00n\x00d (\x00 in front). So the example rule from the docs doesn't match UTF16-BE:

rule WideCharTextExample1
{
    strings:
        $wide_string = "Borland" wide

    condition:
        $wide_string
}
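The difference between the two byte layouts can be checked outside of YARA. The following is only a sketch in Python: it assumes that a `wide` string is searched as exactly the little-endian byte sequence shown in the documentation example, and the variable names are made up for illustration.

```python
# Encode the documentation example in both byte orders.
text = "Borland"

le = text.encode("utf-16-le")  # null byte AFTER each character
be = text.encode("utf-16-be")  # null byte BEFORE each character

# Assumption: this is the pattern a `wide` string is searched as,
# taken from the documentation example quoted above.
wide_pattern = le

print(le)
print(be)
print(wide_pattern in le)  # True
print(wide_pattern in be)  # False
```

The two encodings have the same length and the same bytes, only shifted by one position, which is why the little-endian pattern cannot be found in a standalone big-endian string.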

UTF16-LE is by far the most common case but I stumbled upon the string Qi Lijun in UTF16-BE in 2fb7a38e69a88e3da8fece4c6a1a81842c1be6ae9d6ac299afa4aef4eb55fd4b (however that happened ...)

(Actually this is more unexpected behavior than a bug but that fits better than a feature request.)

To Reproduce

rule WideCharTextExample1
{
    strings:
        $wide_string = "Qi Lijun" wide

    condition:
        $wide_string
}

Doesn't match:

$ yara test.yar 2fb7a38e69a88e3da8fece4c6a1a81842c1be6ae9d6ac299afa4aef4eb55fd4b

Expected behavior

There would be several options to handle the problem:

1. Back to the Borland example, the perfect solution would be to search for both UTF16-LE and UTF16-BE:
   UTF16-LE: B\x00o\x00r\x00l\x00a\x00n\x00d\x00
   UTF16-BE: \x00B\x00o\x00r\x00l\x00a\x00n\x00d
2. A faster and more memory-efficient option would be to strip the trailing \x00 from the pattern in the existing implementation and search for: B\x00o\x00r\x00l\x00a\x00n\x00d
   That might produce false matches on very short strings (which shouldn't be used that often anyway because of their performance and false positive problems).
3. Introduce e.g. widebe as a new string modifier, similar to uint16be.
4. Explain the issue in the docs and recommend using hex strings for UTF16-BE.
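The idea of stripping the trailing \x00 can be illustrated with plain byte strings. This is a sketch in Python and says nothing about how YARA would implement it internally. The name `stripped` is hypothetical.

```python
text = "Borland"
le = text.encode("utf-16-le")
be = text.encode("utf-16-be")

# Drop the trailing null byte from the little-endian pattern.
stripped = le[:-1]  # B\x00o\x00r\x00l\x00a\x00n\x00d

print(stripped in le)  # True
print(stripped in be)  # True, because the BE form is \x00 followed by `stripped`

# The false-match risk on very short strings: for a single character
# the stripped pattern is just the plain ASCII byte.
print("A".encode("utf-16-le")[:-1])  # b'A'
```

One pattern then covers both byte orders, at the cost of being one byte less specific.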

Please complete the following information:

OS: Linux
YARA version: 4.3.0

Additional context

This also affects string search on VT. This search doesn't show any results:
content:"Qi Lijun" tag:peexe
This shows 10 hits:
content:{00 51 00 69 00 20 00 4c 00 69 00 6a 00 75 00 6e} (the same string as hex in UTF16-BE)
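Until there is a modifier for it, the hex form has to be produced by hand. A small helper can do the conversion. This is a sketch in Python (3.8 or later, for `bytes.hex` with a separator), and `utf16be_hex` is a hypothetical name, not part of YARA or yara-python.

```python
def utf16be_hex(text: str) -> str:
    """Return text as space-separated UTF-16BE hex bytes wrapped in braces."""
    encoded = text.encode("utf-16-be")
    return "{" + encoded.hex(" ") + "}"


print(utf16be_hex("Qi Lijun"))
# {00 51 00 69 00 20 00 4c 00 69 00 6a 00 75 00 6e}
```

The output has the same form as the hex search above and can be pasted into a rule as a hex string in place of the quoted text.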

Mar 06 '23 14:03 ruppde

More precisely: this isn't a problem if the string to be matched is in the middle of a UTF16 file, because there are null bytes all around it. It's only a problem when matching at the transition between a multi-byte and a single-byte charset (like in the example above) or at the end of the file.

For example this string

$endtag = "%>" ascii wide

... wouldn't match on the UTF-16BE-encoded webshell below because it only searches for 25 00 3e 00 and 25 3e.

$ hexdump UTF-16BE.jsp 
00000000  3c 25 40 20 70 61 67 65  20 63 6f 6e 74 65 6e 74  |<%@ page content|
00000010  54 79 70 65 3d 22 63 68  61 72 73 65 74 3d 55 54  |Type="charset=UT|
00000020  46 2d 31 36 42 45 22 20  25 3e 00 3c 00 25 00 52  |F-16BE" %>.<.%.R|
00000030  00 75 00 6e 00 74 00 69  00 6d 00 65 00 2e 00 67  |.u.n.t.i.m.e...g|
00000040  00 65 00 74 00 52 00 75  00 6e 00 74 00 69 00 6d  |.e.t.R.u.n.t.i.m|
00000050  00 65 00 28 00 29 00 2e  00 65 00 78 00 65 00 63  |.e.(.)...e.x.e.c|
00000060  00 28 00 72 00 65 00 71  00 75 00 65 00 73 00 74  |.(.r.e.q.u.e.s.t|
00000070  00 2e 00 67 00 65 00 74  00 50 00 61 00 72 00 61  |...g.e.t.P.a.r.a|
00000080  00 6d 00 65 00 74 00 65  00 72 00 28 00 22 00 69  |.m.e.t.e.r.(.".i|
00000090  00 22 00 29 00 29 00 3b  00 25 00 3e              |.".).).;.%.>|

So the problem is rather low prio.
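The difference between a match in the middle of the file and one at the end can be reproduced with a shortened stand-in for the webshell. This is a sketch in Python. The sample data is abbreviated and made up for illustration, and `wide_le` is a hypothetical helper that assumes `wide` means the little-endian pattern.

```python
def wide_le(text: str) -> bytes:
    # Assumption: the byte pattern searched for a `wide` string.
    return text.encode("utf-16-le")


# ASCII header followed by UTF-16BE content, as in the hexdump above.
data = b"<%@ page %>" + "<%Runtime.getRuntime()%>".encode("utf-16-be")

# In the middle of the BE text, the leading null of the next
# character completes the little-endian pattern.
print(wide_le("Runtime") in data)  # True

# At the end of the file there is no following null byte.
print(wide_le("%>") in data)  # False
print(data.endswith(b"\x00%\x00>"))  # True
```

So only strings that end exactly at a charset transition or at the end of the file are missed.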

Mar 13 '23 22:03 ruppde

I'd like to +1 this issue.

The situation you're running into here is the programName value within the SpcSpOpusInfo details.

We have blogged about this in "I Solemnly Swear My Driver Is Up to No Good". Right now we have to UTF-16BE + Hex encode the string before adding it to yara rules. It would be very helpful for ease of reading the rule and also in rule creation to add a utf16be modifier.

Apr 26 '24 18:04 jaredscottwilson
