
Yara cannot scan directories and files with non-ANSI characters in the name

Open canofindit opened this issue 5 years ago • 1 comments

Yara code uses the 8-bit char type and the matching string functions, together with the ANSI APIs on Windows, to list directories and open files. As a result, an "error scanning .. cannot open file" message is printed whenever a file name contains one or more non-ANSI characters. The same applies to directories. Switching to a different code page does not solve the problem either. A single non-ANSI character can therefore make a file or an entire directory unavailable for scanning.

Since many non-ANSI characters print (almost) identically to existing ANSI characters, it is easy to hide the reason why a file or directory could not be scanned. This is a serious security risk: malware only needs a single non-ANSI character in its file name to bypass scanning.
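To make the failure mode concrete, here is a minimal Python sketch of what happens when a Unicode file name is squeezed through an 8-bit code page. It is not Yara code and does not call the Windows API. The helper name `ansi_view` and the choice of `cp1252` as the ANSI code page are assumptions for illustration, and the `?` substitution only approximates what the ANSI file APIs do with unmappable characters.

```python
def ansi_view(name: str, codepage: str = "cp1252") -> str:
    """Return the name as an 8-bit code page would see it.

    Characters that the code page cannot represent become '?',
    so the result no longer matches any file on disk.
    (cp1252 is an assumed example code page.)
    """
    return name.encode(codepage, errors="replace").decode(codepage)


latin_name = "malware.exe"            # plain Latin 'a' (U+0061)
lookalike_name = "m\u0430lware.exe"   # Cyrillic 'а' (U+0430)

print(ansi_view(latin_name))      # name survives unchanged
print(ansi_view(lookalike_name))  # the Cyrillic letter is lost
```

The second name comes back with a `?` in place of the Cyrillic letter, so an open call using that converted name has nothing to find.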

canofindit avatar Oct 06 '20 19:10 canofindit

Windows version only. Here is an illustration of the issue. I have a simple directory called Malware. There is a yara script file, malware.yar that contains:

rule no_error_scanning
{
    strings:
        $a = "malware"
    condition:
        $a
}

The directory structure is:

Directory of C:\Users\Malware

08-10-20  17:34    <DIR>          .
08-10-20  17:34    <DIR>          ..
08-10-20  17:19    <DIR>          cаnnot_scan_this_dir
08-10-20  17:46                15 malware.exe
08-10-20  17:16                76 malware.yar
08-10-20  17:01                15 mаlware.exe
08-10-20  17:18    <DIR>          this_dir_is_scanned
               3 File(s)            106 bytes

Directory of C:\Users\Malware\cаnnot_scan_this_dir

08-10-20  17:19    <DIR>          .
08-10-20  17:19    <DIR>          ..
08-10-20  17:46                15 other_malware.exe
               1 File(s)             15 bytes

Directory of C:\Users\Malware\this_dir_is_scanned

08-10-20  17:18    <DIR>          .
08-10-20  17:18    <DIR>          ..
08-10-20  17:46                15 some_malware.exe
               1 File(s)             15 bytes

There is a test file, called malware.exe that is just a basic text file with the following contents:

This is malware

Its time stamp is 17:46. There is also another file that appears to have the same name but has time stamp 17:01. Its contents are identical, but its name contains a Cyrillic letter 'а' (Unicode 0x0430) instead of the Latin alphabet 'a'.
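Because the two names look the same on screen, it helps to inspect the code points directly. The following sketch uses Python's standard `unicodedata` module. The function name `describe_non_ascii` is made up for this example.

```python
import unicodedata


def describe_non_ascii(name: str):
    """List (position, code point, Unicode name) for every non-ASCII character."""
    return [
        (index, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for index, ch in enumerate(name)
        if ord(ch) > 127
    ]


# The escape makes the Cyrillic letter explicit in the source.
for entry in describe_non_ascii("m\u0430lware.exe"):
    print(entry)  # (1, 'U+0430', 'CYRILLIC SMALL LETTER A')
```

For the plain Latin name the list is empty, which is a quick way to tell the two files apart.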

There is a directory "this_dir_is_scanned" that contains an identical copy of the malware.exe file (named some_malware.exe) with timestamp 17:46. The same goes for another directory, "cаnnot_scan_this_dir", which holds other_malware.exe. However, that directory name again contains a Cyrillic letter 'а'.

Here is the result of the scan, yara version 4.02:

yara -r malware.yar .
error scanning .\m?lware.exe: could not open file
no_error_scanning .\malware.exe
no_error_scanning .\malware.yar
no_error_scanning .\this_dir_is_scanned\some_malware.exe

The directory cаnnot_scan_this_dir is silently skipped: it is not scanned, and nothing reports that it was not scanned. The file open error concerns the file with the Unicode letter 'а'.
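Until the scanner handles such names, one possible stopgap is to audit a tree for entries that an ANSI-only tool would miss. The sketch below is only an illustration under stated assumptions: `representable` and `hidden_from_ansi` are hypothetical helper names, `cp1252` stands in for the active ANSI code page, and the check models the encoding limitation rather than reproducing Yara's exact behavior.

```python
import os


def representable(name: str, codepage: str = "cp1252") -> bool:
    """True if every character of the name exists in the code page."""
    try:
        name.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


def hidden_from_ansi(root: str, codepage: str = "cp1252"):
    """Walk root and return paths whose own name is not representable.

    os.walk works with Unicode names, so it can see entries that an
    8-bit listing would mangle or skip.
    """
    hidden = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not representable(name, codepage):
                hidden.append(os.path.join(dirpath, name))
    return sorted(hidden)
```

Run against the Malware directory from the example, `hidden_from_ansi(".")` would be expected to list the Cyrillic-named file and directory, so they can be reviewed by other means.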

canofindit avatar Oct 08 '20 15:10 canofindit