yara Scanning a large process on Windows uses large amounts of memory

On Windows, scanning a process takes an amount of memory that is (roughly) proportional to the amount of memory the process uses. E.g., if a process has allocated 1GB on its heap, a YARA scan of this process will take another GB. In contrast to this, during file scanning, YARA's memory usage is independent of the scanned file's size.

From my understanding of the YARA implementation, this is because YARA iterates over all regions in the target process (using VirtualQueryEx to query the region information) and treats each region as a single memory block. The implementation details can be found in yr_process_get_next_memory_block. The peak memory usage of this approach is therefore determined by the size of the largest region in the target process.

As a possible solution, large regions could be split into multiple memory blocks with a maximum block size. This does raise the question, though, of what a good maximum block size would be for this purpose. Also, I'm not familiar enough with the scanner's internal architecture to know whether this has any unintended side effects.
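To illustrate the splitting idea, here is a minimal Python sketch. It is not YARA's C implementation and not the code from any commit in this thread; the function name `split_region` and its parameters are hypothetical, and a region is modelled simply as a base address plus a size.

```python
from typing import Iterator, Tuple


def split_region(base: int, size: int, max_block_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (address, length) blocks that together cover [base, base + size).

    Hypothetical helper for illustration only; no block is longer than
    max_block_size, and the last block holds whatever remains.
    """
    if max_block_size <= 0:
        raise ValueError("max_block_size must be positive")
    offset = 0
    while offset < size:
        length = min(max_block_size, size - offset)
        yield base + offset, length
        offset += length


# A 2500-byte region split with a 1000-byte cap gives two full blocks and a remainder.
blocks = list(split_region(0x10000000, 2500, 1000))
```

With this scheme the scanner would only need to hold one block in memory at a time, so peak usage would be bounded by the block size rather than by the largest region.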

Nov 03 '20 07:11 secDre4mer

I've implemented the suggestion here: https://github.com/secDre4mer/yara/commit/e6464c5d4db4b483a05651f0e2f2b39fdc5b41c4 It's limited to Windows so far, and only tested rudimentarily. Any suggestions for improvement are welcome.

Nov 13 '20 13:11 secDre4mer

My only concern with this is that it can lead to false negatives if the pattern you are looking for with your rule spans a block boundary. The risk is small if you use a large enough block size, and that's the price to pay for not using too much memory, but we must document well how it works. Also, enforcing a minimum acceptable block size that is large enough would help. I wouldn't go with blocks smaller than 50MB or so.
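The boundary problem can be reproduced with a small Python sketch. It is a simplified model that searches plain bytes block by block, not how YARA matches rules internally; `scan_in_blocks` is a hypothetical name used only for this illustration.

```python
from typing import List


def scan_in_blocks(data: bytes, pattern: bytes, block_size: int) -> List[int]:
    """Search each block independently and return absolute match offsets.

    Matches that straddle two blocks are never seen, because each block
    is searched in isolation.
    """
    hits = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        pos = block.find(pattern)
        while pos != -1:
            hits.append(start + pos)
            pos = block.find(pattern, pos + 1)
    return hits


# 16 bytes of data with the pattern occupying offsets 6..9.
data = b"A" * 6 + b"EVIL" + b"A" * 6

whole = scan_in_blocks(data, b"EVIL", len(data))  # one block: pattern is found
split = scan_in_blocks(data, b"EVIL", 8)          # boundary at offset 8 cuts the pattern
```

Here `whole` contains the match at offset 6, while `split` is empty because the pattern is cut in two at offset 8. With larger blocks there are fewer boundaries in the scanned memory, which is why the risk shrinks as the block size grows.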

Nov 19 '20 18:11 plusvic

Yes, agreed. I noticed that as well in my tests, but as you said, the risk becomes sufficiently small with large block sizes. The PR sets the default value to 1 GB, and users with fewer resources can then reduce it (with the resulting higher risk of false negatives).

Nov 20 '20 07:11 secDre4mer

Do you plan to implement it for the remaining platforms?

Nov 20 '20 08:11 plusvic

The pull request already includes commits that add the changes for the remaining platforms. I haven't tested FreeBSD or OpenBSD yet (I'm currently setting up a FreeBSD VM for the former); the others worked in my tests.

Nov 20 '20 08:11 secDre4mer

I've tested FreeBSD and OpenBSD now as well (and fixed a bug from my changes for both).

Nov 20 '20 12:11 secDre4mer
