Implement bytes keyword.
This commit adds the "bytes" keyword to the language. Like the int8 family of keywords, it reads data at a given offset, but it reads an arbitrary sequence of bytes into a string, which you can then compare against other strings.
The implementation dynamically allocates memory for each bytes operation, tracks all of those allocations in a notebook during bytecode execution, and destroys the notebook when execution is done.
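To make the allocate-and-track pattern concrete, here is a minimal sketch in C. It is not the code in this PR and it does not use libyara's notebook API: `notebook`, `notebook_read_bytes` and `notebook_destroy` are hypothetical names, and the scanned data is assumed to be one contiguous buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types and names; this is not libyara's notebook API. */
typedef struct alloc_node
{
  struct alloc_node* next;
  size_t length;
  unsigned char data[]; /* copied bytes live right after the header */
} alloc_node;

typedef struct
{
  alloc_node* head; /* most recent allocation first */
  size_t count;     /* number of live allocations */
} notebook;

static void notebook_init(notebook* nb)
{
  nb->head = NULL;
  nb->count = 0;
}

/* Copies `length` bytes starting at `offset` and records the allocation.
   Returns NULL if the range is out of bounds or memory runs out. */
static const unsigned char* notebook_read_bytes(
    notebook* nb,
    const unsigned char* buf,
    size_t buf_len,
    size_t offset,
    size_t length)
{
  assert(nb != NULL && buf != NULL);

  if (offset > buf_len || length > buf_len - offset)
    return NULL;

  alloc_node* node = malloc(sizeof(alloc_node) + length);
  if (node == NULL)
    return NULL;

  memcpy(node->data, buf + offset, length);
  node->length = length;
  node->next = nb->head;
  nb->head = node;
  nb->count++;
  return node->data;
}

/* Frees every allocation made during one execution in a single pass. */
static void notebook_destroy(notebook* nb)
{
  while (nb->head != NULL)
  {
    alloc_node* next = nb->head->next;
    free(nb->head);
    nb->head = next;
  }
  nb->count = 0;
}
```

The point of the design is that individual reads never need to be freed one by one; everything is released together when execution finishes.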
An obvious optimization is to track the offset and length of each allocation so that we don't make lots of unnecessary allocations in loops like this:
for any s in ("foo", "bar"): (bytes(0, 3) == s)
While the above is an obviously contrived example, it does make sense to reuse an allocation we already have for the same offset and length instead of making a new one.
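A rough sketch of that optimization in C, under stated assumptions: reads are keyed by (offset, length), the cache is a small fixed-size table, and the scanned data is one contiguous buffer. `bytes_cache`, `cache_read_bytes` and `CACHE_MAX` are hypothetical names, not part of this PR or of libyara.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_MAX 16 /* arbitrary size chosen for the sketch */

typedef struct
{
  size_t offset;
  size_t length;
  unsigned char* data;
} cache_entry;

typedef struct
{
  cache_entry entries[CACHE_MAX];
  size_t used;        /* number of filled entries */
  size_t allocations; /* how many times memory was actually allocated */
} bytes_cache;

static void cache_init(bytes_cache* c)
{
  c->used = 0;
  c->allocations = 0;
}

/* Returns the bytes at [offset, offset + length), reusing an earlier copy
   when the same range was already read. Returns NULL on failure. */
static const unsigned char* cache_read_bytes(
    bytes_cache* c,
    const unsigned char* buf,
    size_t buf_len,
    size_t offset,
    size_t length)
{
  assert(c != NULL && buf != NULL);

  if (offset > buf_len || length > buf_len - offset)
    return NULL;

  /* Linear search is fine for a table this small. */
  for (size_t i = 0; i < c->used; i++)
  {
    if (c->entries[i].offset == offset && c->entries[i].length == length)
      return c->entries[i].data;
  }

  if (c->used == CACHE_MAX)
    return NULL; /* sketch only: a full table is treated as a failed read */

  unsigned char* copy = malloc(length > 0 ? length : 1);
  if (copy == NULL)
    return NULL;

  memcpy(copy, buf + offset, length);
  c->entries[c->used].offset = offset;
  c->entries[c->used].length = length;
  c->entries[c->used].data = copy;
  c->used++;
  c->allocations++;
  return copy;
}

static void cache_destroy(bytes_cache* c)
{
  for (size_t i = 0; i < c->used; i++)
    free(c->entries[i].data);
  c->used = 0;
}
```

With this in place, the loop in the example above would allocate once for `bytes(0, 3)` and reuse that copy on the second iteration.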
Discussed in https://github.com/VirusTotal/yara/issues/1780 by @metthal.
An obvious follow-up that would improve its usefulness is the ability to convert a SIZED_STRING to its hexlified equivalent and vice versa. I think this belongs in the string module proposed in https://github.com/VirusTotal/yara/pull/1779, and I'm happy to add it if this is deemed a good idea. Also, I'll add docs before merging; just let me know what you think!
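For illustration, the conversion could look roughly like the sketch below. It works on plain buffers rather than SIZED_STRING, and `hexlify` and `unhexlify` are hypothetical helper names, not functions from the string module or from libyara.

```c
#include <assert.h>
#include <stddef.h>

/* Writes 2 * len lowercase hex characters plus a terminating NUL into `out`.
   The caller must provide at least 2 * len + 1 bytes. */
static void hexlify(const unsigned char* in, size_t len, char* out)
{
  static const char digits[] = "0123456789abcdef";

  assert(in != NULL && out != NULL);

  for (size_t i = 0; i < len; i++)
  {
    out[2 * i] = digits[in[i] >> 4];
    out[2 * i + 1] = digits[in[i] & 0x0f];
  }
  out[2 * len] = '\0';
}

/* Returns the value of one hex digit, or -1 if `c` is not a hex digit. */
static int hex_value(char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Decodes `in_len` hex characters into `out` (at least in_len / 2 bytes).
   Returns the number of bytes written, or -1 on odd length or bad digits. */
static long unhexlify(const char* in, size_t in_len, unsigned char* out)
{
  assert(in != NULL && out != NULL);

  if (in_len % 2 != 0)
    return -1;

  for (size_t i = 0; i < in_len / 2; i++)
  {
    int hi = hex_value(in[2 * i]);
    int lo = hex_value(in[2 * i + 1]);
    if (hi < 0 || lo < 0)
      return -1;
    out[i] = (unsigned char) ((hi << 4) | lo);
  }
  return (long) (in_len / 2);
}
```

A helper like this would let a rule compare a raw digest against its hex text form, or the other way round.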
One case where this is useful is when a hash of a blob is embedded in the file and you want to be able to check it. Here is a silly example of how you could do that:
wxs@mbp yara % xxd ~/x
00000000: 4920 414d 2041 2053 5452 494e 477c 6664 I AM A STRING|fd
00000010: 3039 6437 3334 3338 3066 3764 3666 6338 09d734380f7d6fc8
00000020: 3836 3535 6531 3964 3661 3530 3937 6565 8655e19d6a5097ee
00000030: 3064 3166 3262 6565 6665 6161 3539 3335 0d1f2beefeaa5935
00000040: 3863 3035 3062 6334 6435 3836 6565 8c050bc4d586ee
wxs@mbp yara % cat rules/test.yara
import "hash"
rule a {
  condition:
    hash.sha256(0, 13) == bytes(filesize - 64, 64)
}
wxs@mbp yara % ./yara rules/test.yara ~/x
a /Users/wxs/x
wxs@mbp yara %
Hello! It would be nifty if it continued reading from the next block when the range crosses a block boundary.
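A rough sketch of what reading across a boundary could look like, assuming the blocks are sorted by base address and the range is only readable when the blocks covering it are contiguous. `mem_block` is a simplified hypothetical struct and `read_across_blocks` is a hypothetical helper; neither is the real libyara block type or iterator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-in for a scanned memory block. */
typedef struct
{
  size_t base;               /* offset of the block's first byte */
  size_t size;               /* number of bytes in the block */
  const unsigned char* data; /* the block's contents */
} mem_block;

/* Copies `length` bytes starting at `offset` into `out`, continuing into
   following blocks as needed. Blocks must be sorted by `base`.
   Returns 1 on success, 0 if any byte of the range is not covered. */
static int read_across_blocks(
    const mem_block* blocks,
    size_t n_blocks,
    size_t offset,
    size_t length,
    unsigned char* out)
{
  size_t copied = 0;

  assert(blocks != NULL && out != NULL);

  for (size_t i = 0; i < n_blocks && copied < length; i++)
  {
    const mem_block* b = &blocks[i];
    size_t pos = offset + copied;

    if (pos < b->base)
      return 0; /* gap between blocks: the range is not fully available */

    if (pos >= b->base + b->size)
      continue; /* the next byte to read lies beyond this block */

    size_t in_block = pos - b->base;
    size_t avail = b->size - in_block;
    size_t want = length - copied;
    size_t n = want < avail ? want : avail;

    memcpy(out + copied, b->data + in_block, n);
    copied += n;
  }

  return copied == length;
}
```

Failing on a gap rather than returning partial data keeps the result consistent with the existing behaviour of returning an undefined value when the read cannot be satisfied.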
Good point! I forgot about that. I've got some planned updates for this and I will include your suggestion too. Thank you for pointing it out!
OK, so other than not working across block boundaries I think I'm mostly happy with this implementation now. I think the limits are very conservative but they can be easily tweaked. In fact, I think it probably makes sense to move them to runtime configuration options available via the API.
I'll dig into reading across block boundaries soon, but I think the implementation (and even the idea) is ready for a wider discussion. If it starts to look like it might be merged, I'll do the work to update the docs and expose the limits as configuration options.
EDIT: Also, if you hit the limits I just silently start returning YR_UNDEFINED. It might make sense to do something more user-friendly here, like raising a runtime warning?
I'm going to close this out as it isn't completely ready (it doesn't work across blocks) and I want to focus all new efforts on yara-x in the upcoming year.