Matching some expressions out of an expression array
Usage: X of [ true, false, true ]
It does what you would expect: if enough items in the array evaluate to true, the whole statement is true. X can be a number, 'any', or 'all', just like in for loops. The array evaluation reuses the loop indexing for memory organization (saving internal variables), and the algorithm is very similar too, so it should cause no confusion.
It supports short-circuit evaluation: once enough array items have evaluated to true, evaluation of the array stops and the remaining items are skipped.
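To illustrate the intended semantics outside of YARA, here is a minimal Python sketch. It is not the VM code from this PR. The names `x_of` and `probe` are hypothetical, array items are modelled as zero-argument callables so that skipped items are visibly never evaluated, and treating a requirement of zero matches as trivially true is an assumption of the sketch.

```python
from typing import Callable, List, Sequence, Union

Quantifier = Union[int, str]  # a number, "any", or "all"


def x_of(quantifier: Quantifier, items: Sequence[Callable[[], bool]]) -> bool:
    """Return True as soon as enough items evaluate to true."""
    if quantifier == "any":
        needed = 1
    elif quantifier == "all":
        needed = len(items)
    else:
        needed = int(quantifier)

    if needed <= 0:
        return True  # assumption: zero required matches is trivially satisfied

    matched = 0
    for item in items:
        if item():  # items are evaluated lazily, one at a time
            matched += 1
            if matched >= needed:
                return True  # short circuit: remaining items are skipped
    return False


calls: List[str] = []  # records which items were actually evaluated


def probe(name: str, value: bool) -> Callable[[], bool]:
    def thunk() -> bool:
        calls.append(name)
        return value
    return thunk


result = x_of(2, [probe("a", True), probe("b", False),
                  probe("c", True), probe("d", True)])
```

After this runs, `result` is true and `calls` contains only `a`, `b`, and `c`: the fourth item is never evaluated because two matches were already found.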
Why we need this:
Analysts frequently use rules such as
for 2 i in (1..4): ( (i == 1 and cuckoo.filesystem.file_write(/.../i)) or (i == 2 and cuckoo.filesystem.file_write(/.../i)) or (i == 3 and cuckoo.filesystem.file_write(/../i)) or (i == 4 and cuckoo.network.http_request(/.../i)) )
The aim is to simplify such rules to something that can be read and written with more ease:
2 of [ cuckoo.filesystem.file_write(/.../i), cuckoo.filesystem.file_write(/.../i), cuckoo.filesystem.file_write(/../i), cuckoo.network.http_request(/.../i) ]
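As a sanity check that the two forms express the same condition, the following Python sketch models the four cuckoo checks as plain booleans `c1` to `c4` (stand-ins, since the actual patterns are elided above) and compares both forms over every combination of inputs. The function names are hypothetical.

```python
from itertools import product


def for_loop_form(c1: bool, c2: bool, c3: bool, c4: bool) -> bool:
    # Mirrors: for 2 i in (1..4): ((i == 1 and c1) or (i == 2 and c2) or ...)
    hits = 0
    for i in range(1, 5):
        if (i == 1 and c1) or (i == 2 and c2) or (i == 3 and c3) or (i == 4 and c4):
            hits += 1
    return hits >= 2


def of_form(c1: bool, c2: bool, c3: bool, c4: bool) -> bool:
    # Mirrors: 2 of [c1, c2, c3, c4]
    return sum([c1, c2, c3, c4]) >= 2


# Compare both forms on all 16 combinations of the four checks.
all_agree = all(
    for_loop_form(*combo) == of_form(*combo)
    for combo in product([False, True], repeat=4)
)
```

The index guards in the loop form exist only to select one check per iteration, which is exactly what the array form states directly.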
I like the idea of implementing this feature, but I'm not sure this is the most appropriate way to do it. Instead of generating VM code specifically for expressions of the form <for_expression> of [<boolean_array_expression>], we should try to do something more general, like <for_expression> of <iterator>, where the iterator can be an array of expressions or some other array of booleans (for example, an array returned by some module).
I'm going to give this feature a second thought. I also have in mind implementing <expression> in <iterator>, which is somewhat related.
If my understanding is correct, 'in' and 'of' are going to be pretty much the same piece of code. The only difference is that 'of' searches for X expressions that evaluate to boolean true and short-circuits after finding that many matches, while 'in' searches for one expression with the given type and value and short-circuits after that.
That means that if I make bool_arrays accept any expression and then check the type and value stored in memory, I will effectively create the iterator you were referring to.
Is my assumption correct? Is there any other reason you want to generalize bool_arrays into an iterator, or is it only to reuse the same code? Is there a use case for supporting types other than booleans in arrays for the 'of' operator? If I make these changes, I might as well add the 'in' operator if you want, as there isn't going to be much of a difference between the two constructs.
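To make the comparison concrete, here is a rough Python sketch of how both operators could share one scanning loop. This models the idea only, not YARA's VM. The names `scan`, `of_operator`, and `in_operator` are hypothetical, and comparing both type and value for 'in' is my reading of the behaviour described above.

```python
from typing import Any, Callable, Iterable


def scan(items: Iterable[Any], is_match: Callable[[Any], bool], needed: int) -> bool:
    """Shared loop: stop as soon as `needed` items satisfy `is_match`."""
    if needed <= 0:
        return True  # assumption: nothing required means trivially true
    found = 0
    for value in items:
        if is_match(value):
            found += 1
            if found >= needed:
                return True  # short circuit
    return False


def of_operator(needed: int, items: Iterable[Any]) -> bool:
    # 'of': `needed` items must be the boolean true
    return scan(items, lambda v: v is True, needed)


def in_operator(target: Any, items: Iterable[Any]) -> bool:
    # 'in': a single item must equal the target in both type and value
    return scan(items, lambda v: type(v) is type(target) and v == target, 1)


two_of = of_operator(2, [True, False, True])
three_in = in_operator(3, [1, 2, 3])
true_in_ints = in_operator(True, [1, 0])
```

Here `two_of` and `three_in` are true, while `true_in_ints` is false because the type check rejects the integer 1 even though Python considers it equal to `True`. Only the match predicate and the required count differ between the two operators.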
@plusvic do you think I should pursue the changes I described, or should I put this whole thing on hold?
Put it on hold. I think this should be part of a larger, more ambitious change that I have in mind.
@plusvic Could you share some details about this ambitious change with us? :) @tomaskender is working in my team on some improvements to YARA itself, and we'd like to start using them internally while also sharing them upstream. Having some insight into the plans for YARA would help us a lot in steering our design decisions in the future.
I don't have the full picture yet, but the plan is to generalize your proposal to something that could accept expressions like...
2 of some_module.some_array
...where some_array is an array of booleans. So the feature you propose would actually be something like:
<for_expression> of <iterator>
Also, I want to implement an in operator like <expression> in <iterator>, which would return true if the value of <expression> is contained in the <iterator>.
In all these cases <iterator> should be anything that is iterable, including a list of expressions like:
[
cuckoo.filesystem.file_write(/.../i),
cuckoo.filesystem.file_write(/.../i),
cuckoo.filesystem.file_write(/../i),
cuckoo.network.http_request(/.../i)
]
This for example would be perfectly valid:
true in [
cuckoo.filesystem.file_write(/.../i),
cuckoo.filesystem.file_write(/.../i),
cuckoo.filesystem.file_write(/../i),
cuckoo.network.http_request(/.../i)
]
The overall idea is to make all these constructs orthogonal, in the sense that you have simple pieces that you can combine in a flexible way. That may require some large refactoring of the existing code.