RFC: Composable Modifiers
I've been thinking of ways to make modifiers better. I wrote up some notes on how I think this can be done and would love some discussion (especially on whether @plusvic thinks this is a terrible idea =b). My notes are at https://gist.github.com/wxsBSD/44aa8b8133e3ea96e738b66ec1c600f2, and if this sounds reasonable I'm willing to put in the effort to get it working.
The design looks nice, and I would certainly have used that idea from the very beginning if I had envisioned that modifiers could get more complex over time. However, I'm worried about two issues:
- Backward compatibility. There are literally hundreds of thousands of YARA rules out there, and a large percentage of them use string modifiers. Rewriting all those rules manually is not an option; for a change like this we would be forced to provide additional tooling to automate the rewrite. From time to time we can afford to introduce a backward-incompatible change, like adding a new keyword. The introduction of the "base64" keyword, for example, broke only a handful of rules in VirusTotal Intelligence, which was acceptable because they could be fixed by hand in a few minutes. The breakage in this case would be much larger.
- Added complexity vs added value. This feature would certainly be helpful if the plan is to add even more modifiers in the future. But right now I'm leaning towards the opposite: being extremely cautious about adding new modifiers. That's why I haven't decided anything yet about the "rol" modifier; I'm putting it in quarantine (a popular word these days) until I have a clear strategy. So far, my impression is that the "rol" modifier is a very niche feature. I had doubts about the "base64" modifier too, but I must admit that I've seen a bunch of rules where people use base64-encoded strings, and that helped change my mind. Also, modifiers of this kind, which take one string and produce a single string, are nice to have but not strictly necessary, as you can always do the operation yourself and use the resulting string.
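As a rough sketch of the "do the operation yourself" alternative, the snippet below base64-encodes a string ahead of time so the result can be pasted into a rule as an ordinary text string. The helper name `base64_literal` is hypothetical, and the sketch assumes the string starts on a 3-byte boundary inside the encoded data; a real modifier also has to account for the other two alignments, which this does not attempt.

```python
import base64

def base64_literal(text: str) -> str:
    """Hypothetical helper: base64-encode `text` (UTF-8) for use as a plain string.

    Only the characters fully determined by `text` itself are kept, so the
    result still matches when `text` is followed by more data in the blob.
    It covers only the case where `text` starts on a 3-byte boundary.
    """
    raw = text.encode("utf-8")
    encoded = base64.b64encode(raw).decode("ascii")
    # Each output character carries 6 bits; any character that would mix in
    # bits from a following byte (or '=' padding) is dropped.
    stable = (len(raw) * 8) // 6
    return encoded[:stable]

if __name__ == "__main__":
    print(f'$s = "{base64_literal("abcd")}"')  # prints: $s = "YWJjZ"
```

Trimming the trailing characters is the part that is easy to get wrong by hand, which is one argument for keeping a built-in modifier for common encodings.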
PS: I just saw this other PR, https://github.com/VirusTotal/yara/pull/1248, and it became clear to me that we either implement this proposal and add more modifiers, or stop adding modifiers. For the time being I prefer the latter.
Backward compatibility
This is a big concern for me too. I think finding a way to maintain existing behavior is a hard requirement. I have some ideas on how to do it, but I haven't fully thought them through yet. I'm going to think about how to do this in a backward-compatible way, and if I can't, I'll drop the idea.
Added complexity vs added value
I too question the value of some of these modifiers (rol and add specifically). I think we should continue to be cautious about adding them for now, but if we can get composable modifiers working, we can require that they be used only in a composable way whenever a string has more than one modifier.
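Composing modifiers only works if they are applied in a defined order, because most transformations do not commute. The sketch below uses two hypothetical pure-Python stand-ins, `wide` (insert a zero byte after each byte, i.e. UTF-16LE for ASCII text) and `b64` (plain base64), plus a hypothetical `compose` helper that applies them left to right. None of these names or the left-to-right convention come from the gist; they only show that the two orders produce different byte sequences to search for.

```python
import base64
from functools import reduce
from typing import Callable

Modifier = Callable[[bytes], bytes]

def wide(data: bytes) -> bytes:
    # Insert a zero byte after every byte (UTF-16LE for ASCII input).
    return b"".join(bytes([b, 0]) for b in data)

def b64(data: bytes) -> bytes:
    # Plain base64 encoding, padding included, for simplicity.
    return base64.b64encode(data)

def compose(*modifiers: Modifier) -> Modifier:
    # Apply modifiers left to right: compose(f, g)(x) == g(f(x)).
    return lambda data: reduce(lambda acc, m: m(acc), modifiers, data)

if __name__ == "__main__":
    s = b"abc"
    print(compose(wide, b64)(s))  # b'YQBiAGMA': encode the wide string
    print(compose(b64, wide)(s))  # b'Y\x00W\x00J\x00j\x00': widen the encoding
```

Because the results differ, any composable syntax has to make the order explicit rather than treating modifiers as an unordered set, as they are today.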
I think that having clearer string modifiers is useful.
To compose modifiers without breaking backward compatibility, either the parser needs logic that inspects a string's modifiers to determine whether a composable modifier was used, or the string definition itself needs some marker indicating which kind of modifiers it uses. For the latter, I don't know of a good alternative to $a = "FOOBAR"; the former is not so simple.
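A minimal sketch of the first option, under loudly stated assumptions: a string that uses only existing modifiers keeps today's unordered semantics, and the presence of any newer modifier switches it to ordered, composable semantics. The modifier sets, the `Semantics` labels, and the `classify` function are all hypothetical and only approximate the modifiers in use at the time; this is not YARA's parser, and it classifies an already-parsed list of modifier names rather than rule text.

```python
from enum import Enum
from typing import Sequence

class Semantics(Enum):
    LEGACY = "legacy"          # unordered set of modifiers, as today
    COMPOSABLE = "composable"  # ordered pipeline of transformations

# Assumed to approximate the modifiers that existing rules already use.
EXISTING_MODIFIERS = frozenset(
    {"ascii", "wide", "nocase", "fullword", "private", "xor", "base64", "base64wide"}
)
# Hypothetical newer modifiers that would only be allowed in composable form.
NEW_MODIFIERS = frozenset({"rol", "add"})

def classify(modifiers: Sequence[str]) -> Semantics:
    """Decide how to interpret a string's modifiers by inspecting them."""
    known = EXISTING_MODIFIERS | NEW_MODIFIERS
    unknown = [m for m in modifiers if m not in known]
    if unknown:
        raise ValueError(f"unknown modifier(s): {', '.join(unknown)}")
    if any(m in NEW_MODIFIERS for m in modifiers):
        return Semantics.COMPOSABLE
    # Only existing modifiers: keep today's behavior so old rules compile
    # and match exactly as before.
    return Semantics.LEGACY
```

The catch is visible even in this toy: a rule that mixes old and new modifiers silently changes meaning for the old ones, which is part of why this option is not so simple.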