Expose modules and rule dependencies after compilation

Open wxsBSD opened this issue 4 months ago • 5 comments

We have a large set of rules and want to export only the subset needed for a given rule, but we currently have no easy way of determining which modules or rule dependencies need to be exported with it. For example, given these rules:

import "pe"

// ... LOTS of other rules with various modules used and rule dependencies.

rule a {  condition: pe.is_dll() }
rule b { strings: $x = "foobar" condition: a and $x }

If we want to export rule b from our collection we would need to also export a and make sure the pe module is imported in the resulting file. AFAIK there is currently no good way, after the rules are compiled, to know which modules are required for a given rule or what other rules are dependencies of the given rule.
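To make the export step concrete, here is a minimal sketch of the closure computation, assuming the direct dependencies of each rule are already known. The `Deps` struct and `export_set` function are hypothetical names used for illustration; nothing here is part of the yara-x API.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical input: the direct dependencies of a single rule.
struct Deps {
    modules: Vec<&'static str>, // modules the rule's condition uses
    rules: Vec<&'static str>,   // other rules the condition references
}

/// Returns (rules, modules) that must be exported together with `root`.
/// Walks the rule graph with an explicit worklist, so rules that depend on
/// each other indirectly are picked up as well.
fn export_set(
    graph: &BTreeMap<&'static str, Deps>,
    root: &'static str,
) -> (BTreeSet<&'static str>, BTreeSet<&'static str>) {
    let mut rules = BTreeSet::new();
    let mut modules = BTreeSet::new();
    let mut pending = vec![root];
    while let Some(name) = pending.pop() {
        if !rules.insert(name) {
            continue; // already visited
        }
        if let Some(deps) = graph.get(name) {
            modules.extend(deps.modules.iter().copied());
            pending.extend(deps.rules.iter().copied());
        }
    }
    (rules, modules)
}
```

For the two rules above, the graph would map a to the pe module and b to rule a, and exporting b would yield rules a and b plus the pe module.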

Do you think this is a good idea, and would it make sense to expose it in the various APIs? It would make rule management and sharing much easier for us, as we could generate the smallest set of rules and modules needed for any rule we want to export to other systems. I imagine others have a similar need for better rule management and export tooling.

wxsBSD avatar Oct 24 '25 17:10 wxsBSD

I was about to suggest using the parser for that. In most cases, once you have the AST, finding the dependencies for a rule should be easy: it's just a matter of traversing the rule's condition looking for identifiers that match a module name or rule name. The hardest part is writing the logic that traverses the AST, which would consist of a huge match statement covering all possible expression types. It makes sense to have a mechanism that lets you iterate over an expression in the AST in DFS order, similar to the DFSIter type implemented for the IR tree:

https://github.com/VirusTotal/yara-x/blob/21e713ee0bf2698768a1bab883f8468be5bf23c8/lib/src/compiler/ir/dfs.rs#L33-L63
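As a rough illustration of the idea, here is a sketch of a non-recursive, pre-order DFS iterator over a toy expression type, followed by a helper that collects identifiers. The `Expr` enum, `Dfs` struct and `identifiers` function are made up for this example and are much simpler than the real yara-x AST and DFSIter.

```rust
// Toy expression type for illustration only; this is NOT the yara-x AST.
#[derive(Debug)]
enum Expr {
    Ident(String),
    Literal(String),
    // Field access such as `pe.is_dll`: an operand plus a field name.
    Field(Box<Expr>, String),
    And(Vec<Expr>),
    Call(Box<Expr>, Vec<Expr>),
}

/// Pre-order depth-first iterator driven by an explicit stack.
struct Dfs<'a> {
    stack: Vec<&'a Expr>,
}

impl<'a> Dfs<'a> {
    fn new(root: &'a Expr) -> Self {
        Dfs { stack: vec![root] }
    }
}

impl<'a> Iterator for Dfs<'a> {
    type Item = &'a Expr;

    fn next(&mut self) -> Option<&'a Expr> {
        let expr = self.stack.pop()?;
        // Push children in reverse so they are visited left to right.
        match expr {
            Expr::Ident(_) | Expr::Literal(_) => {}
            Expr::Field(operand, _) => self.stack.push(operand),
            Expr::And(operands) => self.stack.extend(operands.iter().rev()),
            Expr::Call(callee, args) => {
                self.stack.extend(args.iter().rev());
                self.stack.push(callee);
            }
        }
        Some(expr)
    }
}

/// Names of all identifiers in the expression, in visiting order.
fn identifiers(root: &Expr) -> Vec<&str> {
    Dfs::new(root)
        .filter_map(|e| match e {
            Expr::Ident(name) => Some(name.as_str()),
            _ => None,
        })
        .collect()
}
```

With an iterator like this, the big match statement lives in one place (`next`), and callers only filter for the node kinds they care about.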

I think that having a DFSIter for AST expressions would make sense. However, going back to the original problem, things get complicated when you have edge cases like this:

  import "pe"
  import "hash"

  rule test {
        condition:
            with pe = hash : (
                pe.sha256(0,filesize) == "123456..."
            )
  }

That's a perfectly valid rule, but a naive solution that traverses the AST looking for identifiers will fall into the trap and conclude that the pe module is used when it is not.

If you can live with having more dependencies than necessary in a case like this, I think this is the way I would go.
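If the extra dependencies are not acceptable, one possible approach is to track which names are bound locally while walking the condition. The sketch below uses a toy `Expr` type (not the yara-x AST) and hypothetical `free_idents` and `module_deps` helpers. It assumes that `with` is the only binding construct and that the bound value is resolved in the outer scope; loop variables would need the same treatment.

```rust
use std::collections::BTreeSet;

// Toy expression type for illustration only; this is NOT the yara-x AST.
enum Expr {
    Ident(String),
    Literal(String),
    Field(Box<Expr>, String),
    Call(Box<Expr>, Vec<Expr>),
    Eq(Box<Expr>, Box<Expr>),
    // `with <name> = <value> : ( <body> )`
    With { name: String, value: Box<Expr>, body: Box<Expr> },
}

/// Collects identifiers that are not bound by an enclosing `with`.
fn free_idents(expr: &Expr, bound: &mut Vec<String>, out: &mut BTreeSet<String>) {
    match expr {
        Expr::Ident(name) => {
            if !bound.contains(name) {
                out.insert(name.clone());
            }
        }
        Expr::Literal(_) => {}
        Expr::Field(operand, _) => free_idents(operand, bound, out),
        Expr::Call(callee, args) => {
            free_idents(callee, bound, out);
            for arg in args {
                free_idents(arg, bound, out);
            }
        }
        Expr::Eq(lhs, rhs) => {
            free_idents(lhs, bound, out);
            free_idents(rhs, bound, out);
        }
        Expr::With { name, value, body } => {
            // Assumption: the value is resolved before the new name is visible.
            free_idents(value, bound, out);
            bound.push(name.clone());
            free_idents(body, bound, out);
            bound.pop();
        }
    }
}

/// Imported modules that the condition really refers to.
fn module_deps(condition: &Expr, imports: &[&str]) -> BTreeSet<String> {
    let mut out = BTreeSet::new();
    free_idents(condition, &mut Vec::new(), &mut out);
    out.retain(|name| imports.contains(&name.as_str()));
    out
}
```

For the rule above, this reports only hash: the pe inside the body refers to the local binding, and filesize is dropped because it is not an imported module.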

Having a DFSIter for AST expressions would also allow rewriting this function in a non-recursive manner: https://github.com/VirusTotal/yara-x/blob/21e713ee0bf2698768a1bab883f8468be5bf23c8/parser/src/ast/ascii_tree.rs#L57

plusvic avatar Oct 24 '25 19:10 plusvic

If you want to get that information from non-Rust code, such as through the C, Python or Golang APIs, it gets more complicated.

plusvic avatar Oct 24 '25 19:10 plusvic

In be3743c I've implemented a depth-first search iterator for rule conditions, as mentioned above.

plusvic avatar Nov 03 '25 10:11 plusvic

Thanks! I was going to put my branch up this week and ask for help with getting the closures for anchors and quantifiers to work, but now I get to review yours instead!

I will rebase my work that uses this onto yours and hopefully share it soon. It's basically "yr debug deps", and it outputs the deps for all rules as a Graphviz file or as text.

wxsBSD avatar Nov 03 '25 11:11 wxsBSD

EDIT: I asked some questions, and upon closer inspection I see how things are handled now.

wxsBSD avatar Nov 03 '25 18:11 wxsBSD

I implemented a "deps" command which uses this in https://github.com/VirusTotal/yara-x/pull/498

wxsBSD avatar Nov 14 '25 20:11 wxsBSD