add yara to bulk extractor

Open simsong opened this issue 1 year ago • 3 comments

cf. #483

simsong Jan 08 '25 14:01

yara-x (https://github.com/VirusTotal/yara-x) is a rewrite of yara in Rust, by the yara team at Google. One of their build products is a library with a C API: https://virustotal.github.io/yara-x/docs/api/c/c-/. That includes a package config file, so it should be easy enough to test for its presence with autoconf.

jonstewart Mar 20 '25 01:03

That's great. I've wanted to add Yara, but had decided against it because normal yara uses PCRE and that has the regular expression problem we had previously noted. What does yara-x do for REs?

simsong Mar 20 '25 02:03

Both yara and yara-x have their own regex implementations. Momma always told me if you can't say anything nice, best not say anything at all.

. . .

. . .

yara-x is significantly faster than yara, but it's not doing anything too fancy. It really doesn't matter, though: yara is the de facto tool for writing file-based malware detection rules. See https://github.com/100DaysofYARA or https://github.com/Neo23x0/signature-base/tree/master/yara.

So, yes, people can and will write awful regexes in their yara rules, but there are many expert threat detection/malware folks out there who've learned how to write good rules, and those rule bases are readily available. Many (most?) yara rules do not use full-on regex patterns but specify fixed strings instead, which all get bundled up into an Aho-Corasick DFA. (Again, if you're in the regex mafia, you'd know this can be counterproductive, because the automaton blows up in size and you spend all day waiting on your memory bus, but it does provide some O(n) assurance, of sorts.)
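For anyone who hasn't seen the fixed-string bundling idea, here is a rough Python sketch of Aho-Corasick multi-pattern matching. It is only an illustration of the technique, not yara's or yara-x's actual implementation: it keeps failure links instead of expanding to a full DFA table, and the names `build_automaton` and `find_all` are made up for this example.

```python
from collections import deque


def build_automaton(patterns):
    """Build a trie with failure links over byte-string patterns."""
    goto = [{}]   # goto[state][byte] -> next state
    fail = [0]    # fail[state] -> longest proper suffix state
    out = [[]]    # out[state] -> patterns that end at this state

    # Phase 1: insert every pattern into the trie.
    for pat in patterns:
        node = 0
        for b in pat:
            nxt = goto[node].get(b)
            if nxt is None:
                nxt = len(goto)
                goto[node][b] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            node = nxt
        out[node].append(pat)

    # Phase 2: breadth-first pass to compute failure links.
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        node = queue.popleft()
        for b, child in goto[node].items():
            f = fail[node]
            while f and b not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(b, 0)
            # Inherit matches that end at the failure state.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_all(data, automaton):
    """Return (offset, pattern) for every match, in one pass over data."""
    goto, fail, out = automaton
    node = 0
    hits = []
    for i, b in enumerate(data):
        while node and b not in goto[node]:
            node = fail[node]
        node = goto[node].get(b, 0)
        for pat in out[node]:
            hits.append((i - len(pat) + 1, pat))
    return hits


automaton = build_automaton([b"he", b"she", b"his", b"hers"])
hits = find_all(b"ushers", automaton)
```

Each input byte is consumed once regardless of how many patterns are loaded, which is where the O(n) assurance comes from. The cost is the size of the state table, which grows with the total length of the patterns.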

As a command-line tool, yara executes against a file. The virtue of using yara inside of bulk_extractor would be combining bulk_extractor's ability to cope with raw, dirty data with all the existing rule bases for detecting malware.

jonstewart Mar 20 '25 02:03