
Add RegexPluginBase to reduce boilerplate in text parsing

Open winingerori opened this issue 5 months ago • 2 comments

Many Dissect plugins that do simple regex scanning keep re-copying the same boilerplate: walk files, compile patterns, emit records, add a --limit flag, and so on.

Could we add a tiny RegexPluginBase class (pure std-lib, no new deps) that:
• compiles each pattern once per subclass,
• provides a ready-made scan() that yields a standard regex/match record,
• exposes handy flags like --limit (and maybe --pattern),
• supports text or binary mode with a SCAN_BINARY toggle?

With it, a new regex plugin drops from ~100 LOC to ~15, tests get simpler, and we avoid recompiling the same regexes N times.
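To make the proposal concrete, here is a minimal std-lib sketch of such a base class. It is illustrative only and not existing Dissect code: the names `RegexScannerBase`, `RegexMatch`, and `PATTERNS` and the record fields are assumptions, and the wiring to Target, file walking, and the --limit/--pattern command-line flags is left out (the limit is shown as a plain `limit` argument).

```python
from __future__ import annotations

import re
from itertools import islice
from typing import ClassVar, Iterator, NamedTuple, Union


class RegexMatch(NamedTuple):
    """Hypothetical stand-in for the 'standard regex/match record'."""

    source: str                # where the data came from, e.g. a file path
    pattern: str               # name of the pattern that matched
    offset: int                # start offset of the match within the data
    match: Union[str, bytes]   # the matched text (bytes in binary mode)


class RegexScannerBase:
    """Hypothetical base class: subclasses only declare PATTERNS."""

    PATTERNS: ClassVar[dict[str, str]] = {}
    SCAN_BINARY: ClassVar[bool] = False

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Compile once, when the subclass is defined, not on every scan.
        cls._compiled = {
            name: re.compile(pat.encode() if cls.SCAN_BINARY else pat)
            for name, pat in cls.PATTERNS.items()
        }

    def scan(self, source, data, limit=None) -> Iterator[RegexMatch]:
        """Yield at most `limit` matches (all matches if limit is None)."""

        def _matches():
            for name, regex in self._compiled.items():
                for m in regex.finditer(data):
                    yield RegexMatch(source, name, m.start(), m.group(0))

        return islice(_matches(), limit)


class IPv4Scanner(RegexScannerBase):
    PATTERNS = {"ipv4": r"\b(?:\d{1,3}\.){3}\d{1,3}\b"}


for hit in IPv4Scanner().scan("auth.log", "from 10.0.0.5 to 192.168.1.1", limit=1):
    print(hit)
```

Using `__init_subclass__` means each pattern is compiled exactly once per subclass, and a binary scanner only needs to set `SCAN_BINARY = True` and pass bytes to `scan()`.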

winingerori avatar Jul 04 '25 08:07 winingerori

Hi @winingerori, I will follow up with a more detailed answer later this week.

twiggler avatar Jul 07 '25 10:07 twiggler

Interestingly enough, we are discussing introducing Parsers as a concept in Dissect. Plugins can then delegate regex scanning (or more complicated parsing) to such a Parser. A Parser has no dependency on Target, so it can be used as a Lego block in other applications. The precise interface of Parser is still subject to debate, and might differ between text and binary parsers.
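As a rough illustration of that separation (not the interface under discussion, which is still open), a text parser could depend only on a file-like object. The class name `RegexLineParser` and the dict-shaped output below are hypothetical.

```python
import io
import re
from typing import Iterator, TextIO


class RegexLineParser:
    """Hypothetical text parser: knows nothing about Target, only file-like objects."""

    def __init__(self, pattern: str):
        self._regex = re.compile(pattern)

    def parse(self, fh: TextIO) -> Iterator[dict]:
        # Yield one dict per match: the line number plus the named groups.
        for lineno, line in enumerate(fh, start=1):
            for m in self._regex.finditer(line):
                yield {"line": lineno, **m.groupdict()}


# Usable outside Dissect: any text stream works as input.
parser = RegexLineParser(r"user=(?P<user>\w+)")
rows = list(parser.parse(io.StringIO("login user=alice\nnoise\nlogout user=bob\n")))
print(rows)
```

A plugin would then only be responsible for opening files on the target and turning the parser's output into records.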

While you propose a RegexPluginBase, the requirements you list seem to be primarily the responsibilities of a parser. Moreover, can you elaborate on the structure of a standard regex/match record?

This week I will create a new ticket to add support for Parsers. I expect this functionality will go a long way in streamlining a "regex plugin."

twiggler avatar Jul 14 '25 13:07 twiggler