Add RegexPluginBase to reduce boilerplate in text parsing
Lots of Dissect plugins that do simple regex scanning keep re-copying the same boilerplate: walk files, compile patterns, emit records, add a --limit flag, etc.
Could we add a tiny RegexPluginBase class (pure stdlib, no new deps) that:
- compiles each pattern once per subclass,
- provides a ready-made scan() that yields a standard regex/match record,
- exposes handy flags like --limit (and maybe --pattern),
- supports text or binary mode with a SCAN_BINARY toggle?
With it, a new regex plugin drops from ~100 LOC to ~15, tests get simpler, and we avoid recompiling the same regexes N times.
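To make the request concrete, here is a rough stdlib-only sketch of what such a base class might look like. Everything in it is an assumption for illustration: the `RegexMatch` record layout, the `PATTERNS` attribute, and the `scan()` signature are hypothetical and do not reflect any existing Dissect API. File walking and the CLI flags are left out; `limit` is shown as a plain argument.

```python
import re
from typing import Dict, Iterable, Iterator, NamedTuple, Optional, Union


class RegexMatch(NamedTuple):
    # Hypothetical record layout; nothing in this thread defines one yet.
    name: str
    line_no: int
    value: Union[str, bytes]


class RegexPluginBase:
    PATTERNS: Dict[str, str] = {}  # subclasses fill this in (assumed name)
    SCAN_BINARY = False            # True: compile and match against bytes
    _compiled: Dict[str, "re.Pattern"] = {}

    def __init_subclass__(cls, **kwargs):
        # Runs once when each subclass is defined, so every pattern is
        # compiled once per subclass rather than once per scan.
        super().__init_subclass__(**kwargs)
        cls._compiled = {
            name: re.compile(p.encode() if cls.SCAN_BINARY else p)
            for name, p in cls.PATTERNS.items()
        }

    def scan(
        self, lines: Iterable[Union[str, bytes]], limit: Optional[int] = None
    ) -> Iterator[RegexMatch]:
        count = 0
        for line_no, line in enumerate(lines, start=1):
            for name, rx in self._compiled.items():
                for m in rx.finditer(line):
                    # Stop as soon as the requested number of records is out.
                    if limit is not None and count >= limit:
                        return
                    yield RegexMatch(name, line_no, m.group(0))
                    count += 1


class IPv4Plugin(RegexPluginBase):
    # A whole "plugin" is now just a pattern table.
    PATTERNS = {"ipv4": r"\b(?:\d{1,3}\.){3}\d{1,3}\b"}
```

A subclass would then be used as `IPv4Plugin().scan(lines, limit=10)`, where `lines` is any iterable of `str` (or of `bytes` when `SCAN_BINARY = True`).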
Hi @winingerori, I will provide you with a more detailed answer later this week.
Interestingly enough, we are discussing introducing Parsers as a concept in Dissect. Plugins can then delegate regex scanning (or more complicated parsing) to such a Parser. A Parser has no dependency on Target, so it can be used as a building block in other applications. The precise interface of Parser is still subject to debate, and might differ between text and binary parsers.
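As a purely illustrative sketch of that split (the real Parser interface is undecided, so every name here is hypothetical): the parser only sees an iterable of lines, and a plugin-like wrapper is the only part that knows where those lines come from.

```python
import re
from typing import Callable, Iterable, Iterator, Tuple


class RegexParser:
    """Hypothetical parser: no knowledge of targets or plugins."""

    def __init__(self, pattern: str) -> None:
        self._rx = re.compile(pattern)

    def parse(self, lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
        # Yields (line number, matched text); the record shape is an assumption.
        for line_no, line in enumerate(lines, start=1):
            for m in self._rx.finditer(line):
                yield line_no, m.group(0)


class ExamplePlugin:
    """Stand-in for a plugin; a real one would read its lines from a target."""

    def __init__(self, line_source: Callable[[], Iterable[str]]) -> None:
        self._line_source = line_source
        self._parser = RegexParser(r"user=\w+")

    def records(self) -> Iterator[Tuple[int, str]]:
        # The plugin supplies the data; the parser does all the matching.
        yield from self._parser.parse(self._line_source())
```

Because `RegexParser` takes plain iterables, it can be unit-tested or reused outside a plugin without constructing anything target-related.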
While you propose a RegexPluginBase, the requirements you list seem to be primarily the responsibilities of a parser. Moreover, can you elaborate on the structure of a standard regex/match record?
This week I will create a new ticket to add support for Parsers. I expect this functionality will go a long way toward streamlining a "regex plugin."