
Remove scan_base16_lg

Open jonstewart opened this issue 4 years ago • 1 comments

There's no compelling reason to bring scan_base16_lg forward to the 2.0 API:

  1. Hex scanning is disabled by default.
  2. scan_base16.flex exists as a fallback, and it's a relatively simple flex-based scanner with just a single pattern (i.e., no confounding behavior from other patterns).
  3. The base16 regexp in scan_base16_lg will slow down other scanners and increase NFA size during determinization: [0-9a-fA-F]{6,} is likely to cause splits with other states, and it makes it less likely that impossible prefixes get filtered out by the two-byte n-gram filter.
  4. Encoding is not much of a concern for hex data.
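
To make point 3 a bit more concrete, here is a minimal sketch of how readily the quoted pattern matches. It uses Python's `re` module as a stand-in, not lightgrep or flex, so it only shows what the pattern accepts and says nothing about NFA size or filtering cost. The helper name `find_hex_runs` and the sample input are made up for illustration.

```python
import re

# Same character class and repetition as the pattern quoted in point 3.
# Compiled as a bytes pattern because scanners operate on raw buffers.
HEX_RUN = re.compile(rb"[0-9a-fA-F]{6,}")


def find_hex_runs(buf: bytes):
    """Return (offset, matched bytes) for each run of six or more hex digits."""
    return [(m.start(), m.group()) for m in HEX_RUN.finditer(buf)]


# Hypothetical sample buffer: two long hex runs, one run that is too short
# ("face"), and one run of exactly six digits.
sample = b"id=deadbeef; face; 0123456789abcdef cafe12"
runs = find_hex_runs(sample)

# Ordinary words made only of the letters a-f also qualify.
word_runs = find_hex_runs(b"decade")
```

Since any byte in `0-9`, `a-f`, or `A-F` can begin a match, the pattern has many possible starting bytes, which is consistent with the concern that it weakens prefix filtering for the other patterns compiled alongside it.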

scan_base16_lg might outperform scan_base16.flex, but I feel it's unlikely to be a big win. It could just as well perform worse.

With your approval, @simsong, I will delete it.

jonstewart, Sep 12 '21 19:09

I concur.

simsong, Sep 12 '21 20:09