bulk_extractor
Remove scan_base16_lg
There's no compelling reason to bring scan_base16_lg forward to the 2.0 API:
- Hex scanning is disabled by default.
- scan_base16.flex exists as a fallback, and it's a relatively simple flex-based scanner with just a single pattern (i.e., no confounding behavior due to other patterns).
- The base16 regexp in scan_base16_lg will slow down other scanners and increase NFA size with determinization (`[0-9a-fA-F]{6,}` is likely to cause splits with other states and make it less likely to filter out impossible prefixes based on the two-byte ngram filter).
- There's not much in the way of encoding concerns.
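To give a rough sense of how broad that base16 pattern is, here is a minimal sketch using Python's `re` module. This is an assumption for illustration only: scan_base16_lg uses lightgrep, and nothing here models NFA size or the ngram filter. The helper names `hex_runs` and `hex_start_bytes` are hypothetical.

```python
import re

# The base16 pattern quoted above, compiled with Python's re module
# (illustration only; this is not how scan_base16_lg runs it).
HEX_RUN = re.compile(rb"[0-9a-fA-F]{6,}")

def hex_runs(data: bytes) -> list:
    """Return every maximal run of six or more hex digits in data."""
    return HEX_RUN.findall(data)

def hex_start_bytes() -> int:
    """Count how many of the 256 byte values can begin a match."""
    single = re.compile(rb"[0-9a-fA-F]")
    return sum(1 for b in range(256) if single.fullmatch(bytes([b])))

sample = b"id=DEADBEEF01 a decade ago, the facade cost 12345 or 123456"
print(hex_runs(sample))    # hex-looking runs, including ordinary words
print(hex_start_bytes())   # 10 digits + 6 lowercase + 6 uppercase letters
```

Ordinary words such as "decade" and "facade" match, and 22 distinct byte values can start a match, which suggests why the pattern would overlap with the prefixes of many other scanners' patterns.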
scan_base16_lg could be an improvement on scan_base16.flex performance-wise, but I feel it's unlikely to be a big win. It could also be that scan_base16_lg has worse performance.
With your approval, @simsong, I will delete it.
I concur.