uap-cpp
consider switching to `FilteredRE2`
RE2 offers a couple of lesser-known features for matching multiple regular expressions. Given that `internal/README.md` describes a "snippet index", which sounds remarkably like `FilteredRE2`, you might want to consider switching to `FilteredRE2` and deleting the "snippet index" code.
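For anyone unfamiliar with the idea, here is a rough sketch of what a literal prefilter does. It is not RE2's API or implementation: it uses `std::regex`, the names `Rule`, `to_lower` and `first_match` are made up for illustration, and the atoms are written by hand. `FilteredRE2` derives the atoms from the patterns itself when you call `Compile()`, and it is then up to the caller to find which atoms occur in the text and pass their indices to `FirstMatch()` or `AllMatches()`.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical rule: a regex plus a literal "atom" that every match must contain.
// Assumption: atoms are hand-written here; FilteredRE2 computes them for you.
struct Rule {
    std::string atom;  // lowercase literal; empty means "cannot be prefiltered"
    std::regex re;
};

// ASCII-lowercase a copy of the input so the atom lookup is case-insensitive.
std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Returns the index of the first rule whose regex matches, or -1 if none does.
// The regex engine only runs for rules whose atom occurs in the text.
int first_match(const std::vector<Rule>& rules, const std::string& text) {
    const std::string lowered = to_lower(text);
    for (std::size_t i = 0; i < rules.size(); ++i) {
        const Rule& rule = rules[i];
        if (!rule.atom.empty() && lowered.find(rule.atom) == std::string::npos) {
            continue;  // atom missing: this regex cannot match, skip it
        }
        if (std::regex_search(text, rule.re)) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

With a few hundred user-agent regexes, most rules are rejected by the cheap substring check and never reach the regex engine. A real implementation would look up all atoms in one pass (for example with Aho-Corasick) rather than calling `find` once per rule.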
I would also recommend taking a gander at this suggestion: while I don't have a direct comparison, the port of `FilteredRE2` in ua-parser/uap-rust is about on par with `FilteredRE2` (with just an `re2::RE2::Set` prefilter), and uap-cpp is about 2.5x slower than uap-rust from what I've seen so far.
While I believe `re2::RE2` can be quite slow at extraction[^1], I would assume most of the difference lies in actually finding the correct regex, since Rust's own `regex` is also way slower when capturing than when just matching (by a factor of 2-3x, iirc).
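To make the two costs concrete, here is a small sketch that separates "which regex matches" from "extract the groups". It uses `std::regex` only to stay self-contained, and `select_then_extract` is a hypothetical name. It shows the structure only: I would not expect `std::regex` to show the same matching-versus-capturing gap. With RE2 the equivalent split would be calling `RE2::PartialMatch` without submatch arguments to select, and with them to extract.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns the index of the first matching rule, or -1 if none matches.
// On success, *group1 receives capture group 1 (empty if the rule has none).
int select_then_extract(const std::vector<std::regex>& rules,
                        const std::string& ua,
                        std::string* group1) {
    for (std::size_t i = 0; i < rules.size(); ++i) {
        // Phase 1: boolean match only, no match results requested.
        if (!std::regex_search(ua, rules[i])) {
            continue;
        }
        // Phase 2: rerun only the winning regex, this time with captures.
        std::smatch m;
        std::regex_search(ua, m, rules[i]);
        *group1 = m.size() > 1 ? m[1].str() : std::string();
        return static_cast<int>(i);
    }
    return -1;
}
```

Capturing happens at most once per input here, while the selection loop may touch many regexes, which is why the selection step is where a prefilter pays off.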
[^1]: I actually experimented with extracting using re2 in ua-parser/uap-python, and it turned out to be slower than using the built-in `re` module.