consider switching to `FilteredRE2`

Open · junyer opened this issue 2 years ago • 1 comment

RE2 offers a couple of lesser-known features for matching multiple regular expressions. Given that `internal/README.md` describes a "snippet index", which sounds remarkably like `FilteredRE2`, you might want to consider switching to `FilteredRE2` and deleting the "snippet index" code.
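For readers unfamiliar with the idea, here is a minimal sketch of literal prefiltering, assuming each pattern comes with a few literal substrings ("atoms") that any matching input must contain. It uses `std::regex` so that it stays self-contained, not RE2, and the `Rule` type and `first_match` function are hypothetical names made up for illustration. The atoms here are written by hand, whereas `FilteredRE2` derives them from the patterns itself and leaves it to the caller to report which atoms occur in the input.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical rule type: a regex plus literal atoms that every matching
// input must contain. Atoms are hand-written in this sketch.
struct Rule {
    std::regex pattern;
    std::vector<std::string> atoms;
};

// Returns the index of the first rule matching `text`, or -1 if none does.
// The regex only runs for rules whose atoms all occur in `text`.
int first_match(const std::vector<Rule>& rules, const std::string& text) {
    for (std::size_t i = 0; i < rules.size(); ++i) {
        bool candidate = true;
        for (const std::string& atom : rules[i].atoms) {
            if (text.find(atom) == std::string::npos) {
                candidate = false;  // a required literal is missing
                break;
            }
        }
        if (candidate && std::regex_search(text, rules[i].pattern)) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

With rules for `Firefox/(\d+)` (atom `Firefox/`) and `Chrome/(\d+)` (atom `Chrome/`), an input containing `Chrome/120.0` skips the first regex entirely and only runs the second. This sketch checks atoms rule by rule; a real prefilter would find all atoms in a single pass over the input, for example with an `RE2::Set` as mentioned further down the thread.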

junyer · Mar 15 '23 18:03

I would also recommend taking a gander at this suggestion. While I don't have a direct comparison, the port of FilteredRE2 in ua-parser/uap-rust is about on par with FilteredRE2 (with just an `re2::RE2::Set` prefilter), and uap-cpp is about 2.5x slower than uap-rust from what I've seen so far.

While I believe `re2::RE2` can be quite slow at extraction[^1], I would assume most of the difference is in actually finding the correct regex, since Rust's own regex is also way slower when capturing than when just matching (by a factor of 2-3x, iirc).

[^1]: I actually experimented with extracting using re2 in ua-parser/uap-python, and it turned out to be slower than using the built-in `re` package.

masklinn · Jun 22 '24 17:06