protego

Use `pyre2` as an optional dependency for a regex speedup

Open · hwlodarczyk-rtbh opened this issue 1 year ago · 0 comments

Just throwing out an idea for the far future.

I've noticed that your library is about 40% slower than `RobotFileParser` (from `urllib.robotparser`) on Python versions < 3.13. I suspect this is due to pattern compilation and matching in the `re` module.

`pyre2` is a drop-in replacement for `re` that is faster for simple patterns, which are exactly what robots.txt rules rely on. `pyre2` falls back to `re` when a pattern uses features it doesn't support (such as lookarounds), but that shouldn't happen here.
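A rough sketch of what "optional dependency" could look like: try to import an RE2-backed module and fall back to the stdlib `re` if it isn't installed. This assumes the installed package exposes a module named `re2` with an `re`-compatible `compile`/`match` API. The helper `robots_rule_to_pattern` is hypothetical and not protego's actual implementation. It only illustrates that robots.txt rules (`*` wildcard, trailing `$` anchor) translate into plain regexes without lookarounds.

```python
# Optional dependency: prefer an RE2-backed module when available, else stdlib re.
# ASSUMPTION: the pyre2 package installs a module named `re2` with an re-like API.
try:
    import re2 as regex_engine  # hypothetical optional speedup
except ImportError:
    import re as regex_engine

import re  # stdlib re, used here only for re.escape


def robots_rule_to_pattern(path: str):
    """Hypothetical helper: translate a robots.txt path rule into a compiled regex.

    Only `*` (any sequence of characters) and a trailing `$` (end anchor) are
    special, so the resulting regex needs no lookarounds or backreferences.
    """
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    # Escape each literal chunk, then join the chunks with `.*` where `*` was.
    body = ".*".join(re.escape(part) for part in path.split("*"))
    return regex_engine.compile(body + ("$" if anchored else ""))


def rule_matches(rule: str, url_path: str) -> bool:
    # robots.txt rules match from the start of the path, which .match() provides.
    return robots_rule_to_pattern(rule).match(url_path) is not None


print(rule_matches("/*.php$", "/a/b.php"))
```

Because every pattern produced this way stays within the regex subset RE2 supports, swapping engines would not change matching behavior.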

My claims about a potential speedup would of course need to be tested against your library, but I still think the idea is worth considering.
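One possible starting point for such a test is a micro-benchmark that times matching with the stdlib `re` and, if installed, `re2`. This is only a sketch under stated assumptions. The module name `re2` is assumed, and the patterns and paths below are made-up samples shaped like translated robots.txt rules, not protego's real workload. Compilation happens outside the timed region, so this measures matching only. A conclusive comparison would benchmark protego's own parsing and lookup against `RobotFileParser`.

```python
import re
import timeit

try:
    import re2  # ASSUMPTION: module name installed by pyre2
except ImportError:
    re2 = None

# Hypothetical sample data: regexes shaped like translated robots.txt rules.
PATTERNS = [r"/private", r"/.*\.php$", r"/search.*q=", r"/tmp/"]
PATHS = ["/index.html", "/private/a", "/x/y.php", "/search?q=1", "/tmp/file"] * 200


def bench(engine, number: int = 20) -> float:
    # Compile once up front; only the matching loop below is timed.
    compiled = [engine.compile(p) for p in PATTERNS]

    def run():
        for path in PATHS:
            for pat in compiled:
                pat.match(path)

    return timeit.timeit(run, number=number)


results = {"re": bench(re)}
if re2 is not None:
    results["re2"] = bench(re2)

for name, seconds in results.items():
    print(f"{name}: {seconds:.4f}s")
```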

hwlodarczyk-rtbh, Mar 06 '24 18:03