
grab-site spends a lot of time in dupespotter

Open: ivan opened this issue 6 years ago · 1 comment

With grab-site 2.x, a crawl of Twitter spends about 25% of its non-idle time in dupespotter, running various re.sub calls.

ivan, Oct 08 '18 13:10

Times to run dupespotter's test suite:

- as-is, with many re.sub calls: 0.79 seconds
- combined regexps and a few re.sub calls: 3 seconds
- combined regexps and re2 with a hand-rolled sub: 6 seconds

Not encouraging, as every change makes it slower.
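For readers unfamiliar with the two pure-Python strategies timed above, here is a minimal sketch of each, assuming a handful of hypothetical normalization rules (the patterns, replacements, sample page, and the function names sub_sequential and sub_combined are all made up for illustration and are not dupespotter's code). The first applies one precompiled re.sub pass per rule; the second joins every rule into one alternation and picks the replacement in a callback via Match.lastindex. Which one is faster depends on the patterns and input; in the measurements above, combining was slower.

```python
import re
import timeit

# Hypothetical normalization rules, for illustration only; NOT dupespotter's
# actual patterns. None contains a capturing group, which the combined
# approach below relies on.
RULES = [
    (r"\b\d{10}\b", "TIMESTAMP"),             # e.g. Unix timestamps
    (r"\bsid=[A-Za-z0-9]+", "sid=X"),         # e.g. session IDs in URLs
    (r"<!-- generated in [\d.]+ms -->", ""),  # e.g. render-time comments
]

# Strategy 1: one precompiled pattern per rule, applied as separate passes.
COMPILED = [(re.compile(p), r) for p, r in RULES]

def sub_sequential(text: str) -> str:
    for pattern, replacement in COMPILED:
        text = pattern.sub(replacement, text)
    return text

# Strategy 2: all rules joined into one alternation, each wrapped in its own
# group, so the text is scanned once; a callback looks up the replacement.
COMBINED = re.compile("|".join(f"({p})" for p, _ in RULES))
REPLACEMENTS = [r for _, r in RULES]

def sub_combined(text: str) -> str:
    # m.lastindex is the 1-based number of the group that matched.
    return COMBINED.sub(lambda m: REPLACEMENTS[m.lastindex - 1], text)

if __name__ == "__main__":
    page = "t=1539000000 sid=abc123 <!-- generated in 12.5ms -->\n" * 1000
    assert sub_sequential(page) == sub_combined(page)
    for fn in (sub_sequential, sub_combined):
        secs = timeit.timeit(lambda: fn(page), number=50)
        print(f"{fn.__name__}: {secs:.3f} s")
```

The two strategies are not equivalent in general: sequential passes let a later rule see the output of an earlier one, while the alternation makes a single left-to-right scan and, at each position, takes the first alternative that matches. The sample rules here were chosen not to overlap, so both produce the same result.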

I left the changes in the re2 branch.

ivan, Oct 11 '18 11:10