grab-site
grab-site spends a lot of time in dupespotter
With grab-site 2.x, a crawl of Twitter spends about 25% of its non-idle time in dupespotter, doing various re.sub calls.
Times to run dupespotter's test suite:
as-is with many re.sub calls: 0.79 seconds
combined regexps and a few re.sub calls: 3 seconds
combined regexps and re2 with hand-rolled sub: 6 seconds
Not encouraging, as every change made it slower.
I left the changes in the re2 branch.
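For anyone who wants to try this kind of comparison outside dupespotter, here is a minimal sketch of the two approaches: a chain of separate re.sub calls versus one combined alternation with a replacement callback. The patterns, the sample string, and the function names are made up for illustration and are not dupespotter's real ones. Note that the two are only equivalent when one substitution's output cannot create or destroy a match for another.

```python
import re
import timeit

# Hypothetical patterns for illustration only; these are NOT dupespotter's.
PATTERNS = [
    (re.compile(r"[0-9a-f]{32}"), ""),  # 32-character hex tokens
    (re.compile(r"\d{10,}"), ""),       # long digit runs, e.g. timestamps
    (re.compile(r"\s+"), " "),          # collapse whitespace runs
]

def many_subs(text):
    """Apply each pattern in turn, one full pass over the text per pattern."""
    for pattern, repl in PATTERNS:
        text = pattern.sub(repl, text)
    return text

# The same patterns as one alternation of named groups, in the same order.
COMBINED = re.compile(
    r"(?P<hex>[0-9a-f]{32})|(?P<digits>\d{10,})|(?P<ws>\s+)"
)
REPLACEMENTS = {"hex": "", "digits": "", "ws": " "}

def combined_sub(text):
    """Single pass; the callback picks the replacement by matched group name."""
    return COMBINED.sub(lambda m: REPLACEMENTS[m.lastgroup], text)

# Made-up sample input.
SAMPLE = "id=0123456789abcdef0123456789abcdef&t=1700000000   end"

def compare(text=SAMPLE, number=10000):
    """Return the wall-clock seconds each approach takes for `number` runs."""
    return {
        "many_subs": timeit.timeit(lambda: many_subs(text), number=number),
        "combined_sub": timeit.timeit(lambda: combined_sub(text), number=number),
    }
```

Calling compare() returns the two timings as a dict. The combined version makes one pass over the text but pays for a Python-level callback on every match, which may be part of why combining regexps did not help here; that is a guess, not something I measured.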