significant performance regression since 1.0.4
something changed since 1.0.4 that hurt performance pretty significantly, at least for granary and bridgy. they both saw a slowdown of at least 30-50% (probably more) when they upgraded to 1.1.1. we should investigate, figure out what changed, and fix it!
(these latency measurements are for bridgy's and granary's overall HTTP requests, which do lots more than mf2py parsing, including network requests, so mf2py's slowdown is much bigger than just the delta you see here with the upgrade on 7/24.)
probably not targeting the 1.1.2 release, but maybe the one after. cc @cleverdevil @sknebel @kartikprabhu.
@snarfed additional datapoint: Did you update BS4 in production around #110, or at the same time as the update above? If you did it earlier, I don't have to include multiple versions of it in profiling.
After some first simple measurements, I see a ~50% slowdown for microformats processing alone too.
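For anyone who wants to reproduce that kind of measurement, here's a rough sketch of a timing harness. It assumes you run it once per environment (one with 1.0.4 installed, one with the newer version) and compare the numbers by hand. `time_parse` and `relative_slowdown` are hypothetical helper names, not part of mf2py.

```python
import timeit


def time_parse(parse, html, repeat=5, number=20):
    """Return the best observed per-call time, in seconds, for parse(html).

    `parse` is any callable that takes an HTML string. Taking the minimum
    over several repeats reduces noise from other processes.
    """
    timings = timeit.repeat(lambda: parse(html), repeat=repeat, number=number)
    return min(timings) / number


def relative_slowdown(old_seconds, new_seconds):
    """Fractional slowdown of new vs. old, e.g. 0.5 means 50% slower."""
    return (new_seconds - old_seconds) / old_seconds


if __name__ == "__main__":
    # Stand-in parser so the sketch runs without mf2py installed.
    sample_html = '<div class="h-card"><a class="u-url" href="/about">me</a></div>'
    per_call = time_parse(lambda html: html.count("class="), sample_html)
    print(f"{per_call * 1e6:.2f} microseconds per call")
```

In a real run you'd pass something like `lambda html: mf2py.parse(doc=html)` as `parse`, record the result under each installed version, and feed the two numbers to `relative_slowdown`.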
FWIW, here's the diff between 1.0.4 and master: https://github.com/microformats/mf2py/compare/af3f1f50d5b3081cd53e532ad5ee5eaf99bc1956...master and the tree for it https://github.com/microformats/mf2py/tree/af3f1f50d5b3081cd53e532ad5ee5eaf99bc1956 (since it didn't get tagged in the repo it seems)
@sknebel thanks for looking at this! i think i updated BS4 before 7am pst 7/24, and mf2py to head around 3:30pm pst 7/24, so not at the same time. not 100% sure though.
First PR: #123
I'd be interested in a somewhat representative sample set of pages to parse. E.g., right now I'd like to know the ratio of relative to absolute URLs encountered. I could maybe grab 100-1000 random entries from indiemap or brid.gy user pages. Other ideas?
hey, great! sure, that's easy to answer with indiemap, just query the links table, https://indiemap.org/docs.html#schema-links :
SELECT (STARTS_WITH(to_url, 'http://') OR STARTS_WITH(to_url, 'https://') OR STARTS_WITH(to_url, '//')) AS absolute, COUNT(*)
FROM indiemap.links
GROUP BY absolute
results:
| Row | absolute | f0_ |
|---|---|---|
| 1 | true | 264548084 |
| 2 | false | 158375878 |
so ~37% (158M/423M) of links in indiemap are relative.
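If you want to run the same check on a smaller local sample (e.g. links pulled from a handful of pages), here's a minimal sketch that mirrors the query's prefix test. `is_absolute` and `relative_share` are hypothetical names, and the sample URLs are made up for illustration. Like the query, it counts protocol-relative `//` URLs as absolute.

```python
def is_absolute(url):
    """Same prefix check as the query: http://, https://, or // counts as absolute."""
    return url.startswith(("http://", "https://", "//"))


def relative_share(urls):
    """Fraction of the given URLs that are relative (0.0 for an empty input)."""
    urls = list(urls)
    if not urls:
        return 0.0
    relative = sum(1 for url in urls if not is_absolute(url))
    return relative / len(urls)


# Made-up sample, just to show the shape of the input.
sample = [
    "https://example.com/a",
    "/about",
    "//cdn.example.com/x.js",
    "post/1",
    "http://example.org/",
]
sample_share = relative_share(sample)

# Same arithmetic as the indiemap counts above: false / (true + false).
indiemap_share = 158375878 / (264548084 + 158375878)
```

Here `indiemap_share` comes out to about 0.37, matching the figure above.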
@sknebel's work here on #123 was great, and largely fixed this for me. Thank you! Tentatively closing.