
significant performance regression since 1.0.4

Open snarfed opened this issue 7 years ago • 5 comments

something changed since 1.0.4 that hurt performance pretty significantly, at least for granary and bridgy. they both saw a slowdown of at least 30-50% (probably more) when they upgraded to 1.1.1. we should investigate, figure out what changed, and fix it!

[Two latency graphs: mf2py_104_to_head_granary, mf2py_104_to_head_3]

(these latency measurements are for bridgy's and granary's overall HTTP requests, which do lots more than mf2py parsing, including network requests, so mf2py's slowdown is much bigger than just the delta you see here with the upgrade on 7/24.)

probably not targeting the 1.1.2 release, but maybe the one after. cc @cleverdevil @sknebel @kartikprabhu.

snarfed • Aug 02 '18 21:08

@snarfed additional datapoint: Did you update BS4 in production around #110, or at the same time as the update above? If you did it earlier, I don't have to include multiple versions of it in profiling.

sknebel • Aug 03 '18 15:08

After first simple measurements I see ~50% slowdown for just microformats processing too.
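For reference, a comparison like this can be reproduced with a small timing harness. This is only a minimal sketch, not the script used for the measurement above: `time_parser` and `slowdown_percent` are hypothetical helper names, and the parser is passed in as a callable so the same harness can be run once per installed mf2py version (for example with `lambda html: mf2py.parse(doc=html)`).

```python
import time


def time_parser(parse_fn, documents, repeat=5):
    """Return the best wall-clock time (seconds) to parse all documents once.

    parse_fn is any callable taking an HTML string. Taking the minimum over
    several runs reduces noise from other processes.
    """
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        for html in documents:
            parse_fn(html)
        best = min(best, time.perf_counter() - start)
    return best


def slowdown_percent(old_seconds, new_seconds):
    """Relative slowdown of new vs. old, in percent (positive means slower)."""
    return (new_seconds - old_seconds) / old_seconds * 100.0
```

Run it in two virtualenvs (one per mf2py version) over the same saved HTML documents, then compare the two timings with `slowdown_percent`; going from 1.0 s to 1.5 s would be a 50% slowdown.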

FWIW, here's the diff between 1.0.4 and master: https://github.com/microformats/mf2py/compare/af3f1f50d5b3081cd53e532ad5ee5eaf99bc1956...master and the tree for it: https://github.com/microformats/mf2py/tree/af3f1f50d5b3081cd53e532ad5ee5eaf99bc1956 (since 1.0.4 doesn't seem to have been tagged in the repo)

sknebel • Aug 03 '18 17:08

@sknebel thanks for looking at this! i think i updated BS4 before 7am pst 7/24, and mf2py to head around 3:30pm pst 7/24, so not at the same time. not 100% sure though.

snarfed • Aug 04 '18 19:08

First PR: #123

I'd be interested in a somewhat representative sample set of pages to parse: e.g. right now I'd like to know in what ratio the URLs we encounter are relative vs. absolute. I could maybe grab 100-1000 random entries from indiemap or brid.gy user pages. Other ideas?

sknebel • Aug 05 '18 20:08

hey, great! sure, that's easy to answer with indiemap, just query the links table, https://indiemap.org/docs.html#schema-links :

SELECT (STARTS_WITH(to_url, 'http://') OR STARTS_WITH(to_url, 'https://') OR STARTS_WITH(to_url, '//')) AS absolute, COUNT(*)
FROM indiemap.links
GROUP BY absolute

results:

Row  absolute  f0_
1    true      264548084
2    false     158375878

so ~37% (158M/423M) of links in indiemap are relative.
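For anyone who wants to run the same classification on a local sample of URLs instead of the indiemap table, here's a rough Python sketch of the query's prefix test plus the percentage arithmetic. `is_absolute` and `relative_share` are hypothetical names for illustration; the counts are copied from the results above.

```python
def is_absolute(url):
    """Same prefix test as the query: scheme-qualified or protocol-relative."""
    return url.startswith(("http://", "https://", "//"))


def relative_share(absolute_count, relative_count):
    """Fraction of all links that are relative."""
    return relative_count / (absolute_count + relative_count)


# Counts copied from the query results above.
share = relative_share(264548084, 158375878)
print(f"{share:.1%}")  # prints 37.4%
```

Like the query, this counts anything without one of those three prefixes (e.g. `mailto:` links) as relative, so treat the number as an approximation.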

snarfed • Aug 06 '18 06:08

@sknebel's work here on #123 was great, and largely fixed this for me. Thank you! Tentatively closing.

snarfed • Jan 18 '23 20:01