html5gum
Improve performance of DefaultEmitter
While implementing https://github.com/lycheeverse/lychee/pull/480 I realized how slow the default emitter really is: with it, link extraction is 10-40% slower than with html5ever. It is currently not possible to beat html5ever at all unless a custom emitter is implemented.
We could:
- build another emitter that reuses strings and calls a callback with borrowed strings instead, which would be much closer to lol-html's API.
- allow for custom allocators for all the strings we create, similar to the strtendril magic html5ever does (but definitely not using that crate).
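A minimal sketch of the first option, assuming a tokenizer that pushes tag and attribute text into the emitter piece by piece. `CallbackEmitter`, `BorrowedTag`, `AttrSpan`, `collect_link_urls` and all method names are hypothetical and are not html5gum's actual emitter API; the hand-written calls stand in for a real tokenizer.

```rust
/// Byte ranges of one attribute's name and value inside the shared buffer.
#[derive(Clone, Copy)]
struct AttrSpan {
    name: (usize, usize),
    value: (usize, usize),
}

/// Borrowed view of the current tag; only valid for the duration of the callback.
struct BorrowedTag<'a> {
    name: &'a str,
    buf: &'a str,
    spans: &'a [AttrSpan],
}

impl<'a> BorrowedTag<'a> {
    /// Yields (name, value) pairs as slices of the shared buffer, without allocating.
    fn attributes(&self) -> impl Iterator<Item = (&'a str, &'a str)> + 'a {
        let (buf, spans) = (self.buf, self.spans);
        spans
            .iter()
            .map(move |s| (&buf[s.name.0..s.name.1], &buf[s.value.0..s.value.1]))
    }
}

/// Hypothetical emitter: buffers are cleared (not freed) for every tag, and each
/// finished tag is handed to `callback` as borrowed slices instead of owned Strings.
struct CallbackEmitter<F> {
    tag_name: String,
    attr_buf: String, // all attribute names and values, back to back
    attr_spans: Vec<AttrSpan>,
    callback: F,
}

impl<F: FnMut(&BorrowedTag<'_>)> CallbackEmitter<F> {
    fn new(callback: F) -> Self {
        let (tag_name, attr_buf, attr_spans) = (String::new(), String::new(), Vec::new());
        CallbackEmitter { tag_name, attr_buf, attr_spans, callback }
    }

    /// clear() keeps the allocated capacity, so later tags reuse it.
    fn init_start_tag(&mut self) {
        self.tag_name.clear();
        self.attr_buf.clear();
        self.attr_spans.clear();
    }

    fn push_tag_name(&mut self, s: &str) {
        self.tag_name.push_str(s);
    }

    fn init_attribute(&mut self) {
        let pos = self.attr_buf.len();
        self.attr_spans.push(AttrSpan { name: (pos, pos), value: (pos, pos) });
    }

    fn push_attribute_name(&mut self, s: &str) {
        self.attr_buf.push_str(s);
        let end = self.attr_buf.len();
        if let Some(span) = self.attr_spans.last_mut() {
            span.name.1 = end;
            span.value = (end, end); // the value starts right after the name
        }
    }

    fn push_attribute_value(&mut self, s: &str) {
        self.attr_buf.push_str(s);
        let end = self.attr_buf.len();
        if let Some(span) = self.attr_spans.last_mut() {
            span.value.1 = end;
        }
    }

    fn emit_current_tag(&mut self) {
        let tag = BorrowedTag { name: &self.tag_name, buf: &self.attr_buf, spans: &self.attr_spans };
        (self.callback)(&tag);
    }
}

/// Simulated events for `<a href="https://example.com"><img src="x.png" alt="">`.
fn collect_link_urls() -> Vec<String> {
    let mut links = Vec::new();
    {
        let mut emitter = CallbackEmitter::new(|tag| {
            for (name, value) in tag.attributes() {
                if (tag.name == "a" && name == "href") || (tag.name == "img" && name == "src") {
                    links.push(value.to_owned()); // only kept values are copied
                }
            }
        });
        emitter.init_start_tag();
        emitter.push_tag_name("a");
        emitter.init_attribute();
        emitter.push_attribute_name("href");
        emitter.push_attribute_value("https://example.com");
        emitter.emit_current_tag();

        emitter.init_start_tag(); // reuses the buffers of the previous tag
        emitter.push_tag_name("img");
        emitter.init_attribute();
        emitter.push_attribute_name("src");
        emitter.push_attribute_value("x.png");
        emitter.init_attribute();
        emitter.push_attribute_name("alt");
        emitter.emit_current_tag();
    }
    links
}

fn main() {
    for url in collect_link_urls() {
        println!("{url}");
    }
}
```

The trade-off in this sketch is that the borrowed slices are only valid inside the callback, so a consumer such as a link extractor copies out just the values it keeps, while the per-tag buffers are allocated once and reused.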
@lebensterben, if you're looking to contribute to html5gum, I think this would be a good start to get some overlap between lychee contributors and html5gum contributors, as it also provides immediate value to lychee and hyperlink (if performance improves by a sufficient margin). Let me know if you are interested in that; I'm also happy to answer any questions.
> allow for custom allocators for all the strings we create -- similar to strtendril magic html5ever does (but definitely not using that crate)
Why not? Is strtendril not working?
@Ygg01 From lurking on the Servo Zulip and the public issue tracker, my impression is that its author, Simon Sapin, would like to rewrite the library or get rid of its usage in html5ever, e.g. https://github.com/servo/tendril/issues/58
I also think it's fine if users of html5gum get only passable performance out of the box, as long as there are options to tweak things for optimal performance.
Hmm, perhaps it's possible to use zbuf from the html5ever branch? Maybe fork it or something.
I believe html5gum should, in the long term, implement a tree builder and a DOM, so it may be wiser to revisit the discussion about string allocation after that.