html5gum

Improve performance of DefaultEmitter

Open untitaker opened this issue 2 years ago • 5 comments

While implementing https://github.com/lycheeverse/lychee/pull/480 I realized how slow the default emitter really is. It makes link extraction 10-40% slower than html5ever. It is currently not really possible to beat html5ever at all unless a custom emitter is implemented.

We could:

  • build another emitter that reuses strings and calls a callback with borrowed strings instead. This would be much closer to lol-html's API.
  • allow for custom allocators for all the strings we create -- similar to the strtendril magic html5ever does (but definitely not using that crate)
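
To make the first option concrete, here is a minimal sketch of the "reuse one buffer, lend it to a callback" idea. It is not html5gum's `Emitter` trait and not a real tokenizer: `for_each_href` and its parameters are hypothetical names, it only looks for double-quoted `href="..."` values, and the only character reference it decodes is `&amp;`. The decoding step is there to show why a buffer is needed at all (the decoded value is not a plain slice of the input), and the callback receiving `&str` is what avoids one allocation per attribute.

```rust
/// Hypothetical callback-style scanner (an illustration only, not html5gum's API).
/// Finds double-quoted `href="..."` values, decodes `&amp;` into `&`, and lends
/// each decoded value to `on_href` as a borrowed `&str`.
/// `scratch` is owned by the caller and reused for every value.
pub fn for_each_href<F>(input: &str, scratch: &mut String, mut on_href: F)
where
    F: FnMut(&str),
{
    const NEEDLE: &str = "href=\"";
    const AMP: &str = "&amp;";

    let mut rest = input;
    while let Some(start) = rest.find(NEEDLE) {
        let after = &rest[start + NEEDLE.len()..];
        let end = match after.find('"') {
            Some(end) => end,
            None => break, // unterminated attribute value: stop scanning
        };

        // `clear` drops the contents but keeps the capacity, so after the
        // first few values no further allocation should be needed.
        scratch.clear();

        // Decode into the reused buffer instead of building a fresh String.
        let mut raw = &after[..end];
        while let Some(i) = raw.find(AMP) {
            scratch.push_str(&raw[..i]);
            scratch.push('&');
            raw = &raw[i + AMP.len()..];
        }
        scratch.push_str(raw);

        // The callback only borrows the buffer; it must copy the value if it
        // wants to keep it past this call.
        on_href(scratch.as_str());

        rest = &after[end + 1..];
    }
}
```

A caller like a link checker would pass a closure that inspects or copies each `&str` as needed. The trade-off is the same one lol-html's API makes: the borrowed string is only valid during the callback, so the caller decides when an owned copy is worth paying for.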

untitaker avatar Feb 02 '22 23:02 untitaker

@lebensterben if you're looking to contribute to html5gum, I think this would be a good place to start. It would create some overlap between lychee contributors and html5gum contributors, and it provides immediate value to both lychee and hyperlink (if performance improves by a sufficient margin). Let me know if you are interested; I'm also happy to answer any questions.

untitaker avatar Feb 04 '22 10:02 untitaker

allow for custom allocators for all the strings we create -- similar to strtendril magic html5ever does (but definitely not using that crate)

Why not? Is strtendril not working?

Ygg01 avatar May 25 '22 11:05 Ygg01

@Ygg01 while lurking in the Servo Zulip and the public issue tracker, my impression was that the author, Simon Sapin, would like to rewrite the library or get rid of its usage in html5ever, e.g. https://github.com/servo/tendril/issues/58

I also think it's fine if users of html5gum get only passable performance out of the box, as long as there are options to tune things for optimal performance.

untitaker avatar May 25 '22 23:05 untitaker

Hmm, perhaps it's possible to use zbuf from html5ever branch? Maybe fork it or something.

Ygg01 avatar May 27 '22 08:05 Ygg01

I believe that, long-term, html5gum should implement a tree builder + DOM, so it may be wiser to postpone the discussion about string allocation until after that.

untitaker avatar May 29 '22 15:05 untitaker