html5ever Batch parsing of inline scripts/stylesheets

Before: html5ever produces tokens for each line of inline content.

After: html5ever batches inline content until it encounters '<', which represents an end tag in most cases.

Fixes: https://github.com/servo/servo/issues/34502

Jun 23 '25 07:06 MegaCorn

html5ever emits the line number for each token it parses. I think this change will break the line counting, because we no longer hit https://github.com/servo/html5ever/blob/a7c9d989b9b3426288a4ed362fb4c4671b2dd8c2/html5ever/src/tokenizer/mod.rs#L258-L260 for each newline in the input data. That's probably why we interrupt the tokenizer when we see a newline in the first place.

I'd like to get rid of the line count at some point, because it also has a significant performance overhead in other places (https://github.com/servo/html5ever/pull/601#issue-2991840144), but I have not investigated the implications.

Jun 25 '25 08:06 simonwuelker

The line count is important for providing useful error information for CSS and JS errors/warnings in inline style/script blocks.

Jun 26 '25 07:06 jdm

Perhaps there is some way to increase the line number by the number of \n characters present in the batched results after leaving the raw text state or something like that?

Jul 12 '25 08:07 jdm
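
For illustration, the suggestion above (bumping the line number by the count of \n characters in the batched text) could look roughly like the sketch below. `LineCounter` and `advance_over` are hypothetical names, not html5ever's actual types or API, and the sketch assumes the batched chunk has already had its newlines normalized to \n; a lone \r in raw input would not be counted.

```rust
/// Hypothetical stand-in for the tokenizer's line bookkeeping.
/// This is NOT html5ever's real type; it only illustrates the idea.
struct LineCounter {
    current_line: u64,
}

impl LineCounter {
    /// Line numbers start at 1 in this sketch.
    fn new() -> Self {
        LineCounter { current_line: 1 }
    }

    /// Advance the line number by the number of '\n' bytes in a batched
    /// chunk, instead of stopping the tokenizer at every newline.
    /// Assumes newlines in `batched` are already normalized to '\n'.
    fn advance_over(&mut self, batched: &str) {
        let newlines = batched.bytes().filter(|&b| b == b'\n').count() as u64;
        self.current_line += newlines;
    }
}

fn main() {
    let mut counter = LineCounter::new();
    // One batched chunk of inline stylesheet text containing three newlines.
    counter.advance_over("body {\n  color: red;\n}\n");
    println!("line after batch: {}", counter.current_line);
}
```

Counting bytes rather than chars is safe here because the byte 0x0A never appears inside a multi-byte UTF-8 sequence. Whether one count per batch is cheaper overall than the per-newline interruption would still need measuring.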