
Batch parsing of inline scripts/stylesheets

Open MegaCorn opened this issue 8 months ago • 3 comments

Before: html5ever produces a token for each line of inline content.
After: html5ever batches inline content until it encounters '<', which in most cases marks the end tag.
Fixes: https://github.com/servo/servo/issues/34502

MegaCorn, Jun 23 '25 07:06

html5ever emits the line number for each token it parses. I think this change will break the line counting, because we no longer hit https://github.com/servo/html5ever/blob/a7c9d989b9b3426288a4ed362fb4c4671b2dd8c2/html5ever/src/tokenizer/mod.rs#L258-L260 for each newline in the input data. That's probably why we interrupt the tokenizer when we see a newline in the first place.

I'd like to get rid of the line count at some point, because it also has a significant performance overhead in other places (https://github.com/servo/html5ever/pull/601#issue-2991840144), but I have not investigated the implications.

simonwuelker, Jun 25 '25 08:06

The line count is important for providing useful error information for CSS and JS errors/warnings in inline style/script blocks.

jdm, Jun 26 '25 07:06

Perhaps there is some way to increase the line number by the number of \n characters present in the batched results after leaving the raw text state or something like that?

jdm, Jul 12 '25 08:07
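
The newline-counting idea suggested above could look roughly like the following. This is a minimal sketch, not html5ever's actual code: `LineCounter`, `advance_over`, and `count_newlines` are hypothetical names invented for illustration, and the sketch assumes line endings in the batched chunk have already been normalized to `\n`.

```rust
/// Hypothetical helper (not part of html5ever): count the `\n` bytes
/// in a batched chunk of raw text.
fn count_newlines(chunk: &str) -> u64 {
    chunk.bytes().filter(|&b| b == b'\n').count() as u64
}

/// Hypothetical stand-in for the tokenizer's line-number state.
struct LineCounter {
    current_line: u64,
}

impl LineCounter {
    /// Start counting at line 1.
    fn new() -> Self {
        LineCounter { current_line: 1 }
    }

    /// Advance the line number by the number of newlines in a batched
    /// chunk, instead of interrupting the tokenizer at every newline.
    fn advance_over(&mut self, chunk: &str) {
        self.current_line += count_newlines(chunk);
    }
}
```

With this shape, a counter that starts at line 1 and advances over a batched script body containing three `\n` characters ends up at line 4, so a token emitted after the batch would still carry the correct line. Whether tokens inside the batch also need per-line positions for CSS/JS error reporting is a separate question that this sketch does not address.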