
Out of order HTML streaming ("patching")

Open noamr opened this issue 5 months ago • 40 comments

What problem are you trying to solve?

When delivering HTML documents, often parts of the document are ready in the server before others. This order might not be the same as the DOM order.

For example, a <select> might appear early in the DOM, with the <option> elements for it fetched from a database in parallel, or 3rd party widgets that have a placeholder that gets filled later.

What solutions exist today?

Some frameworks like React & Svelte offer support for out-of-order streaming. React does so by injecting inline <script> elements that modify the DOM.

How would you solve it?

I suggest adding a contentname attribute and extending the <template> element so that it can act as a patch:

  • <template contentfor="foo"> (a patch) would target an element with the given contentname=foo (an outlet).
  • The first patch replaces the outlet's children, the next ones append (by default).

Additional/supporting APIs

  • The contentmethod attribute can change the default behavior, with values equivalent to DOM methods (replace-with, replace-children, after, before, append, prepend).
  • A contentrevision attribute can be present on both the patch and the outlet. If they are identical, the patching is skipped.
  • The outlet lookup is done with the template's parent as the scope, so those patches can't be deeply nested and then target some external element. An exception is that patches that are direct descendants of <body> can target the whole document.
  • In addition, the lookup is tree-scoped and has no affordances for escaping the shadow root.
  • Patching into a fragment works, but the same rule applies - which means that the patch cannot escape the fragment.
  • The target would receive pseudo-classes that reflect its patching status (:updating and :pending?)
  • We should consider a JS getter that reflects the current state of patching.
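
As a rough illustration of the contentmethod and contentrevision bullets above, the sketch below maps each contentmethod value to the DOM method it is described as equivalent to, and models the revision check. The helper names (domMethodFor, shouldSkipPatch) are hypothetical and not part of the proposal, and treating a missing contentrevision as "never skip" is an assumption.

```javascript
// Hypothetical helpers; the names are illustrative, not part of the proposal.
const CONTENT_METHODS = new Map([
  ["replace-with", "replaceWith"],
  ["replace-children", "replaceChildren"],
  ["after", "after"],
  ["before", "before"],
  ["append", "append"],
  ["prepend", "prepend"],
]);

// Returns the name of the DOM method a contentmethod value corresponds to,
// or undefined for a value the issue does not list.
function domMethodFor(contentMethod) {
  return CONTENT_METHODS.get(contentMethod);
}

// A patch is skipped when patch and outlet carry the same contentrevision.
// Assumption: a missing attribute (null/undefined) never counts as a match.
function shouldSkipPatch(patchRevision, outletRevision) {
  return patchRevision != null && patchRevision === outletRevision;
}
```

For example, domMethodFor("replace-with") gives "replaceWith", and shouldSkipPatch("v2", "v2") is true while shouldSkipPatch("v2", "v1") is false.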

Gritty details:

  • Scripts that are part of a patch are executed as normal if performed as part of the main parser.
  • Scripts, styles, other RAWTEXT elements (e.g. xmp), <plaintext> and the document (HTML) element cannot be directly streamed into. RCDATA elements (title/textarea) can.
  • From a parser perspective, a setup similar to fragment parsing is established inside the <template>'s stack of open elements. This makes parsing behave as if it were parsing into the context element and into the <template> at the same time. The insertion target is the context element.
  • Conflicts/mismatches are handled as per https://github.com/whatwg/html/issues/2142#issuecomment-3376495808
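
The rule about which elements can be streamed into could be modeled as a simple predicate. This is only a sketch: canStreamInto is a hypothetical name, and the set includes iframe, noembed and noframes because the HTML Standard also tokenizes them as RAWTEXT (the issue only names xmp as an example). noscript, which is RAWTEXT only when scripting is enabled, is left out of this sketch.

```javascript
// Elements that cannot be streamed into directly under the rule above:
// script, style and other RAWTEXT elements, <plaintext>, and the root <html>.
const NOT_STREAMABLE = new Set([
  "script", "style", "xmp", "iframe", "noembed", "noframes", "plaintext", "html",
]);

// Hypothetical predicate: true if a patch may stream into an element with this tag name.
// RCDATA elements (title, textarea) and ordinary elements are allowed.
function canStreamInto(tagName) {
  return !NOT_STREAMABLE.has(tagName.toLowerCase());
}
```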

Alternative: using <script> with escaping

Using <template> feels natural as a way to stream HTML. However, it has the constraint of not being able to stream directly into scripts and styles (RAWTEXT).

An alternative would be to have the patch be escaped HTML inside a <script> element. Then we can use the existing tokenizer state of the script element, unescape the text, and pass it to a second parser that inserts the elements to the correct context.

This however feels unnatural as HTML passed over HTTP should not require escaping :) Since we can add a replaceWith mode in the future (instead of the current replaceChildren mode), it might be a better option than requiring the HTML to be escaped.
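
To make the escaping cost concrete, here is a minimal sketch of what the server-side escape and the client-side unescape could look like. The issue does not say which escaping scheme would be used; entity-escaping of &, < and > is an assumption, and both function names are hypothetical.

```javascript
// Assumed scheme: entity-escape &, < and > so the payload can never contain "</script".
function escapePatchHtml(html) {
  return html.replaceAll("&", "&amp;").replaceAll("<", "&lt;").replaceAll(">", "&gt;");
}

// Inverse of escapePatchHtml. "&amp;" is restored last so that an escaped
// "&lt;" in the original text is not unescaped twice.
function unescapePatchHtml(text) {
  return text.replaceAll("&lt;", "<").replaceAll("&gt;", ">").replaceAll("&amp;", "&");
}
```

With this scheme, a payload such as <b>A &amp; B</b></script> survives a round trip unchanged, and its escaped form contains no < character at all, so the script tokenizer cannot see an end tag inside it.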

Potential future enhancements

  • Fetching and patching from an external resource (using the existing src attribute)

noamr avatar Aug 07 '25 10:08 noamr

I'm a bit worried about changing the parsing of template to RAWTEXT based on an attribute. This can cause new XSS/mXSS vectors. Is it possible to use an element that is already parsed as text, e.g. script?

zcorpan avatar Aug 07 '25 10:08 zcorpan

I'm a bit worried about changing the parsing of template to RAWTEXT based on an attribute. This can cause new XSS/mXSS vectors. Is it possible to use an element that is already parsed as text, e.g. script?

Yes <script type=patch> is an option. It would also open opportunities to explore using existing script attributes like src and defer.

noamr avatar Aug 07 '25 10:08 noamr

cc @whatwg/html-parser

zcorpan avatar Aug 07 '25 11:08 zcorpan

Another thing to add/suggest here, based on feedback from early adopters of the prototype: When there are multiple patches to the same target in the same document stream, only the first one removes all the children.

So the following:

<div id=target>...</div>
<template patchfor=target>ABC</template>
<template patchfor=target>DEF</template>

... would result in <div id=target>ABCDEF</div>.

(Replace with <script type=patch> if needed; this is just for the sake of an example.)

Basically the parser created for each target would only be closed once the main parser stream is closed.

This allows interleaved streaming to multiple locations out of order.
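
A small model of that behavior, under stated assumptions: children are represented as strings, chunks arrive as [targetId, content] pairs in stream order, and patches for unknown targets are ignored. applyPatchStream is a hypothetical name used only for this sketch.

```javascript
// Models each target's children as an array of strings.
// The first chunk for a target clears its children; later chunks append.
function applyPatchStream(initialChildren, chunks) {
  const children = new Map(
    Object.entries(initialChildren).map(([id, kids]) => [id, [...kids]])
  );
  const alreadyPatched = new Set();
  for (const [targetId, content] of chunks) {
    if (!children.has(targetId)) continue; // assumption: unknown targets are ignored
    if (!alreadyPatched.has(targetId)) {
      children.set(targetId, []); // only the first patch removes existing children
      alreadyPatched.add(targetId);
    }
    children.get(targetId).push(content);
  }
  return Object.fromEntries([...children].map(([id, kids]) => [id, kids.join("")]));
}

const result = applyPatchStream(
  { target: ["..."], other: ["old"] },
  [["target", "ABC"], ["other", "X"], ["target", "DEF"]]
);
// result.target is "ABCDEF" and result.other is "X", even though the
// chunks for the two targets were interleaved.
```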

noamr avatar Aug 12 '25 08:08 noamr

Added agenda+ to discuss moving this to stage 1. It seems to me that it meets the bar:

  • We have put together a detailed explainer
  • Consensus that the WHATWG is interested in exploring solutions in this problem space: it seemed like it from past discussions in the WHATNOT, let's verify in the next one.
  • At least one implementor: chromium
  • Contributor: @foolip and @noamr
  • WHATWG workstream: HTML

noamr avatar Sep 09 '25 08:09 noamr

I'm a bit worried about changing the parsing of template to RAWTEXT based on an attribute. This can cause new XSS/mXSS vectors. Is it possible to use an element that is already parsed as text, e.g. script?

I've changed the OP after many investigations. Basically we suggest to:

  • keep the regular tokenizer mode, except when streaming into RCDATA (title/textarea)
  • not allow streaming directly into an existing style/script/xmp etc
  • be open to a future enhancement to patch after/before/prepend/append/replaceWith. The latter would allow replacing an existing script/style and wouldn't have the parsing issue.

noamr avatar Oct 01 '25 13:10 noamr

not allow streaming directly into an existing style/script/xmp etc

Why not?

I suppose script is potentially problematic as it would normally run the script when a text node is inserted. We could set the "already started" flag if you stream into a script element, though?

zcorpan avatar Oct 09 '25 12:10 zcorpan

not allow streaming directly into an existing style/script/xmp etc

Why not?

I suppose script is potentially problematic as it would normally run the script when a text node is inserted. We could set the "already started" flag if you stream into a script element, though?

The main issue is not script execution, but rather the inline nature of interleaving.

So the following would create unexpected results, as the end tag for the tokenizer would now be </template> instead of </script>:

<template contentfor=some-script>
  console.log("</template>");
</template>

In addition, the following would work differently based on whether the browser supports this, in a disruptive way:

<template contentfor=some-script>
  console.log("<!--");
</template>

For supported browsers, this would work. For unsupported browsers, this would start a comment somewhere in the middle of the template (github syntax highlighting shows the issue...).

If people wanted to achieve this, they should probably insert the script into an existing element, or use contentmethod=replace or some such.
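
The end-tag problem can be illustrated with a deliberately simplified model of how a raw-text tokenizer decides where content ends: it stops at the first end tag matching the element it believes it is inside. payloadSeenByTokenizer is a hypothetical helper, and the regular expression is only an approximation of the "appropriate end tag" check in the HTML Standard.

```javascript
// Returns the part of `source` that a raw-text tokenizer would treat as content
// if `endTagName` is the end tag it is waiting for. Simplified: an end tag is
// "</name" followed by whitespace, "/" or ">".
function payloadSeenByTokenizer(source, endTagName) {
  const endTag = new RegExp(`</${endTagName}[\\t\\n\\f\\r />]`, "i");
  const match = endTag.exec(source);
  return match ? source.slice(0, match.index) : source;
}

const scriptText = 'console.log("</template>");';

// Inside a real <script>, the tokenizer waits for </script>, so the text is intact.
const insideScript = payloadSeenByTokenizer(scriptText, "script");

// Inside a patch template, the tokenizer waits for </template>, so the
// script text is cut off in the middle of the string literal.
const insideTemplate = payloadSeenByTokenizer(scriptText, "template");
```

Here insideScript equals the full script text, while insideTemplate is only console.log(" and everything after it would be parsed as regular markup following the template.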

noamr avatar Oct 09 '25 12:10 noamr

OK. Why is that not an issue for RCDATA? It uses https://html.spec.whatwg.org/#appropriate-end-tag-token

What about streaming into <plaintext>?

zcorpan avatar Oct 09 '25 16:10 zcorpan

OK. Why is that not an issue for RCDATA? It uses https://html.spec.whatwg.org/#appropriate-end-tag-token

It's simple enough, and it's an important enough use case (e.g. changing the title). It also doesn't have the comment mismatch. It means you can do something like this, but it's pretty harmless:

<template contentfor=title>
Text </title>
</template>

It means you need to escape the text </template> but it's more reasonable than having to escape it inside scripts.

What about streaming into <plaintext>?

I think we can safely not support that.

noamr avatar Oct 09 '25 16:10 noamr

I did not know we were going to change how we tokenize inside template. That seems rather dangerous and unexpected. I'm not comfortable with that much side effecting. cc @hsivonen

annevk avatar Oct 10 '25 08:10 annevk

I did not know we were going to change how we tokenize inside template. That seems rather dangerous and unexpected. I'm not comfortable with that much side effecting. cc @hsivonen

ATM this is explored as a specific behavior for RCDATA (title/textarea). We could probably find an alternative solution for those like having to escape their content or some such

noamr avatar Oct 10 '25 08:10 noamr

Other alternatives with RCDATA elements:

  • Not allow them at all, authors would have to do something like:
<template contentfor="title" contentmethod="replace-with">
  <title>New title</title>
</template>
  • Re-serialize before setting the content
  • Only patch them if the content of the template is a text node.

Changing the tokenizer state seems like the most straightforward option and not too disruptive for RCDATA specifically, but it's not material to the overall design.

noamr avatar Oct 10 '25 09:10 noamr

Fundamentally if we're going to use the template element we are going to allow different content models compared to what is possible with ordinary input. Handling RCDATA but ignoring the context element and insertion modes seems like a rather arbitrary approach. Either way there is going to be new mXSS attack surface, but the more inconsistent we make this the harder it will be to figure out what is going on.

annevk avatar Oct 10 '25 10:10 annevk

Fundamentally if we're going to use the template element we are going to allow different content models compared to what is possible with ordinary input. Handling RCDATA but ignoring the context element and insertion modes seems like a rather arbitrary approach. Either way there is going to be new mXSS attack surface, but the more inconsistent we make this the harder it will be to figure out what is going on.

Yes of course, this is about finding a tradeoff that provides the use case while exposing the least amount possible of exotic new parser options.

noamr avatar Oct 10 '25 10:10 noamr

Having thought about this feedback a bit more, an alternative design of this can look like this:

<table>
  <tr contentname="my-tr">
</table>

<script contentname="my-script"></script>

<template contentmethod="append|prepend|replace-with|replace-children">
    <tr contentname="my-tr"><td>Content
    <script contentname="my-script">console.log("new content")</script>
   <!-- this is invalid and stays in the template / gets discarded -->
   <tr>bla bla
</template>

This requires far fewer changes to template parsing and leaves far fewer opportunities to create invalid HTML trees. It also allows title/script/style patching without having to do anything special with the tokenizer, and it lets us drop one attribute (contentfor), as contentname can match on both sides.

Note that it's very awkward with before/after, so for now they should perhaps not be supported.
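
A minimal sketch of the matching step in this alternative design, assuming a simplified model where each outlet is an array of child strings keyed by contentname. applyTemplate is a hypothetical name; replace-with is left out because it replaces the outlet element itself, which this model does not represent.

```javascript
// outlets: Map from contentname to an array of child strings (the live tree, simplified).
// patchChildren: array of { contentname, children } objects found inside the <template>.
// Returns the template children that matched no outlet (they stay in the template).
function applyTemplate(outlets, patchChildren, method) {
  const leftover = [];
  for (const patch of patchChildren) {
    const current = outlets.get(patch.contentname);
    if (current === undefined) {
      leftover.push(patch); // no matching outlet: nothing is patched
      continue;
    }
    if (method === "append") {
      outlets.set(patch.contentname, [...current, ...patch.children]);
    } else if (method === "prepend") {
      outlets.set(patch.contentname, [...patch.children, ...current]);
    } else if (method === "replace-children") {
      outlets.set(patch.contentname, [...patch.children]);
    } else {
      throw new Error(`unsupported contentmethod in this sketch: ${method}`);
    }
  }
  return leftover;
}
```

Applied to the example above with contentmethod="append", the my-tr and my-script children would be appended to their outlets, and the unnamed <tr> would be returned as leftover.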

noamr avatar Oct 10 '25 12:10 noamr

Hello. For framework use, there needs to be a way to stream parts of the text without any element wrappers. Frameworks do that by using comments as markers. How would this be solved? See the example in the Marko playground (look at the resulting HTML there).

To be no worse than current JS solutions, it needs to be done in a way that doesn't interfere with styling and JavaScript, so that it behaves as if it doesn't exist; in other words, it is not a div. Unlike with declarative shadow DOM, multiple such markers should be able to coexist next to each other.

But this raises a question: is there even a need for a global "contentname" attribute when all that is really needed are two elements, one of which acts like replace-with? By replacing a placeholder template with content plus another placeholder template, you could implement that kind of appending/prepending infinite stream. Parts of this proposal seem unnecessary. This is really something that needs to be discussed with framework authors who have implemented, and have experience with, out-of-order streaming.

Behaviour of raw data seems rather unfortunate too.

Some frameworks like React & Svelte offer support for out-of-order streaming

Also, I'm pretty sure Svelte still doesn't really support HTML streaming? (The docs seem to say no; mentioned in https://github.com/sveltejs/svelte/discussions/16784.) EDIT: Actually, it seems they support out-of-order streaming of JS data in SvelteKit with their data loader functions, just not streamed rendering like the others do.

The ones that actually do are Marko (which has supported it for more than 10 years), Solid, and React. All work in a similar way: they stream content wrapped in template elements and use script tags to insert it at the correct position, walking the DOM and relying on markers to find where to insert things, with special handling for RCDATA and whatever else.

As an alternative, it is already possible to use declarative shadow DOM for this kind of out-of-order streaming, but it has too many limitations, so no one really uses it: limitations with styling, multiple async streams and nested async, nearby shadow roots, required wrapper elements, etc.

kanashimia avatar Oct 11 '25 22:10 kanashimia

Thanks for the feedback @kanashimia!

Regarding RAWTEXT and RCDATA, see https://github.com/whatwg/html/issues/11542#issuecomment-3389815088 - I think this API change would make those work out of the box.

Regarding using anonymous markers - a lot of this has been previously discussed in https://github.com/WICG/declarative-partial-updates/issues/6, and I agree that it's a limitation of the current design relative to using anonymous comment nodes.

We've deferred this discussion so far because:

  1. A lot of the use cases can (arguably) be accomplished using appending/splicing elements rather than using anonymous content nodes.

e.g. for the example:

<p contentname=foo>
test 1 <br>
test 2 <br>
test 3 <br>
</p>

<template contentmethod=append>
  <p contentname=foo>test 4<br>
</template>

Admittedly this is different from using anonymous comment markers as the p itself is an element. Is the difference apparent in real world applications? (this is a real question, not rhetorical)

  2. In past iterations of this we've used IDREFs, and there is no good way to use them to target anonymous comment-based placeholders without creating some divergence re. what they mean. However, I do think that this can be achieved now that we've switched to a bespoke referencing attribute (contentname), or at least in a future iteration.

e.g. we could do something like this that wouldn't feel too intrusive to existing DOM (though I would like to see if @zcorpan or @annevk oppose having something like this):

<p>
<!-- contentstart foo --> 
test 1 <br>
test 2 <br>
test 3 <br>
<!-- contentend foo -->
</p>

<template contentmethod=append>
  <p contentname=foo>test 4<br>
</template>

I personally like the idea of following how userland solved this as much as possible (with the added value+constraints of having this as parts of the web platform).

Note that we would still need to have contentname as a global attribute for things like scripts and styles as you can't have a comment block inside those.
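
For illustration, the comment-marker lookup could work roughly as follows. This sketch models child nodes as plain objects with a type and data field rather than real DOM nodes, and the marker syntax (contentstart NAME / contentend NAME) is taken from the example above; appendIntoMarkedRange is a hypothetical name.

```javascript
// childNodes: array of { type: "comment" | "text" | "element", data: string }.
// Appends newNodes at the end of the range delimited by the
// "contentstart <name>" and "contentend <name>" comment markers.
// Returns the original array unchanged if the markers are not both found in order.
function appendIntoMarkedRange(childNodes, name, newNodes) {
  const isMarker = (node, keyword) =>
    node.type === "comment" && node.data.trim() === `${keyword} ${name}`;
  const start = childNodes.findIndex((node) => isMarker(node, "contentstart"));
  const end = childNodes.findIndex((node, i) => i > start && isMarker(node, "contentend"));
  if (start === -1 || end === -1) return childNodes;
  // Insert right before the end marker so the markers keep delimiting the range.
  return [...childNodes.slice(0, end), ...newNodes, ...childNodes.slice(end)];
}
```

Because the markers themselves stay in place, a later patch for the same name can find the range again and keep appending.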

noamr avatar Oct 12 '25 08:10 noamr

Well, with regards to text data, people write things like "1500 Views / 10 Comments". To stream those numbers, they would need to be wrapped in an element even though that isn't really needed, or the whole thing would need to be replaced. From a framework point of view it isn't reasonable to just randomly wrap things in elements, so wrapping would be left up to the user.

A more compelling reason is generality: the nice thing about comment markers is that you almost don't have to worry about the nearby DOM structure. For example, see:

<div>
  <p>foo</p>
  <async>
    <p>a</p>
    <p>aa</p>
  </async>
  <async>b</async>
  <p>bar</p>
  <p>baz</p>
  <async>c</async>
  <p>daz</p>
</div>

You could implement this using something like before/after, but you would have to track all nearby elements. This is pretty complicated and more error-prone: imagine if that <p>foo</p> is suddenly removed from the DOM or moved to another place, which could happen if it is a "portal" or because of adblock scripts. Relying on comment markers is safer.


Note that we would still need to have contentname as a global attribute for things like scripts and styles as you can't have a comment block inside those.

Personally I'm not sure that is a compelling use case, as it can be done by streaming in multiple such elements. But if there is a use case it can be done like this:

<!contentstart foo> 
  <script>foo();</script>
<!contentend> 

<template contentname="foo">
  <script>foo();bar();</script>
</template>

For the use case of updating <title> elements, that is how you would do it, right? This takes more data to stream, but because of compression it may not be that big of a problem. It does require keeping track of the streaming process on the server, though. It would be nice to hear about the use cases.


Another observation is that <!foo> is currently parsed as a comment due to error recovery mechanisms, while <!doctype> is its own thing. As an idea, maybe markers could be written like that, so that they are guaranteed not to conflict with existing comments? A minor downside is that some badly implemented parsers may struggle with that.

kanashimia avatar Oct 12 '25 13:10 kanashimia

Markers are really a separate feature tentatively called DOM parts. I don't think we should intertwine them.

annevk avatar Oct 13 '25 05:10 annevk

Yes, that is pretty much DOM parts. Out of order streaming and templating are closely related, that is why it is possible to use declarative shadow dom for that to some degree, even though it totally wasn't an intended purpose. Features boil down to "declare html in one place, insert it into another", so why shouldn't we intertwine them? To say I'm not against having after, before, append, prepend methods in addition to replace-with, that could very well be useful to save on traffic to some degree, that could be added to the DOM parts proposal too. It is possible for this feature to exist without DOM parts, just that it won't be as useful as a target, and it doesn't make a lot of sense from my point of view to work on this completely independently. But that is probably something that should be discussed in the webcomponent repo.

kanashimia avatar Oct 13 '25 06:10 kanashimia

Yes, that is pretty much DOM parts. Out of order streaming and templating are closely related, that is why it is possible to use declarative shadow dom for that to some degree, even though it totally wasn't an intended purpose. Features boil down to "declare html in one place, insert it into another", so why shouldn't we intertwine them?

They are somewhat related in the way that the template element works.

To say I'm not against having after, before, append, prepend methods in addition to replace-with, that could very well be useful to save on traffic to some degree, that could be added to the DOM parts proposal too.

It is possible for this feature to exist without DOM parts, just that it won't be as useful as a target, and it doesn't make a lot of sense from my point of view to work on this completely independently. But that is probably something that should be discussed in the webcomponent repo.

I see this as a good use case for making more progress on DOM parts and making sure they work well with patching

noamr avatar Oct 13 '25 06:10 noamr

(replace-with, replace-children, after, before, append, prepend).

How strongly motivated are the modes other than replace-children and append?

It seems that replace-children is removing children and then doing append, right? Appending to a parent node is a capability that the parser already has. The only other kind of insertion that the parser already knows to do is foster parenting.

Do we really need insertion capabilities other than appending to a parent node for HTML patching?

hsivonen avatar Nov 10 '25 09:11 hsivonen

(replace-with, replace-children, after, before, append, prepend).

How strongly motivated are the modes other than replace-children and append?

It seems that replace-children is removing children and then doing append, right? Appending to a parent node is a capability that the parser already has. The only other kind of insertion that the parser already knows to do is foster parenting.

Do we really need insertion capabilities other than appending to a parent node for HTML patching?

There were requests during incubation to allow inserting content between two other nodes, and in general not to be constrained to just replacing everything (or appending). But I think that starting with replace-children and append, and expanding later, can be a good strategy.

noamr avatar Nov 10 '25 12:11 noamr

Take something like:

<template contentmethod="replace">
  <div contentfor="article-body">…content…</div>
  <div contentfor="nav">…content…</div>
</template>

Due to the streaming nature, there may be a phase where the wrong nav is displayed for the wrong content. Because of this, I wonder if it'd become advisable to do:

<template contentmethod="replace">
  <div contentfor="article-body"><!-- empty --></div>
  <div contentfor="nav"><!-- empty --></div>
</template>
<template contentmethod="replace">
  <div contentfor="article-body">…content…</div>
  <div contentfor="nav">…content…</div>
</template>

…so that the old content is cleared out first. It's worth thinking of how this would interact with contentrevision.

jakearchibald avatar Nov 16 '25 03:11 jakearchibald

Also, does contentrevision somewhat limit you to using contentmethod=replace? Otherwise the new content revision isn't represented in the DOM.

jakearchibald avatar Nov 16 '25 03:11 jakearchibald

Also, what's the feature-detection & enhancement story here?

jakearchibald avatar Nov 16 '25 04:11 jakearchibald

Also, what's the feature-detection & enhancement story here?

Similar to declarative shadow DOM. The template will be attached to the DOM as normal and a polyfill or other script can detect it is there.
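
As a hedged sketch of what such a script could do: in a browser without native support, patch templates remain in the tree as ordinary <template> elements, so a script can find them and apply them manually. The function below assumes the contentfor/contentname attribute pair from the original post and only implements replace-children; it ignores the scoping rules and the "first patch replaces, later ones append" behavior. applyLeftoverPatches is a hypothetical name; the DOM calls it relies on (querySelectorAll, getAttribute, replaceChildren, remove, template.content) are standard.

```javascript
// `root` is anything exposing querySelectorAll, such as a Document.
// Applies every patch template the parser left in the tree and returns how many were applied.
function applyLeftoverPatches(root) {
  let applied = 0;
  for (const template of root.querySelectorAll("template[contentfor]")) {
    const name = template.getAttribute("contentfor");
    // Compare attribute values directly to avoid building a selector from `name`.
    const outlet = Array.from(root.querySelectorAll("[contentname]")).find(
      (element) => element.getAttribute("contentname") === name
    );
    if (!outlet) continue; // no matching outlet: leave the template where it is
    outlet.replaceChildren(template.content); // moves the template's parsed content
    template.remove();
    applied++;
  }
  return applied;
}
```

In a page this would typically be called as applyLeftoverPatches(document) once parsing has finished; with native support there would be no leftover patch templates and the call would do nothing.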

noamr avatar Nov 16 '25 08:11 noamr

Also, does contentrevision somewhat limit you to using contentmethod=replace? Otherwise the new content revision isn't represented in the DOM.

contentrevision isn't in the first version of this. But the idea is that the parser would set that attribute on the placeholder.

noamr avatar Nov 16 '25 08:11 noamr

It feels like that should only happen for replace and maybe replace-children.

jakearchibald avatar Nov 16 '25 11:11 jakearchibald