Proposal: DOM APIs in web workers?
I think there are valid use cases for DOM APIs like DOMParser, XMLSerializer, document.implementation.createDocument() etc. to be available in web workers. I don't mean having direct access to the current document (that wouldn't make sense, of course), but being able to parse, create, modify and serialize "offscreen" documents. Use cases for this include:
- Parsing & serializing XML files off the main thread: For example, I'm currently working on a web-based rich text editor and Microsoft Word alternative, and I'm planning to add DOCX (Microsoft Word document file) support to it in the future. A DOCX file basically consists of a bunch of XML files zipped into a compressed archive. I can then (un)compress the zip file with the help of (De)CompressionStream and parse the XML files with DOMParser or create them with XMLSerializer. Currently, this has to be done on the main thread, which will lead to the page being unresponsive while reading/writing DOCX files. Some projects like @jakearchibald's SVGOMG, an SVG optimizer & minifier based on SVGO, are currently even using XML parsing libraries like Sax instead of the browser's DOMParser, amongst other reasons, to make them work in web workers.
- Generating HTML files off the main thread: Applications that generate HTML files (be it website builders, math document editors, Markdown to HTML transpilers, etc.) could profit immensely from being able to convert their internal representations to HTML off the main thread.
For only a few months now, all three major browser engines have supported worker modules and OffscreenCanvas, so I think websites are starting to do more and more expensive stuff off the main thread, with people like @surma having advocated for that for years.
From a technical perspective, my proposal is that e.g. a global self.document property is exposed in workers, which is a stripped down version of Document containing only the following properties and functions:
- self.document.implementation
- self.document.createAttribute()
- self.document.createAttributeNS()
- self.document.createCDATASection()
- self.document.createComment()
- self.document.createDocumentFragment() (?)
- self.document.createElement()
- self.document.createElementNS()
- self.document.createEvent()
- self.document.createExpression()
- self.document.createProcessingInstruction()
- self.document.createRange() (?)
- self.document.createTextNode()
Additionally, the following interfaces should be exposed in workers:
- Document & XMLDocument
- DocumentType
- DOMImplementation
- DocumentFragment
- DOMParser
- XMLSerializer
- XSLTProcessor
- Sanitizer
- Node
- ParentNode
- Attr
- CharacterData
- Text
- CDATASection
- Element
- Comment
- HTMLElement and all HTML element interfaces
- SVGElement and all SVG element interfaces
- MathMLElement
- NodeList
- HTMLCollection
- AbstractRange, StaticRange & Range
- MutationObserver & MutationRecord (?)
- NamedNodeMap
- ProcessingInstruction
- XPathResult, XPathExpression & XPathEvaluator
One could then use new DOMParser().parseFromString() or self.document.implementation.{createDocument(), createHTMLDocument()} to create a new document, modify it with all the usual and beloved DOM methods, and stringify it with new XMLSerializer().serializeToString() or myOffscreenDocument.documentElement.outerHTML.
Things like Element.prototype.getClientRects() or Element.prototype.computedStyleMap() don't make sense with offscreen documents of course, but that is already the case with documents created on the main thread with DOMParser or document.implementation.createHTMLDocument().
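Since DOMParser is not exposed in workers today, code that wants to run in both contexts has to detect it first. The following is a minimal sketch of such a check, not part of the proposal itself; the helper name `parseXml` and the shape of its return value are hypothetical.

```javascript
// Hypothetical helper: parse XML with the native DOMParser when the given
// global scope (window or worker) exposes one; otherwise signal that a
// fallback (a main-thread round trip or a userland parser) is needed.
function parseXml(source, scope = globalThis) {
  if (typeof scope.DOMParser !== 'function') {
    return { ok: false, reason: 'DOMParser is not exposed in this scope' };
  }
  const doc = new scope.DOMParser().parseFromString(source, 'application/xml');
  // XML parse errors are reported as a <parsererror> element, not as an exception.
  const error = doc.querySelector('parsererror');
  return error
    ? { ok: false, reason: error.textContent }
    : { ok: true, document: doc };
}
```

If the proposal were implemented, the same call would take the native path inside a worker without any other change.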
While I'd be +1 on this, this part is misleading:
I don't mean having direct access to the current document (that wouldn't make sense, of course)
that's already possible with coincident/window and it does make sense ... we use that to drive WASM targeting programming languages from a worker, without ever blocking via Atomics, giving them the ability to interact 1:1 with the DOM API (or anything else only available on main) so it's a solved problem to us, but surely having it native would be awesome, yet we're good, and we have demanded, working, and usable use cases, even my own DOM libraries work in there out of the box, so please let's not spread FUD around what's desirable or possible, as that's not necessary, thanks.
edit P.S. you'd probably be good with that module too, just use those API as they are from a worker and give it a shot, you might be surprised by everything just working out of the box. If not, please file an issue to the project, thanks again.
What are the advantages of this proposal, vs being able to create an iframe that runs in a different thread?
@jakearchibald That seems a bit... clunky? Coming from a worker, you'd have to pass a message to the main thread, which sends it to the sandboxed iframe, which sends the result back to the main thread, which sends it back to the worker. Am I missing something here? At the end of the day, using an iframe for that is a hack, and not what iframes were designed to do. DOMParser & friends are not something that are architecturally coupled to the main thread, so they simply should just be available in workers as well.
@WebReflection Hmmm... A thing that makes web workers so awesome is that they are completely isolated from the main thread – on modern systems, they even run in separate CPU cores – and therefore are not constrained by having to finish any synchronous work before the browser renders the next frame. Stuff like DOM operations with the current document are fundamentally synchronous operations and have to operate on the main thread which manages it. Of course, you could give workers access to the current document, but the way this would work internally in the browser is that the worker would somehow notify the main thread to make a DOM operation, the main thread then does this synchronously, and sends a "done" message back to the worker. And this is exactly what libraries like your coincident, via.js or comlink are already doing, just by implementing it themselves with Proxies, Atomics, postMessage, etc. And don't get me wrong: I think it absolutely is an awesome developer experience to be able to modify the current DOM directly from a worker, but building this natively into web browsers simply improves DX because you don't have to use a library for that anymore (or implement all the Proxy/Atomics/postMessage horror yourself), but I don't think you would get any performance benefits from it, as the DOM operations would still have to be executed on the main thread at the end, just that the browser would do it for you and you (or your library) don't have to worry about it anymore.
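To make the "proxy to the main thread" idea concrete, here is a minimal sketch of the general technique. It is not the actual code of coincident, via.js or comlink: `createRemote` and `handleOnMain` are hypothetical names, and the `send` callback is a plain function call standing in for the real postMessage/Atomics transport.

```javascript
// Worker side (sketch): every property access extends a path, and every call
// becomes a message describing what should be invoked on the other side.
function createRemote(send, path = []) {
  return new Proxy(() => {}, {
    get(_target, key) {
      return createRemote(send, [...path, key]);
    },
    apply(_target, _thisArg, args) {
      return send({ type: 'call', path, args });
    },
  });
}

// Main-thread side (sketch): resolve the path on a real object and call it.
function handleOnMain(root, { path, args }) {
  const method = path[path.length - 1];
  const owner = path.slice(0, -1).reduce((obj, key) => obj[key], root);
  return owner[method](...args);
}

// Usage with a fake "main thread" object in place of a real window.
const fakeMain = {
  document: { createElement: (tag) => ({ tagName: tag.toUpperCase() }) },
};
const remote = createRemote((message) => handleOnMain(fakeMain, message));
console.log(remote.document.createElement('div').tagName); // prints "DIV"
```

Whatever the transport, the call still runs where `handleOnMain` runs, which is why this pattern improves ergonomics but does not move the DOM work off the main thread.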
The proposal I'm talking about wouldn't involve the current document – and therefore the main thread – at all, and would work truly independent from anything outside the worker itself, which is not at all possible today (except if you use an iframe as Jake mentioned, or if you build your own HTML/XML parser, custom "virtual" DOM implementation, and HTML/XML serializer – which will never be as performant as the browser's native methods). This would give actual performance benefits as you have your own separate thread and can do a long, synchronous operation like parsing a giant HTML/XML file that may last dozens of milliseconds, while the document and the main thread simultaneously do their own independent thing.
you'd probably be good with that module too, just use those API as they are from a worker and give it a shot, you might be surprised by everything just working out of the box.
Going back to my use case of parsing a large amount of XML files extracted from a zipped DOCX file, existing libraries like your coincident or other ones mentioned above do provide awesome developer experiences, but they do not solve my use case, as even though you can then create a DOMParser in a worker, everything is still just a proxy to the main thread (correct me if I'm wrong here) and the actual XML parsing would be executed on the main thread, which is exactly what I'm trying to avoid.
@BenjaminAster you are right, proxied stuff will operate from the main when it comes to main-only utilities, but if iframe already uses a separated thread (or ... does it?) you can use coincident or other projects from that iframe and delegate the iframe to communicate eventually stuff to its parent? if the iframe doesn't create its own thread though I agree having DOMParser in workers is desirable and surely less hacky.
What are the advantages of this proposal, vs being able to create an iframe that runs in a different thread?
FWIW, I think being able to create document fragments in a Worker that can be manipulated, without having to pay the cost of layouting or rendering, but can be sent to a renderer thread seems valuable to me and sufficiently different from an iframe.
(I suppose a case could be made to introduce something like <iframe no-render> that can skip layout and rendering and effectively becomes a Worker with a DOM ™️ . Not sure if that has second-order implications tho).
@BenjaminAster
That seems a bit... clunky? Coming from a worker…
Yeah, that's fair. If your starting point is a worker, the iframe solution isn't great. But, maybe being able to create one of these iframes from a worker is a solution.
At the end of the day, using an iframe for that is a hack, and not what iframes were designed to do.
I don't find this very compelling. You could equally, and truthfully say that DOM APIs weren't designed to be in workers. Whatever solution is employed here will involve changing the intentional design of something.
DOMParser & friends are not something that are architecturally coupled to the main thread
Yes they are. They're absolutely coupled to documents. That's why they aren't available in workers.
Maybe their design could be changed so they don't need to be coupled to documents, but that isn't where we're at right now.
It feels like folks think there's a single line in browsers like:
if (isWorkerEnvironment) return;
exposeDOMAPIs();
But that isn't the case. It isn't that DOM APIs are simply not exposed in workers, it's that DOM APIs are not designed to work in non-document environments. Allowing DOM APIs to exist in workers will be a massive undertaking in terms of spec and implementation.
I'm not saying it's impossible, but it's not just flipping a flag.
DOM APIs are massively interlinked with style and rendering. It might be easier to create a new set of interfaces that don't have that issue, and can be cloned/transferred, and upgraded to HTMLElement & co within a document context.
DOM APIs are massively interlinked with style and rendering
but (new DOMParser).parseFromString(...) works already, right? I am not sure, if stuff is never live, how this API could be problematic once exposed via Worker :thinking:
I went ahead and did a test ... the iframe hack is awkward (it needs a sandbox that apparently allows a different thread and at the same time is discouraged and it warns but it's needed for worker to execute).
index.html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <script src="../../mini-coi.js"></script>
    <script>
      addEventListener('message', ({data}) => {
        document.body.append(data);
      });
    </script>
  </head>
  <body>
    <iframe src="iframe.html"
            sandbox="allow-scripts allow-same-origin"
            frameborder="0" width="0" height="0"
            style="position:absolute;top:-1px;left:-1px"
    ></iframe>
  </body>
</html>
iframe.html
<!DOCTYPE html>
<script type="module">
  import coincident from '../../window.js';
  coincident(new Worker('./worker.js', {type: 'module'}));
</script>
worker.js
import coincident from '../../window.js';
const {window} = coincident(self);
const parser = new window.DOMParser;
const document = parser.parseFromString(
  '<!doctype html>',
  'text/html'
);
document.body.textContent = 'Hello World';
// send a message to the parent
window.parent.postMessage(document.documentElement.outerHTML);
// <html><head></head><body>Hello World</body></html>
I believe this would cover @BenjaminAster's non-blocking use case via a whole DOM API that should not execute on the main thread, but I couldn't find any encouraging discussion around this assumption, yet it seems to be a de-facto standard.
btw ... I've just realized that if the iframe is already on a different thread, coincident is kinda useless ... I just used it to be sure I could at least have it running from an iframe but if it uses the iframe thread and that's sync, there's no advantage in doing that at all ... so iframe doesn't look like an answer if we can't guarantee it runs on a separate, non-blocking, thread.
@WebReflection
if stuff is never live, how this API could be problematic once exposed via Worker 🤔
What do you mean by 'live'? Remember that some elements have actions when they're constructed, not just when they're connected. Eg creating an image.
it needs a sandbox that apparently allows a different thread
I don't believe browsers run iframes in a different thread, even if they have the sandbox attribute.
iframe doesn't look like an answer if we can't guarantee it runs on a separate, non-blocking, thread
Right, that's why I was proposing a feature that did that.
Remember that some elements have actions when they're constructed, not just when they're connected. Eg creating an image.
of course I did not think about that, fair enough then.
I don't believe browsers run iframes in a different thread, even if they have the sandbox attribute.
from live tests via SO iframes run in a different thread if:
- the src points to a different domain
- the sandbox attribute is used

... at least that's what devs observed and tested live
Right, that's why I was proposing a feature that did that.
it'd be awesome, and if not too problematic and it can speed up things more, @surma hint around no-render would be a strawberry on the cake.
Remember that some elements have actions when they're constructed ... Eg creating an image.
wait a minute though ... I don't see any network activity in here ... that's what I meant by live ... if we parse to retrieve a document I don't think the parser constructs out of the box those elements until these are live/adopted ... what am I missing?
(new DOMParser).parseFromString('<img src="shenanigans.png">', 'text/html')
Maybe images were a bad example then - my point is that someone is going to have to go through all the elements and check that their constructor behaviours are worker compatible.
I'd be curious to know which element might have issues though, as I think most of them need to be adopted and pass through the adopt algorithm before having any meaning for the current environment ... I've tested <base>, custom elements, others, I can't find anything working at all unless adopted by the "live document". MDN also doesn't specify anything around this behavior and standards mention that scripts will be flagged as not-executable https://html.spec.whatwg.org/multipage/dynamic-markup-insertion.html#dom-domparser-parsefromstring-dev
that's still something to consider while adopting those nodes ... moreover:
The document's encoding will be left as its default, of UTF-8. In particular, any XML declarations or meta elements found while parsing string will have no effect.
In the parsing model it's also not clear why this would be unsafe if the document is created via the API ... looking forward to some enlightenment around this.
Yes they are. They're absolutely coupled to documents. That's why they aren't available in workers.
Of course it will be some work to implement, but all of the things like computed CSS styles, layout, scripts, resource loading, ... aren't a thing in documents created by DOMParser or DOMImplementation::createHTMLDocument. That's what I meant by "not architecturally coupled to the main thread". I remember that when implementing OffscreenCanvas, that was a lot of work because suddenly stuff like font rendering and CSS parsing (via context2d.{font, fillStyle, etc.}) needed to work in workers. The only thing related to this that comes to my mind now is Document::styleSheets, which gives access to parsed CSS stylesheets and I think currently works also with "fake" documents. For this to work in workers, yes, there would have to be a basic CSS parser available in workers, but if that's too difficult to implement, I guess for the "minimum viable product" of worker DOM APIs, browsers could just leave this empty and not parse the CSS at all? I think the use cases for parsing CSS in a worker are minimal anyways.
Edit: Ok, it turns out Document::styleSheets does not work, but HTMLStyleElement::sheet does work, i.e.
new DOMParser().parseFromString("<!DOCTYPE html><style> body { color: red } </style>", "text/html").querySelector("style").sheet.cssRules
returns the correctly parsed CSS with one rule containing one declaration.
I don't believe browsers run iframes in a different thread, even if they have the sandbox attribute.
I know @WebReflection already mentioned that now, but at least in Chrome where I tested it, it seems that iframes with a sandbox attribute do run in their separate thread. You can try it out with e.g. this setup:
index.html:
<!DOCTYPE html>
<html lang="en">
  <head>
    <script type="module">
      const frame = () => {
        millis.textContent = performance.now()
        requestAnimationFrame(frame)
      }
      requestAnimationFrame(frame)
    </script>
  </head>
  <body>
    <div id="millis"></div>
    <iframe src="iframe.html" sandbox="allow-scripts"></iframe>
  </body>
</html>
iframe.html:
<!DOCTYPE html>
<html lang="en">
  <head>
    <script type="module">
      const frame = () => {
        millis.textContent = performance.now()
        requestAnimationFrame(frame)
      }
      requestAnimationFrame(frame)
      block.onclick = () => {
        while(true);
      }
    </script>
  </head>
  <body>
    <div id="millis"></div>
    <button id="block">block</button>
  </body>
</html>
If you click the "block" button in the iframe, the iframe is totally blocked but the parent frame continues to run.
Live demo now published at benjaminaster.com/playground/async-iframe
at least in Chrome where I tested it, it seems that iframes with a sandbox attribute do run in their separate thread
Interesting. That wasn't the case a couple of months ago when I last tested it. Is that the case on mobile too?
Is that the case on mobile too?
in the SO thread somebody mentioned on Android heuristics can be different (no guarantees, depends on ... things ...) but on Desktop it seems to be consistent.
The thread mentions also that multiple iframes, even with sandbox attribute, will share the same thread so if you add 2 iframes in the above example a click in one will (should) block the other iframe too (still not the main thread).
If you click the "block" button in the iframe, the iframe is totally blocked but the parent frame continues to run.
Live demo now published at benjaminaster.com/playground/async-iframe
It blocks the whole tab for me. Desktop Chrome 115.0.5790.114 on mac.
It blocks the whole tab for me. Desktop Chrome 115.0.5790.114 on mac.
Ha, I had tested it in Chromium 113 on my Raspberry Pi (separate threads), and now in Chrome 115 on Windows and Android, where it indeed blocks the main thread... Interesting. So it either changed in a very recent Chrome version, or my Raspberry Pi somehow handles that differently. Anyways, yep, you're right, iframes do generally block the whole tab, so they're not an option!
... so they're not an option!
and imho they shouldn't be in general, now that I think about it, because an iframe with a guaranteed thread (like a worker) would compete with workers at that point, making workers kinda redundant as inferior to iframes ability (no DOM parsing ability), beside the security concerns when foreign scripts might try to access their content.
Might be worth splitting the discussion here up into two topics:
- Ergonomic differences between iframe-as-thread and worker-with-dom
- Spec + technical feasibility (of exposing a DOM to Workers, and of allowing <iframe sandbox> to strictly imply OMT)
For #1:
It seems like any ergonomic warts in the process of constructing an iframe are solvable in userland (essentially add an optimized mechanism for using <iframe sandbox> in JS-loading-JS scenarios rather than just HTML-loading-HTML).
The ergonomics of the DOM-in-Worker are clearer to me, the issues there seem to be more on the spec and implementation side.
For #2: I think there are some fringe cases for DOM-in-Worker that make this particularly tricky. Some potential cases off the top of my head:
- what happens to inline scripts when parsing?
- what happens to iframe or other nested documents (<svg foreignObject>, <embed> et al) in parsed documents within the Worker (do they get forced into the same thread?)
- how would things like the media attribute work given that a document and its nodes have no direct relationship with display?
I can think of possible answers to these things, but they would all seem to require substantial revisions to DOM specs. Seems like it would be easier to spec out a "lite" DOM interface that avoids all of these issues by omitting presentation-related APIs.
I think there are some fringe cases for DOM-in-Worker that make this particularly tricky.
All of the problems you mentioned have already been solved when browsers implemented DOMParser and DOMImplementation::createHTMLDocument(). If DOM APIs in workers would be spec'd, we could simply use the behavior that currently exists with them, only now in workers.
what happens to inline scripts when parsing?
Nothing. JS doesn't get executed, as is currently the case on the main thread with DOMParser and createHTMLDocument()
what happens to iframe or other nested documents (<svg foreignObject>, <embed> et al) in parsed documents within the Worker (do they get forced into the same thread?)
External content (iframe, embed, img, ...) doesn't get loaded at all. <foreignObject> gets parsed normally & on the same thread.
how would things like the media attribute work given that a document and its nodes have no direct relationship with display?
It doesn't. The media attribute in e.g.
<link rel="stylesheet" href="dark.css" media="(prefers-color-scheme: dark)" />
would do absolutely nothing, and it doesn't matter since the CSS file doesn't get loaded anyway. Again, all of this is already the case today with "fake" documents created on the main thread via DOMParser or DOMImplementation::createHTMLDocument().
@developit I agree with @BenjaminAster there: nothing you mentioned is an issue with current living standard because DOMParser and parseFromString do nothing until created nodes from that document get adopted.
In Workers, there's no way to adopt these in any meaningful way ("live content") because nothing is ever live ... no src, no source, no CSS, nothing ... the parseFromString rightly does parsing only, the rest is performed only when stuff gets adopted on the main, live, thread (which can't be the case within workers as we can't postMessage DOM nodes, as per structured clone algorithm specs).
The way browser engines such as Blink, Gecko, & WebKit are written right now, the vast majority of DOM code assumes that it's running in the main thread. Making it possible to run that code in a worker is a massive undertaking. Is it theoretically possible? Yes, but it's by no means simple or easy. It could easily be a multi-year/multi-engineer effort.
Is it theoretically possible? Yes, but it's by no means simple or easy.
I don't think anyone in here believes it's a flag switch, like Jake suggested, but it would be interesting to understand why the main is so special in "just parsing" regards (which of course needs many other classes exposed to work properly).
It could easy be a multi-year/multi-engineer effort.
LinkeDOM (or other projects that already run in workers) could be a great polyfill in the meantime but if there's no vendors interest in moving forward with this proposal there won't be interest in making these projects closer to standards than they are now.
I don't think anyone in here believes it's a flag switch, like Jake suggested
They really do. See the thread that started this one https://github.com/w3c/ServiceWorker/issues/846 - the feeling there is very much that service workers chose to block DOM APIs from that context. Even down to the latest comment https://github.com/w3c/ServiceWorker/issues/846#issuecomment-1659197077.
They really do.
sad thread ... and I should've specified in here :sweat_smile:
@rniwa on a second thought about this:
It could easy be a multi-year/multi-engineer effort.
I think that if we had a way to ensure a separate, non-blocking, thread for an iframe we could cut some corner and have what we want, in terms of functionality, even if that's not exactly where we want it (workers) ... as apparently in some circumstance iframes already get that thread, would @surma suggestion around having a no-render (or any other name) be a fast way forward, hopefully relatively easier than bringing the DOM to Workers?
Fwiw, in my linear() app, I wanted to be able to analyse SVG paths off the main thread. To do this, I needed to bring another implementation of SVG paths into a worker. I couldn't use the built-in APIs because there's no easy standard way to run them in a different thread. A different-thread iframe (rendered or not) would have solved this.
That might be a different use-case though.
In terms of DOM-in-workers, any thoughts on mine and @developit's suggestion to have a different, minimal API for this? As in, it doesn't create HTMLImageElements, where you have things like naturalWidth and decode(), but a simpler tree model that can be later upgraded to real elements, and that upgrade can only happen in a document.
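As an illustration of what such a minimal, upgradable tree model could look like, here is a sketch under stated assumptions. The node shape (`tag`, `attrs`, `children`) and the function names `liteElement` and `upgrade` are hypothetical and not part of any spec. Because the nodes are plain data, they survive structured cloning and so could be sent with postMessage; `upgrade` relies only on standard Document methods and would run in a document context.

```javascript
// Hypothetical "lite" node: plain data only, so it can be structured-cloned
// (and therefore posted between a worker and the main thread).
function liteElement(tag, attrs = {}, children = []) {
  return { tag, attrs, children };
}

// Upgrade a lite tree into real nodes. `doc` is expected to be a Document,
// which today is only available outside of workers.
function upgrade(node, doc) {
  if (typeof node === 'string') return doc.createTextNode(node);
  const el = doc.createElement(node.tag);
  for (const [name, value] of Object.entries(node.attrs)) {
    el.setAttribute(name, value);
  }
  for (const child of node.children) {
    el.appendChild(upgrade(child, doc));
  }
  return el;
}

// Worker side: build the tree and clone it, as postMessage would do internally.
const tree = liteElement('p', { class: 'note' }, [
  'Hello ',
  liteElement('b', {}, ['world']),
]);
const received = structuredClone(tree);
```

On the main thread, `upgrade(received, document)` would then produce a real paragraph element; element-specific behaviour such as image loading would only start at that point.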
It might be good to contextualise what people want. The ability to de-serialize an HTML string into some kind of object model - and back again - is a hugely different problem than reifying HTML into a DOM; as others have alluded to.
If the ask is "I don't want to bring my own HTML parser when the browser has a perfectly good one outside of Workers" then that closes the scope to a large degree compared to "I want to have the full suite of DOM APIs and shuttle tree fragments between threads".
What gives me pause about this discussion is this: while I don't think people are naive enough to believe the DOM is intentionally blocked from workers, I do think that even people in this thread are failing to correctly grasp (or articulate) exactly what they want and the ramifications of that. I think the reason DOMParser() exists, and not HTMLParser(), is that it answers a question and gives developers a fully reified DOM, which sits at the very end of a set of steps of taking HTML and turning it into UI. Everything in between is full of so much nuance that it's hard to find one place to settle on.
An HTML parser would alleviate you from some code within workers, and maybe give you a nice performance boost, but I think if people asked for it, they'd end up disappointed with what you get for it (not the DOM). Having a tree of objects that don't ascribe any semantic meaning to each node gives you very little, and once all that data gets sent to the main thread it still needs to be reified into the DOM, and all the things that your application wants like event listeners. The OP gives some good use cases for having general purpose serialisation but those cases aren't UI, they're data transformation. The rest of the thread talks of UI.
On the other hand having an object model that represents HTML requires full reification, which includes all the aforementioned steps and all the decisions about that must come from somewhere - so you're either introducing a fake environment which means whatever DOM you pass back to the main thread needs to effectively go through the same reification all over again (which means reification gets done twice and possibly diverges in each, making a worker DOM not WYSIWYG) or you need to introduce shenanigans tying a worker to a main thread's DOM so you can marshal data back and forth in order to make decisions, at which point you're back to blocking and may as well have done it in the main thread.
In addition, to talk of some of the use cases of the OP; I don't think the use cases are quite as compelling on the second glance. Let's take for example markdown to HTML. The final artefact is indeed DOM but it's much simpler to write a markdown to HTML converter (that is, converting one string to another string), then hand that to a browser to convert into DOM, than it is to write a markdown to DOM converter. While it would be useful to have an HTML parser to sanitize input, that is the last step in a chain of operations that has to happen before DOM, and pretty much where the contract ends. Up until sanitization the fastest and easiest way to generate HTML from markdown is string to string. DOM APIs would give us nothing in converting markdown.
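The string-to-string approach described above can be sketched as follows. This is a toy converter written for illustration, not a real Markdown implementation: it only understands `#` headings and plain paragraphs, and the function names are hypothetical. It needs no DOM API at all, so it already runs in a worker today.

```javascript
// Escape characters that are significant in HTML text content.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Toy converter: blocks are separated by blank lines; a block that is a single
// "# heading" line becomes <h1>..<h6>, anything else becomes a paragraph.
function markdownToHtml(markdown) {
  return markdown
    .split(/\n{2,}/)
    .map((block) => block.trim())
    .filter((block) => block !== '')
    .map((block) => {
      const heading = /^(#{1,6})\s+(.*)$/.exec(block);
      if (heading) {
        const level = heading[1].length;
        return `<h${level}>${escapeHtml(heading[2])}</h${level}>`;
      }
      return `<p>${escapeHtml(block)}</p>`;
    })
    .join('\n');
}

console.log(markdownToHtml('# Title\n\nHello <world> & co'));
// prints:
// <h1>Title</h1>
// <p>Hello &lt;world&gt; &amp; co</p>
```

The resulting string can be posted to the main thread and handed to the browser's parser there, which is the point at which sanitization would come in.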
The rest of the thread talks of UI.
I never mentioned UI as a desired feature, and others mentioned no-render too, as UI is not interesting or requested (also nonsense from a Worker?) ... the OP, with which I agree, is about having the parser exposed ... true that this requires a broader discussion around what we then want to happen with the resulting document when listeners are added or other special things occur (see Jake's mention of naturalWidth), but it looks like we all agree (Surma's desire of posting fragments apart) that a parser that produces a lightweight tree but still validates inputs would already be a huge step forward in regards to this feature request.