XML error behavior for parseFromString() causing compat issues

Open • emilio opened this issue 6 months ago • 3 comments

What is the issue with the HTML Standard?

The spec for DOMParser.parseFromString says that, for the error case:

If the previous step resulted in an XML well-formedness or XML namespace well-formedness error, then:

  • Assert: document has no child nodes.
  • Let root be the result of creating an element given document, "parsererror", and "http://www.mozilla.org/newlayout/xml/parsererror.xml".
  • Optionally, add attributes or children to root to describe the nature of the parsing error.
  • Append root to document.

That matches what Gecko does, but it seems WebKit and Blink do somewhat weirder error recovery:

  • They keep the partially parsed page and insert the error message as the first node under the document element. That error element is not in a custom namespace; it is a regular HTML element.
  • For SVG they do something even weirder.
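To make the spec's error branch quoted above concrete, here is a minimal JavaScript sketch of those four steps. It assumes doc is an empty Document (as the spec's assertion requires; in a browser, document.implementation.createDocument(null, null) gives one). The function name buildParserErrorDocument and its message parameter are hypothetical, and setting text content is just one possible choice for the spec's optional "attributes or children".

```javascript
// Namespace the spec assigns to the error root element.
const PARSERERROR_NS = "http://www.mozilla.org/newlayout/xml/parsererror.xml";

// Hypothetical helper mirroring the spec steps; not an engine's actual code.
// `doc` is assumed to be an empty Document, `message` an optional description.
function buildParserErrorDocument(doc, message) {
  // Assert: document has no child nodes.
  if (doc.hasChildNodes()) {
    throw new Error("expected an empty document");
  }
  // Let root be a "parsererror" element in the Mozilla namespace.
  const root = doc.createElementNS(PARSERERROR_NS, "parsererror");
  // Optionally describe the error (here, as plain text content).
  if (message !== undefined) {
    root.textContent = message;
  }
  // Append root to document, so it becomes the document element.
  doc.appendChild(root);
  return doc;
}
```

The result is the shape Gecko produces: the parsererror element is the document element and nothing from the partially parsed input survives.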

We've found a couple of compat issues (somewhat minor, but still) caused by this. In the last one, the page ends up parsing an HTML 404 page, and its JS code follows the spec and checks whether document.documentElement.nodeName == "parsererror", so it reports an error. This is arguably a bug on the site, but it happened to work with Blink / WebKit's implementation: they preserve the root element from the original page, so the page never detects the 404 error :').
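The site's check only recognizes the spec/Gecko shape, where the error element is the document element. The sketch below contrasts that strict check with a looser one that also matches the Blink/WebKit shape, where an error element is inserted under the preserved root. It assumes doc is the Document returned by parseFromString with an XML MIME type; both function names are hypothetical and neither is taken from the affected site.

```javascript
// Namespace the spec (and Gecko) use for the error root element.
const PARSERERROR_NS = "http://www.mozilla.org/newlayout/xml/parsererror.xml";

// Strict check matching the spec/Gecko shape: the document element itself
// is a "parsererror" element in the Mozilla namespace.
function isSpecShapedParserError(doc) {
  const root = doc.documentElement;
  return root !== null &&
    root.localName === "parsererror" &&
    root.namespaceURI === PARSERERROR_NS;
}

// Looser heuristic: any element named "parsererror" anywhere in the tree,
// in any namespace. This catches both engine behaviors described above,
// but it also fires on well-formed input that contains its own
// <parsererror> element.
function hasParserError(doc) {
  return doc.getElementsByTagName("parsererror").length > 0;
}
```

The looser check trades precision for robustness across engines, which is the gap the site fell into in the other direction.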

I'm not a huge fan of what Blink / WebKit are doing, but it'd be nice to decide whether we keep the spec and Firefox as is (and fix Blink / WebKit), or change the spec and Firefox.

emilio, Jun 11 '25 21:06

cc @rniwa @mfreed7

annevk, Jun 12 '25 07:06

We discussed this briefly at the WHATNOT meeting (#11358). It was me and three Mozilla folks, so we didn't make much progress on cross-implementer agreement. But some thoughts:

  • We'd all like to avoid the SVG behavior if possible, and treat all XML documents the same.

  • I expressed my personal (and editor-hat) preference for the Gecko behavior. We lack a rigorous spec for the XML parser, so defining "the portion of the document that's parsed before the parsing error is encountered" is not really possible in the spec, and it seems tricky to make interoperable. Are we going to create an exhaustive test suite of all possible XML parsing errors, and of how much of the document each one leaves behind?

  • Emilio clarified that, in addition to the linked compat issue, Mozilla had encountered one other instance, and that neither case involved major site breakage. So switching the other engines to the Mozilla behavior is likely possible.

domenic, Jun 12 '25 08:06

Discussed in https://github.com/whatwg/html/issues/11358.

cwilso, Jun 13 '25 23:06