dom Un-merge Document and HTMLDocument

trafficstars

Does anyone object? It's been years, nobody has done it, it's claimed that IE did it and then un-did it. Boris has serious reservations about web-compatibility. Is there any actual reason to merge them? Obviously, it makes sense to keep APIs on Document where reasonable, but I don't see why it needs to be wholesale.

Apr 14 '16 14:04 ayg

@bz

Apr 14 '16 14:04 ayg

Note that my serious reservations are not based on data, but rather on lack of data. I could be convinced that this is ok, with actual data and analysis...

Apr 14 '16 14:04 bzbarsky

There has been some efforts to move things from HTMLDocument to Document in Blink, and it's been successful so far. https://bugs.chromium.org/p/chromium/issues/detail?id=238368

Of what remains I think that only document.all is worth worrying about. It's already on Document internally, but the risk is that some code depends on xmlDocument.all not existing.

I don't have time to spend on this right now, but if it'll get reverted in the spec I'd like a chance to try out the final changes in Blink, so let me know if there's a deadline.

Apr 14 '16 15:04 foolip

Right, moving particular APIs to Document might well make a lot of sense. Even moving them all might make sense. What doesn't make sense to me is doing the move in a single commit with no attempt to do per-property analysis.

Apr 14 '16 15:04 bzbarsky

So how about this: leave the spec as-is for now, start moving properties to Document on a per-case basis, and if we find any properties we don't want to try moving, change the spec to reintroduce HTMLDocument with those properties. Sound good to everyone?

Apr 14 '16 17:04 ayg

Yep, that sounds good. I'm pretty optimistic about Blink being able to make HTMLDocument empty, but there could be engine-specific compat issues. Not sure how risky the final step of making HTMLDocument an alias is, we'll just have to try it.

Apr 14 '16 19:04 foolip

Okay, so let's all try that.

Apr 17 '16 12:04 ayg

What's the benefit of merging HTMLDocument into Document? Do we really want various HTML-specific attributes / operations to be exposed on non-HTML documents?

This is a risky change compatibility-wise and major browser engines all had (and still have) HTMLDocument. This has been in the specification for years and yet no major browser has not implemented it yet.

I personally do not understand the benefits of this change but even if there are benefits, is it really worth the trouble / risk for all browser engines? I know I am not inclined to do this change in WebKit.

Jul 29 '16 16:07 cdumez

Missing not in the last sentence?

In Blink most of what was in HTMLDocument has already been moved to Document, see https://github.com/whatwg/dom/issues/221#issuecomment-209989740.

The set of things that are in HTMLDocument in Gecko and WebKit is not aligned.

In Edge, where the IDL files aren't available, I see that Object.getOwnPropertyNames(HTMLDocument.prototype) and Object.getOwnPropertyNames(XMLDocument.prototype) both return just ["constructor"] while Object.getOwnPropertyNames(Document.prototype) returns ["constructor", "URL", "URLUnencoded", "activeElement", "alinkColor", "all", "anchors", "applets", "bgColor", "body", "characterSet", "charset", "compatMode", "cookie", "currentScript", "defaultCharset", "defaultView", "designMode", "dir", "doctype", "documentElement", "domain", "embeds", "fgColor", "forms", "head", "hidden", "images", "implementation", "inputEncoding", "lastModified", "linkColor", "links", "location", "msCSSOMElementFloatMetrics", "msCapsLockWarningOff", "onabort", "onactivate", "onbeforeactivate", "onbeforedeactivate", "onblur", "oncanplay", "oncanplaythrough", "onchange", "onclick", "oncontextmenu", "ondblclick", "ondeactivate", "ondrag", "ondragend", "ondragenter", "ondragleave", "ondragover", "ondragstart", "ondrop", "ondurationchange", "onemptied", "onended", "onerror", "onfocus", "oninput", "oninvalid", "onkeydown", "onkeypress", "onkeyup", "onload", "onloadeddata", "onloadedmetadata", "onloadstart", "onmousedown", "onmousemove", "onmouseout", "onmouseover", "onmouseup", "onmousewheel", "onmscontentzoom", "onmsgesturechange", "onmsgesturedoubletap", "onmsgestureend", "onmsgesturehold", "onmsgesturestart", "onmsgesturetap", "onmsinertiastart", "onmsmanipulationstatechanged", "onmssitemodejumplistitemremoved", "onmsthumbnailclick", "onpause", "onplay", "onplaying", "onpointerlockchange", "onpointerlockerror", "onprogress", "onratechange", "onreadystatechange", "onreset", "onscroll", "onseeked", "onseeking", "onselect", "onselectionchange", "onselectstart", "onstalled", "onstop", "onsubmit", "onsuspend", "ontimeupdate", "onvolumechange", "onwaiting", "onwebkitfullscreenchange", "onwebkitfullscreenerror", "plugins", "pointerLockElement", "readyState", "referrer", "rootElement", "scripts", "scrollingElement", "styleSheets", "title", "visibilityState", "vlinkColor", "webkitCurrentFullScreenElement", "webkitFullscreenElement", "webkitFullscreenEnabled", "webkitIsFullScreen", "xmlEncoding", "xmlStandalone", "xmlVersion", "onpointercancel", "onpointerdown", "onpointerenter", "onpointerleave", "onpointermove", "onpointerout", "onpointerover", "onpointerup", "onwheel", "adoptNode", "captureEvents", "caretRangeFromPoint", "clear", "close", "createAttribute", "createAttributeNS", "createCDATASection", "createComment", "createDocumentFragment", "createElement", "createElementNS", "createExpression", "createNSResolver", "createNodeIterator", "createProcessingInstruction", "createRange", "createTextNode", "createTreeWalker", "elementFromPoint", "evaluate", "execCommand", "execCommandShowHelp", "exitPointerLock", "focus", "getElementById", "getElementsByClassName", "getElementsByName", "getElementsByTagName", "getElementsByTagNameNS", "getSelection", "hasFocus", "importNode", "msElementsFromPoint", "msElementsFromRect", "open", "queryCommandEnabled", "queryCommandIndeterm", "queryCommandState", "queryCommandSupported", "queryCommandText", "queryCommandValue", "releaseEvents", "updateSettings", "webkitCancelFullScreen", "webkitExitFullscreen", "write", "writeln", "createEvent", "querySelector", "querySelectorAll"].

In other words, it looks like Edge has already moved everything to Document.

@cdumez, what do you think would be a reasonable thing to attempt to converge on? Based on the above, I think the most tractable path with relatively low risk is to move everything into Document, but then leave HTMLDocument an empty interface just like XMLDocument.

Jul 29 '16 17:07 foolip

Yes, I was missing a 'not' in my last sentence. It is now fixed.

It is interesting that Edge made did change, I was not aware.

I guess moving everything to Document while keeping an empty HTMLDocument is relatively low risk as you say. I guess I am just not clear yet on the benefits of doing this.

Jul 29 '16 17:07 cdumez

@annevk will know, but probably the original motivation was that it's possible to insert any element into any document, so having a type of the document itself doesn't achieve much. Also, I think, to reduce gratuitous differences between XHTML (XMLDocument) and HTML. (Some algorithms still have different behavior based on the document type, though.)

Whatever the history, we're now in a weird state and need to plot a course towards interop. Does at least trying to make HTMLDocument empty seem reasonable? Given the changes already made in Blink and Edge, this is very likely web compatible.

Whether to attempt the final step of aliasing I don't know. If we don't, we may need to rethink the Document constructor, perhaps instead making it such that HTMLDocument and XMLDocument are the only possible instances.

Jul 29 '16 20:07 foolip

I am curious to know what Firefox is planning to do. If Firefox goes in this direction as well then WebKit would likely follow. So far, Gecko and WebKit are mostly in agreement.

Jul 29 '16 20:07 cdumez

@smaug----, can you comment on Gecko plans?

Aug 01 '16 10:08 foolip

I agree with bz. We need to do this per property, not all at once. And while doing it, it is possible that we figure out the merge isn't really possible. We don't really know the behavior of the stuff Edge has.

Does Edge still have HTMLDocument interface? What is the prototype chain of an (html) document?

I'm tiny bit worried about stuff like open/write/close for example: do they work as expect with SVG or Mathml or such?

Aug 01 '16 19:08 smaug----

I agree with bz. We need to do this per property, not all at once. And while doing it, it is possible that we figure out the merge isn't really possible. We don't really know the behavior of the stuff Edge has.

We moved it in smaller parts in Blink as well. Testing what others do makes sense for each step. It seems pretty likely that it's web compatible to move everything, so the question is what you'd like to attempt to do in Gecko.

Does Edge still have HTMLDocument interface? What is the prototype chain of an (html) document?

Edge still has an HTMLDocument interface, and an instance has the expected prototype chain HTMLDocument : Document : Node : EventTarget.

I'm tiny bit worried about stuff like open/write/close for example: do they work as expect with SVG or Mathml or such?

In Blink, these and writeln() all throw InvalidStateError for XML documents, as per spec: https://html.spec.whatwg.org/#dynamic-markup-insertion

In Edge, it looks like open() and close() don't throw an exception (not sure what they do) but write() and writeln() do.

So there's still some interop work to do here...

Finally, one thing that could go badly here if we all aim to make HTMLDocument and XMLDocument aliases of Document is that one or two engines succeed, but the next to try runs into engine-specific troubles similar to that around Gecko's XMLDocument.prototype.load, leaving us in an non-interoperable state for a long time.

Aug 01 '16 20:08 foolip

I am working on moving some properties from HTMLDocument to Document in WebKit. I am focusing on the properties that Blink has already moved and that I think make sense to have on Document (i.e. do not throw if called on anything else than an HTML document).

Since I don't anticipate HTMLDocument going away entirely anytime soon. I don't think it makes sense to move to Document attributes / operations that work only on HTML documents (e.g. open/close/write/writeln).

Aug 13 '16 22:08 cdumez

If some engines can't move one or two properties, the easiest solution would be to just move those back to HTMLDocument in the spec. As long as we have interop, it doesn't really matter where the properties are, so take the path of least resistance.

Aug 15 '16 18:08 ayg

Here is the current situation: Chrome 52: [‘alinkColor', 'all', 'bgColor', 'captureEvents', 'clear', 'fgColor', 'linkColor', 'releaseEvents', 'vlinkColor'] WebKit ToT: [‘alinkColor', 'all', 'bgColor', 'captureEvents', 'clear', 'close', 'fgColor', 'linkColor', 'open', 'releaseEvents', 'vlinkColor', 'write', 'writeln']

So the only difference between Chrome and WebKit at the moment are the following properties: "open", "close", "write" and "writeln".

As I said before, I did not move them on purpose because it seems the expected behavior is for them to throw when calling on anything else than an HTMLDocument. Given this behavior and the fact that we still have an HTMLDocument interface in WebKit, I see little point in moving them up to Document.

My hope is that an HTMLDocument interface is re-introduced in the HTML specification for the properties that are not useful on other types of documents (and legacy HTMLDocument attributes such as alinkColor since I don't think we want to expose legacy properties or more document types).

Aug 16 '16 17:08 cdumez

If we add it back we need to be very clear which contexts end up creating it although I suppose that is the case already due to "HTML document"…

Aug 16 '16 17:08 annevk

Some thoughts on the WebKit ToT list from https://github.com/whatwg/dom/issues/221#issuecomment-240182587

document.all looks to me like it it'd work just fine with any kind of document. The implementation itself is already on Document in Blink and when I tried moving it my ad-hoc test worked fine.
alinkColor, bgColor, fgColor, linkColor and vlinkColor work as long as there's a body element, and document.body is already on Document. There are also a bunch of collections like document.forms on Document.
clear(), captureEvents() and releaseEvents() are all no-ops so it doesn't matter much.
close, open, write and writeln would all throw exceptions and wouldn't be useful on Document. The one benefit of moving them anyway is that the exception message could say why it didn't work.

At the end of the day, I don't feel strongly about any of these, any course of action that would get us to interop would be fine.

The missing piece here is what to do about the Document constructor? Per https://github.com/whatwg/dom/issues/278#issuecomment-239461359 I think we should try to retain it, but what should it return? Gecko and WebKit both return a document with contentType "application/xml", but it's really unfortunate that it isn't an XMLDocument then. @cdumez @ayg, thoughts?

Sep 05 '16 11:09 foolip

I did a little bit of research with SELECT page,url FROM [httparchive:har.2016_08_01_chrome_requests_bodies] WHERE body CONTAINS 'instanceof XMLDocument' in BigQuery:

htmldocument.csv (used WHERE RAND() < 1/10 to get about 10%, leaving 1254 rows)
xmldocument.csv (76 rows)

It's immediately clear that instaceof HTMLDocument is way more common. I randomized the order and the first thing I found came from mutation-summary.js by @rafaelw: https://github.com/rafaelw/mutation-summary/blob/421110f84178aa9e4098b38df83f727e5aea3d97/src/mutation-summary.js#L112

@rafaelw, I didn't look too closely, but I take it this code actually matters for getting the correct behavior in both XML and HTML documents?

Copies of this code seem to dominate the usage. http://cdn.clicktale.net/www/ChangeMonitor-latest.js showed up a lot but seems to include an older version of mutation-summary.

Given the volume of results, I'd be cautious about changing the truthyness of instanceof HTMLDocument without further analysis.

As for XMLDocument, after checking the most common URL patterns I couldn't find anything that seemed like it would break. So I'd give better odds to making XMLDocument an alias of Document, but that's not a good idea as long as Gecko needs XMLDocument.protoype.load.

Sep 05 '16 12:09 foolip

https://github.com/whatwg/dom/issues/308 is also relevant.

I realized that "object HTMLDocument" and "object XMLDocument" are also relevant for compat. I find 606 instances of "object HTMLDocument" and 9243 instances of "object XMLDocument". I didn't dig deeply, but a lot of the latter seems to be MooTools: https://github.com/mootools/mootools-core/blob/1be4d62e912e8acc25ddccd5f8059a9cba0862ed/Source/Slick/Slick.Finder.js#L22

Sigh.

Sep 05 '16 13:09 foolip

In https://github.com/whatwg/dom/issues/221#issuecomment-210115403 I said "Not sure how risky the final step of making HTMLDocument an alias is, we'll just have to try it" upon which the issue was closed.

After https://github.com/whatwg/dom/issues/221#issuecomment-244738573 I'm a lot more cautious, and I think we should actually just revive HTMLDocument, unless there's concrete implementer interest for attempting the aliasing.

Sep 22 '16 10:09 foolip

@foolip that comment doesn't give an argument for reviving HTMLDocument. Being an alias preserves instanceof HTMLDocument working.

Sep 22 '16 10:09 domenic

Yes, but it'd also start "working" for things that aren't currently instanceof HTMLDocument, the whole point of the check seems to be to get at the https://dom.spec.whatwg.org/#html-document state.

It's possible that it's safe because the code doesn't actually run in any non-HTML documents, but it'd require more analysis.

Sep 22 '16 11:09 foolip

Right, my impression is that there are very few non-HTML documents out there, and people are just using it as a test that something is a document (and not, say, any other kind of node). But that is an assumption.

Sep 22 '16 11:09 domenic

That probably happens, but the case that dominated in https://github.com/whatwg/dom/issues/221#issuecomment-244738573 was about case sensitivity.

Sep 22 '16 11:09 foolip

I've tried to summarize everything I could find about document interfaces here: https://gist.github.com/foolip/103963a1ae8598d2baedd296f4a1bf4c

I think it would be very good if representatives from all implementers, including from Edge, could get together and discuss which path to take here. My own take is that it depends on use of XMLDocument's load() method in Firefox. If usage is low, then the "Document and HTMLDocument" path looks good. If not, then I prefer "HTMLDocument and XMLDocument".

Sep 28 '16 13:09 foolip

ctor for Document is already shipping. I would just do "Document, HTMLDocument and XMLDocument", since that requires least changes to the implementations and be done with this issue. Needs of course some spec updates.

(SVGDocument removal https://bugzilla.mozilla.org/show_bug.cgi?id=1293786 waiting for landing)

Sep 28 '16 14:09 smaug----

ctor for Document is already shipping

This was made very clear in the doc, but I've now also added that to the timeline, although I couldn't find the impl bug for Gecko, I just bisected its support to Firefox 20.

Do you agree that "When both Document and XMLDocument are concrete and represent the same thing, this is the least inconsistent state possible. It would be an odd API to avoid."?

Sep 28 '16 15:09 foolip

dom dom copied to clipboard

Un-merge Document and HTMLDocument

dom
dom copied to clipboard