dom Allow more characters in element/attribute names and prefixes

Closes #849. Closes #769.

[x] At least two implementers are interested (and none opposed):
- Gecko
- Chromium
[ ] Tests are written and can be reviewed and commented upon at:
- TODO first implementer should do this!
[x] Implementation bugs are filed:
- Chrome: https://bugs.chromium.org/p/chromium/issues/detail?id=1334640
- Firefox: https://bugzilla.mozilla.org/show_bug.cgi?id=1773312
- Safari: https://bugs.webkit.org/show_bug.cgi?id=241419

(See WHATWG Working Mode: Changes for more details.)

Original points for discussion (discussed and resolved in the following comments):

- I did not disallow = inside attribute local names. Both the parser and DOM APIs currently disallow it, except that the parser allows it as the first character. I'm happy to change this if people prefer; I started with the simpler version.
- This does not disallow lone surrogates, the Unicode replacement character U+FFFD, single quotes, or < in any position, because the HTML parser already allows introducing those and it seems nicer to align.
- I did not change validation for createProcessingInstruction() or createDocumentType(). We could try to simplify those too, perhaps after investigating parser behavior. But they didn't seem to be causing any real web developer pain, unlike element and attribute local names, so I thought it'd be better to just leave them as-is.

May 05 '22 19:05 domenic

Discussed equals sign at HTML triage meeting. Conclusion: disallow it in attributes everywhere. (Even though the parser allows it in the first-character position.)
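
For illustration, the conclusion above could be sketched as the check below. This is a minimal sketch rather than the spec text: the function name `isValidAttributeLocalName` is hypothetical, and the set of rejected characters (NULL, ASCII whitespace, `/`, `>`, plus `=`) and the rejection of empty names are assumptions based on the parser-compatible character class quoted elsewhere in this thread.

```typescript
// Hypothetical helper, not spec text. Characters assumed to be rejected
// anywhere in an attribute local name: NULL, tab, LF, FF, CR, space,
// "/", ">", and "=" (disallowed in every position per the triage conclusion).
const FORBIDDEN_IN_ATTRIBUTE_NAME = "\u0000\t\n\f\r /=>";

function isValidAttributeLocalName(name: string): boolean {
  // Assumption: an empty name is never valid.
  if (name.length === 0) return false;
  // All forbidden characters are in the BMP, so checking UTF-16 code
  // units one at a time is sufficient.
  for (let i = 0; i < name.length; i++) {
    if (FORBIDDEN_IN_ATTRIBUTE_NAME.indexOf(name.charAt(i)) !== -1) {
      return false;
    }
  }
  return true;
}
```

Under these assumptions, `=foo` would be rejected by the DOM APIs even though the parser accepts `=` as the first character, while names containing single quotes, `<`, or U+FFFD would still pass.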

Jun 02 '22 16:06 domenic

I think this is ready for re-review.

Potential issue: XML's definition of Char seems nonsensical (it excludes various Unicode characters below U+0020). And its definition of the [^#x00#x09#x0A#x0C#x0D#x20/>] syntax depends on that definition. Hmm.

Jun 06 '22 21:06 domenic

Refined to no longer use EBNF.

Jun 07 '22 15:06 domenic

> In particular, if the first code point is from BeyondHTMLParserName the second code point was more limited.

I'm not sure exactly what you mean. Recall that it's a union of both. The second+ code point is from HTMLParserCompatibleName, which had [^#x00#x09#x0A#x0C#x0D#x20/>]* for that position. Which is exactly what the current draft says, right?

Jun 07 '22 16:06 domenic

I don't think the EBNF allows for the second code point to be U+0001 when the first is :, for instance. At least the intent was to prevent that. Does EBNF work completely differently from ABNF in that | doesn't signify OR but instead "union"?

(I didn't see "An equivalent EBNF is the following" initially and I don't think what it states is correct.)

Jun 07 '22 17:06 annevk

I see, I did not capture that this was a branching scenario depending on the behavior of the first code point. And you addressed what harms names like that might hypothetically cause in https://github.com/whatwg/dom/issues/849#issuecomment-1058064183 .

I'll revise.

Jun 07 '22 17:06 domenic

I think that is done. The other way I could write this is by looping over the characters individually, which is what a performant implementation would do (instead of using lots of O(n) "contains" operations). But I think this is relatively clear.

(Edit: well, a performant implementation would be looping over code units, since that's JS's native string format... which feels ickier to spec.)
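
To make the single-pass alternative concrete, here is a rough sketch of element local name validation that loops over UTF-16 code units once and branches on the first one, as discussed earlier in the thread. It is not the spec algorithm: the function names are hypothetical, and the exact restricted set used when the name does not start with an ASCII letter (ASCII alphanumerics, `-`, `.`, `:`, `_`, and anything at or above U+0080) is an assumption about what the non-parser branch permits.

```typescript
function isAsciiAlpha(c: number): boolean {
  return (c >= 0x41 && c <= 0x5a) || (c >= 0x61 && c <= 0x7a);
}

function isAsciiDigit(c: number): boolean {
  return c >= 0x30 && c <= 0x39;
}

// Code units excluded by the parser-compatible class
// [^#x00#x09#x0A#x0C#x0D#x20/>] quoted in this thread.
function isExcludedFromParserName(c: number): boolean {
  return c === 0x00 || c === 0x09 || c === 0x0a || c === 0x0c ||
         c === 0x0d || c === 0x20 || c === 0x2f || c === 0x3e;
}

// Assumed restricted set for names that do not start with an ASCII letter.
// Surrogate code units are >= 0x80, so lone surrogates and astral
// code points pass without decoding surrogate pairs.
function isRestrictedNameUnit(c: number): boolean {
  return isAsciiAlpha(c) || isAsciiDigit(c) ||
         c === 0x2d || c === 0x2e || c === 0x3a || c === 0x5f || c >= 0x80;
}

// Hypothetical helper: one pass over code units, no repeated "contains" scans.
function isValidElementLocalName(name: string): boolean {
  if (name.length === 0) return false; // assumption: empty names are invalid
  const first = name.charCodeAt(0);
  if (isAsciiAlpha(first)) {
    // Parser-compatible branch: anything except the excluded code units.
    for (let i = 1; i < name.length; i++) {
      if (isExcludedFromParserName(name.charCodeAt(i))) return false;
    }
    return true;
  }
  // Other branch: first code unit must be ":", "_", or >= U+0080.
  if (first !== 0x3a && first !== 0x5f && first < 0x80) return false;
  for (let i = 1; i < name.length; i++) {
    if (!isRestrictedNameUnit(name.charCodeAt(i))) return false;
  }
  return true;
}
```

With this sketch, `a` followed by U+0001 is accepted through the parser-compatible branch, while `:` followed by U+0001 is rejected, which matches the intent described above of limiting later code points when the first is not an ASCII letter.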

Jun 07 '22 17:06 domenic

OK, this (and https://github.com/whatwg/html/pull/7991) is just waiting on someone to write web platform tests. Then we can close a ~5 year old recurring pain point on the web platform!

For fun, these are all the references to this I can find:

https://github.com/whatwg/dom/pull/449
https://github.com/whatwg/dom/issues/769
https://github.com/whatwg/html/issues/4275
https://github.com/whatwg/html/issues/3733
https://www.google.com/search?q=createElement+OR+setAttribute+%252BInvalidCharacterError

I suspect there are more GitHub issues from earlier, because why would I have posted #449 if not because of some other issue someone filed? But I couldn't find them.

Jun 08 '22 15:06 domenic

@josepharhar would you be interested in finishing this?

Feb 14 '23 12:02 annevk

Yes, I have started a WPT here: https://github.com/web-platform-tests/wpt/pull/38503

Feb 15 '23 00:02 josepharhar

\o/ I suspect that once you implement this and do a try run you'll find a lot of existing WPT tests that can be adjusted. There's probably no need for a new file, but maybe.

Feb 15 '23 13:02 annevk

Any progress on this?

May 05 '23 19:05 cdumez

Not recently, I have unfortunately been focused on other stuff.

May 08 '23 17:05 josepharhar
