html
html copied to clipboard
Relax `<select>` parser
This patch makes the parser allow additional tags in <select> besides <option>, <optgroup>, and <hr>, mostly by removing the "in select" and "in select in table" parser modes.
In order to replicate the behavior where opening a <select> tag within another open <select> tag inserts a </select> close tag, a traversal through the stack of open elements was added which I borrowed from the <button> part of the parser.
This patch also changes the processing model to make <select> look through all its descendants in the DOM tree for <option> elements, rather than just children and optgroup children which conform to the content model. This is needed for compat reasons because there are websites which put other tags in between their <select> and <option>s which would break with this parser change unless we also update this processing model. More context here and here.
Fixes https://github.com/whatwg/html/issues/10310
- [x] At least two implementers are interested (and none opposed):
- Chromium
- https://github.com/mozilla/standards-positions/issues/1086
- https://github.com/WebKit/standards-positions/issues/414
- [x] Tests are written and can be reviewed and commented upon at:
- https://github.com/html5lib/html5lib-tests/pull/178
- https://wpt.fyi/results/html/semantics/forms/the-select-element/customizable-select/select-parsing.tentative.html
- [x] Implementation bugs are filed:
- Chromium: Already implemented
- Gecko: https://bugzilla.mozilla.org/show_bug.cgi?id=1933591
- WebKit: https://bugs.webkit.org/show_bug.cgi?id=283732
- [x] MDN issue is filed: I read the select, option, and optgroup element pages, and I didn't see any content that needs to be updated.
- [x] The top of this comment includes a clear commit message to use.
(See WHATWG Working Mode: Changes for more details.)
/form-elements.html ( diff ) /index.html ( diff ) /infrastructure.html ( diff ) /parsing.html ( diff )
Why does this also make a bunch of changes to the processing model of select elements? If that needs to be part of this change that should really be better motivated in the commit message.
Why does this also make a bunch of changes to the processing model of select elements? If that needs to be part of this change that should really be better motivated in the commit message.
The changes in the processing model do support the new proposed content model for select, but they also mitigate compat issues for websites which are already putting tags in between <select> and <option> in their HTML.
For example, without these changes to the processing model, the following <select> would not register as having any options at all:
<select>
<div>
<option>...</option>
...
</div>
</select>
In my compat analysis, I found a lot of websites which are doing this, so in order to ship the parser changes separately from customizable select in chrome, I plan to ship the parser changes and these processing model changes together, because otherwise there would be too much breakage.
I'm happy to put them in a separate PR if you want, or keep them here and update the commit message (sorry for not putting it in there). Which would you prefer?
Thanks! Given that rationale I think it's good to couple the changes, but that should be in the commit message as well.
This doesn't define optional tags for </option> and </optgroup> correctly.
The definition for "have a particular element in select scope" may be needed for that, but should be changed to be similar to "have a particular element in button scope" (but for select).
In particular, allow these without a parse error:
<select>
<optgroup>
<option>
<optgroup>
</select>
The second <optgroup> should pop the option and optgroup
<select>
<option><p>
<option>
</select>
This should generate implied end tags and pop the option.
<select>
<optgroup>
<hr>
<option>
<hr>
</select>
The hrs should pop the optgroup and option in a select.
See how the parser deals with <ruby><rtc><rt>, I think that can be used as a model for select.
Thanks! Given that rationale I think it's good to couple the changes, but that should be in the commit message as well.
Done.
This doesn't define optional tags for
</option>and</optgroup>correctly.
@zcorpan thanks for the feedback! I did some experimenting and added some logic to the start tags for option, optgroup, and hr. What do you think?
- Need to check that a
selectelement is in scope so that parsing of option/optgroup tags outside ofselectdoesn't change. Example<option><p>1</option>2<option>3 - Need to add
selectto the "in scope" list so that<div><select></div><option>doesn't close theselect. - Should report parse errors when unexpected elements are popped. Compare with what the spec does for
rt/rtcinruby. Example<select><option><span>1</option>2<option>3
- Need to check that a
selectelement is in scope so that parsing of option/optgroup tags outside ofselectdoesn't change. Example<option><p>1</option>2<option>3
I thought that this is covered by "While the stack of open elements has an option element in select scope". What exactly should I change?
- Need to add
selectto the "in scope" list so that<div><select></div><option>doesn't close theselect.
Done
- Should report parse errors when unexpected elements are popped. Compare with what the spec does for
rt/rtcinruby. Example<select><option><span>1</option>2<option>3
I'm guessing this is from "If the current node is not now a rtc element or a ruby element, this is a parse error," right?
Should I add "If the current node is not now an option element, this is a parse error" after "While the stack of open elements has an option element in select scope, pop an element from the stack of open elements"?
The "in select scope" I think should be removed altogether since it assumes the stack will not have other elements when in a select, which is no longer the case. Use the normal "in scope" instead.
High-level of what I think should happen: when parsing option or optgroup start tag: check that select is in scope, check that option or optgroup is in scope, generate implied end tags (except for optgroup when handling <option>), check the current node or the stack of open elements again, report a parse error if appropriate, then pop elements off the stack until the option or optgroup has been popped, then insert the new element.
I can look into this more next week and suggest more specific changes.
High-level of what I think should happen: when parsing option or optgroup start tag: check that select is in scope, check that option or optgroup is in scope, generate implied end tags (except for optgroup when handling
<option>), check the current node or the stack of open elements again, report a parse error if appropriate, then pop elements off the stack until theoptionoroptgrouphas been popped, then insert the new element.
Thanks! I gave this a try
@zcorpan how does the latest text look?
This doesn't define optional tags for
</option>and</optgroup>correctly.
I was talking to @mfreed7 about the changes we've made in the PR so far, and I feel like I couldn't provide a good explanation for why we are making the parser not support cases like these:
<hr>inside an<option>- nested
<optgroup>s
Is it just compat reasons? Is there a good justification?
It would be a breaking change from what is conforming HTML today, and break compat for sites that omit </option> and </optgroup> tags. It's the same as why we can't allow certain elements in p.
It would be a breaking change from what is conforming HTML today
But that's true of the rest of this PR also, correct? E.g. making <input> no longer close the <select>. We are just thinking that the "new" behavior we're headed towards with appearance:base is that roughly "all" content is now valid within <select>. It feels very odd to instead say "all content is valid, except for nested <optgroup> and <hr> within an <option>". Neither of those sound like they come from anything other than a place of web compat grandfathering. So I guess our hope would be that we can try to remove those two restrictions along with the rest, and see what actually falls out as a real web compat issue? Unless there's some deeper semantic problem with nesting optgroups? Or including horizontal lines within an option?
<input> is not allowed in select today, so it's not a change in behavior for conforming HTML. I think there's an implied contract of not breaking content that follows HTML's authoring requirements.
hr in select is pretty new, so we could probably make it not imply </option> without affecting web compat significantly. A tradeoff is for authors who want to omit optional tags can't omit </option> only when the next child is hr. Is that more annoying than not being able to use hr in option? I suspect no.
I expect we can't allow nesting of optgroup, but I haven't researched it. If that is allowed, it would be reasonable to expect that the native dropdown can support nested optgroups. I don't know if that is possible on the relevant platforms.
It would be a breaking change from what is conforming HTML today
To clarify, I meant: It would be a breaking change for existing conforming HTML content.
fwiw, i can't think of a good reason for an author to even need an hr inside of an option.
if someone wants to create a visual separation between the text within an option - then that sounds like something that could/should be accomplished with CSS. The semantics/a11y mappings of a separator dont' really make sense in the context of an option
I expect we can't allow nesting of
optgroup, but I haven't researched it. If that is allowed, it would be reasonable to expect that the native dropdown can support nestedoptgroups. I don't know if that is possible on the relevant platforms.
Yeah that's fair, UIs won't support nested optgroups, at least not yet. There is a separate issue to track this here: https://github.com/whatwg/html/issues/5789
I suppose that if we want to support nested optgroups then we could revisit this, and disallow nested optgroups in the parser for now.
<input>is not allowed inselecttoday, so it's not a change in behavior for conforming HTML. I think there's an implied contract of not breaking content that follows HTML's authoring requirements.
hrinselectis pretty new, so we could probably make it not imply</option>without affecting web compat significantly. A tradeoff is for authors who want to omit optional tags can't omit</option>only when the next child ishr. Is that more annoying than not being able to usehrinoption? I suspect no.
fwiw, i can't think of a good reason for an author to even need an
hrinside of anoption.if someone wants to create a visual separation between the text within an option - then that sounds like something that could/should be accomplished with CSS. The semantics/a11y mappings of a separator dont' really make sense in the context of an option
We can forbid hr in select in the content model and still allow it to be parsed, just like we are doing with many other tags in this change. I suspect that hr in option would be a significantly less breaking change than other things.
I see the conforming HTML argument though and I don't feel that strongly about this. Mason's point is that we should only create special cases like this if we have a really good reason. If not changing conforming HTML is that important, then I think its probably fine since hr in select is not an important use case.
hr in select was added just last year, so we shouldn't flip-flop on that. But it could be allowed only as a sibling to an option element in the content model, if we're ok with hr no longer implicitly closing option elements.
Since the content model in https://github.com/whatwg/html/pull/10586 only allows div and span, maybe there's less need to "aggressively" close elements when seeing option or optgroup. Is only div and span the end goal, or do we eventually want to allow other HTML elements?
If, for web compat, it's ok to only close option if it's the current node when seeing "option" or "optgroup" (maybe also "hr"), that would make the parser change much simpler. It would also allow for a possible future of allowing nested optgroups (would need a div in between).
hr in select was added just last year, so we shouldn't flip-flop on that. But it could be allowed only as a sibling to an option element in the content model, if we're ok with hr no longer implicitly closing option elements.
The content model PR currently doesn't list hr elements in the content model for options
Is only div and span the end goal, or do we eventually want to allow other HTML elements?
Eventually allowing other HTML elements is ideal as we add support for additional use cases and find ways to make them accessible.
If, for web compat, it's ok to only close option if it's the current node when seeing "option" or "optgroup" (maybe also "hr"), that would make the parser change much simpler. It would also allow for a possible future of allowing nested optgroups (would need a div in between).
As Mason said, ideally we try removing restrictions and see what real issues arise. Would I implement this by removing the "generate implied end tags" line?
If we envision allowing more HTML elements later, then we should at least generate implied end tags, so that HTML authoring rules are somewhat consistent. For example, it should be possible to omit </p> in select.
If we envision allowing more HTML elements later, then we should at least generate implied end tags, so that HTML authoring rules are somewhat consistent. For example, it should be possible to omit
</p>inselect.
Ah I see. So if I leave that in and then remove the "While the stack of open elements has an option element in scope, pop an element" part and the equivalent one for optgroup then we should be good?
No, still need to pop the option or optgroup if that is the current node. Actually, using the ruby model is maybe best, however the parse error condition needs to be different (since we want to allow <select><div><option><option>, the parent is not a select here).
Something like this:
A start tag whose tag name is "optgroup"
If the stack of open elements has a
selectelement in scope, then generate implied end tags. If the stack of open elements now has anoptionoroptgroupelement in scope, then this is a parse error.Insert an HTML element for the token.
A start tag whose tag name is "option"
If the stack of open elements has a
selectelement in scope, then generate implied end tags, except foroptgroupelements. If the stack of open elements now has anoptionelement in scope, then this is a parse error.Insert an HTML element for the token.
This also removes "Reconstruct the active formatting elements , if any.", so that that happens inside the option element instead when seeing a token that reconstructs active formatting elements.
Thanks! I implemented your suggestion
Generating all implied end tags with no exceptions in the optgroup case disallows nested optgroups, but I guess I'm ok with that since we can make additional changes to allow it later.
I'll apply the changes in here so far to the chromium implementation and start working on html5lib tests so we can get this merged.
Great.
hr no longer closing option and optgroup is a behavior change from what the parser currently does, which I'm somewhat uncomfortable with. It makes currently valid HTML invalid. @hsivonen @annevk WDYT?
I'm not a big fan of breaking conforming HTML. Especially as there doesn't appear to be a compelling reason from reading the thread.
I recommend not using "grandfathering" by the way, it has a rather bad history: https://en.wikipedia.org/wiki/Grandfather_clause
Ok, I pushed a change to disallow hr in option. How does it look now?
@zcorpan @hsivonen are there any other changes you think should be made?
I just re-added some cases to option and optgroup parsing to continue supporting auto-closing tags outside of a select tag to support cases like this, which I found in a test case here:
const select = document.createElement('select');
select.innerHTML = '<option>one<option>two';