
HTML API: Lower-case HTML tag names in `get_qualified_tag_name()`.

Open. Opened by dmsnell.

Trac ticket: Core-61576.

Since this method is meant for printing and display, a more expected return value would be the lower-case variant of a given HTML tag name.

This patch changes the behavior accordingly. No tests are impacted by this change.

Diff best viewed ignoring whitespace changes.

Follow-up to [58867].
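Here is a minimal sketch of the proposed behavior, written as a stand-alone PHP function so it runs outside WordPress. The name sketch_qualified_tag_name() is hypothetical. Its two parameters stand in for the values the processor's get_namespace() and get_tag() would report. The SVG table is only a small subset of the specification's tag-name adjustment table.

```php
<?php
/**
 * Hypothetical sketch of the proposed get_qualified_tag_name() behavior.
 * $namespace is 'html', 'math', or 'svg'; $tag is the upper-case name
 * that get_tag() reports.
 */
function sketch_qualified_tag_name( string $namespace, string $tag ): string {
	$lower = strtolower( $tag );

	// HTML and MathML tag names print in lower case.
	if ( 'svg' !== $namespace ) {
		return $lower;
	}

	// Small, incomplete subset of the spec's SVG tag-name adjustments.
	static $svg_adjustments = array(
		'altglyph'       => 'altGlyph',
		'altglyphdef'    => 'altGlyphDef',
		'clippath'       => 'clipPath',
		'foreignobject'  => 'foreignObject',
		'lineargradient' => 'linearGradient',
	);

	return $svg_adjustments[ $lower ] ?? $lower;
}

echo sketch_qualified_tag_name( 'html', 'DIV' ), "\n";           // div
echo sketch_qualified_tag_name( 'math', 'MI' ), "\n";            // mi
echo sketch_qualified_tag_name( 'svg', 'FOREIGNOBJECT' ), "\n";  // foreignObject
echo sketch_qualified_tag_name( 'svg', 'PATH' ), "\n";           // path
```

Under this sketch, the only difference from plain lower-casing is the SVG branch. That branch restores the camel-cased element names listed in the spec.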

dmsnell, Sep 11 '24 17:09

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props dmsnell, jonsurrell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

github-actions[bot], Sep 11 '24 17:09

Test using WordPress Playground

The changes in this pull request can be previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance, it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

github-actions[bot], Sep 11 '24 17:09

@sirreal I don't feel strongly about this, but I do think that if we want to consider the change, it'd be best to do so before 6.7 is released, since afterward it would be a backwards-compatibility break. I've expanded the docblock with a comparison to get_tag().

dmsnell, Sep 12 '24 01:09

I've thought about this some more and I don't think we should make this change.

Since this method is meant for printing and display, a more expected return value would be the lower-case variant of a given HTML tag name.

I'm not convinced this is the purpose of the method, although it depends what is meant by "printing and display."

Primarily, the HTML API is for working with HTML input and HTML output. In HTML, the casing of tag names is irrelevant. There's no reason for SVG tags to be "correctly" cased (<altGlyph> instead of <altglyph>), just as there's no reason to use upper, lower, or any particular casing for HTML tag names. This method handles the correct casing for SVG element tag names, but that's not important for serializing and printing SVG tags in an HTML document.

It doesn't seem more correct to me to lower-case HTML tag names and MathML tag names, and then use some mixed casing on SVG tag names when the element name differs.

Here's my take on this method.

This method applies the rules from the specification on parsing foreign content:

If the adjusted current node is an element in the SVG namespace, and the token's tag name is one of the ones in the first column of the following table, change the tag name to the name given in the corresponding cell in the second column. (This fixes the case of SVG elements that are not all lowercase.)

Tag name      Element name
altglyph      altGlyph
altglyphdef   altGlyphDef

This seems to adjust for a difference between an HTML tag name (case-insensitive) and an SVG element name. The adjustment matters during tree construction, where HTML tokens are transformed into elements in a tree.

At the moment, this roughly corresponds to Node.nodeName for elements (and Element.tagName). If we make this change, then it becomes an arbitrary decision.

(An interesting note, qualified name is a DOM concept and does seem to correspond to this method as implemented.)

And to state the obvious, it should be trivial for consuming code to lowercase tag names if desired.


if we want to consider the change it'd be best to do before 6.7 is released, as after that it would be a backwards-compatibility break

Definitely.

sirreal, Sep 12 '24 09:09

It doesn't seem more correct to me to lowercase HTML tag names

It's less about being correct and more about expectations. I think if you survey a bunch of people and ask them how they feel about turning <p><a><span> into <P><A><SPAN>, they will have opinions about that. We can also survey HTML in the wild and treat the prevalence of each casing style as a sample of global expectations.

Here is a survey from my list of ~300k HTML pages.

Type of tag   Count        Percent
ALL UPPER     307,499      2.4%
all lower     12,388,675   96%
Mixed Case    221,307      1.7%

My point in sharing these numbers isn't that they dictate what we do; it's just that the overwhelming majority of HTML out there uses lower-case tag names, and people have grown accustomed to them.

In the case of normalization, this is the default behavior, which is why I care about it.

But also, going back in time, the reason I remember for introducing these functions was just to make the html5lib tests pass, since they check against the adjusted foreign-content tag names and attribute names. I don't feel these functions play a central role in spec compliance.

Your point is sound: it's trivial for calling code to lower-case-fold the tag names. Except, then they also have to remember to only do that for elements in the HTML namespace and not to do so for foreign elements. That leaves calling code calling this function and then immediately asking if it's an HTML element and then lower-casing.

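```php
// Lower-case only HTML-namespace tag names; SVG and MathML names keep
// the casing that get_qualified_tag_name() already applied.
$tag_name = $this->get_qualified_tag_name();
if ( 'html' === $this->get_namespace() ) {
	$tag_name = strtolower( $tag_name );
}
```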

Maybe I had a gut reaction because get_qualified_tag_name() has just made the exact same namespace check before returning, and it felt like spreading the same semantics across the inside and the outside of that method.

I was going to close this, but I think I'll leave it open at least a little longer to continue pondering.

dmsnell, Sep 12 '24 20:09

For general interest: here is the list of all-caps and mixed-case tag names from my survey. The list includes tag closers, and I didn't attempt to check if the closer casing matched the opening casing. Obvious HTML errors are evident, especially in the list of once-seen tags.

The list
ALL UPPER TAGS
    65,179: A
    30,621: BR
    28,545: TD
    23,392: FONT
    18,875: P
    15,786: IMG
    15,154: B
    13,723: TR
    13,142: LI
    10,102: OPTION
     6,578: META
     5,069: TABLE
     4,701: HEAD
     4,699: HTML
     4,658: TITLE
     4,576: BODY
     4,137: I
     4,008: ID
     3,945: EM
     3,476: H1
     3,167: CENTER
     2,532: HR
     2,351: SPAN
     1,654: INPUT
     1,529: STRONG
     1,515: UL
     1,098: AREA
       990: SCRIPT
       853: H2
       844: O:P
       792: H3
       785: WBR
       777: DIV
       740: LINK
       738: TH
       661: MENU
       620: BLOCKQUOTE
       610: TBODY
       418: U
       400: FORM
       379: H4
       318: SMALL
       270: STYLE
       266: FRAME
       200: PRE
       186: ADDRESS
       174: BIG
       168: MAP
       166: PARAM
       151: SELECT
       142: FRAMESET
       124: TT
        94: SPACER
        92: ABBR
        88: CITE
        81: SUP
        78: NOBR
        72: NOSCRIPT
        63: H6
        56: BASE
        52: H5
        51: COL
        49: IFRAME
        36: COLGROUP
        35: DD
        34: NOFRAMES
        33: DT
        30: OBJECT
        28: EMBED
        27: SUMMARY
        25: MARQUEE
        25: SUB
        22: HEADER
        22: LAYER
        20: EC
        19: BASEFONT
        19: OL
        15: FIGURE
        14: NAV
        13: DFN
        12: V:F
        12: ARTICLE
        12: ILAYER
        11: SECTION
        11: X-CLARIS-WINDOW
        11: X-CLARIS-TAGVIEW
        11: CAPTION
        11: BUTTON
         9: THEAD
         9: FIGCAPTION
         8: O
         8: ZBLINK
         8: LABEL
         7: CODE
         7: DIR
         7: LH
         7: AUDIO
         6: BLINK
         6: STYLE='MSO-BIDI-FONT-WEIGHT:
         5: X-SAS-WINDOW
         5: BGSOUND
         4: FOOTER
         4: TEXTAREA
         4: ALIGN=LEFT
         4: X-CLARIS-REMOTESAVE
         4: NOINDEX
         3: APPLET
         3: LEFT
         3: BOLD
         3: KBD
         3: SP
         2: Y
         2: E=
         2: C
         2: N͓
         2: MARK
         2: STRIKE
         2: WSJ
         2: NOLAYER
         2: ALIGN="LEFT"
         2: INSERT_COUNT*
         2: BD
         2: NOFRAME
         2: INS
         2: LNAME
         2: FNAME
         2: URL
         2: ACRONYM
Once-seen tags: A,, AAREA, ALIGN="RIGHT", ALIGN=CENTER, ASIDE, B<P, BGCOLOR="#000000", BOTTOM, BR<B, BRïï, CENTRE, CFINCLUDE, CLEAR, COLDEF, COLDEFS, CONNECTED,PREFERRED, CSACTION, CSACTIONDICT, CSACTIONITEM, CSACTIONS, CSSCRIPTDICT, CUFON, CUFONCANVAS, CUFONTEXT, DL, DOC, DOCTYPE, EF_B_RED, FIELDSET, GCSE:SEARCH, GCSE:SEARCHBOX-ONLY, H, HEADS_TAG, HRNOSHADE, HTM, HTML!, IF_ERRORPARAM, IF_ERRORSTR, IF_ERRORTYPE, INSERTFLASHHEAD, INSTITUTE, JSON, LEGEND, LI,<A, MAJOR, METANAME="DESCRIPTION", METANAME="KEYWORDS", NAME, NOAUTOLINK, NOF, O:LOCK, OCCUPATION, ONTOLOGY, ROWS, SAMP, SPOUSE, T, TAIL, TILTE, TIME, U7:P, UNION_TAG_INDEX_FOOTER, UNION_TAG_INDEX_HEADER_1, UNION_TAG_INDEX_HEADER_2, UNION_TAG_INDEX_TITLE, UP-21, V:FORMULAS, V:PATH, V:SHAPETYPE, V:STROKE

Mixed Tags
    32,047: Key
    32,041: Contents
    32,041: LastModified
    32,041: ETag
    32,041: Size
    28,041: StorageClass
     4,007: Owner
     4,004: DisplayName
     4,000: Generation
     4,000: MetaGeneration
     1,578: feColorMatrix
     1,466: feComposite
     1,390: feComponentTransfer
     1,390: feFuncA
     1,387: feFuncR
     1,387: feFuncG
     1,387: feFuncB
       807: linearGradient
       680: clipPath
       459: Error
       459: Code
       459: Message
       380: RequestId
       379: Option
       373: HostId
       155: feGaussianBlur
       125: feBlend
       122: feOffset
       111: bR
        91: Br
        80: tD
        79: feFlood
        75: Td
        74: feMergeNode
        71: Font
        64: o:SmartTagType
        49: textPath
        48: st1:State
        45: Meta
        43: radialGradient
        41: tR
        38: animateTransform
        38: ListBucketResult
        38: Name
        38: Prefix
        38: Marker
        38: IsTruncated
        37: feMerge
        37: st1:City
        34: MaxKeys
        33: Tr
        31: Center
        31: Table
        30: rdf:RDF
        30: asp:ListItem
        29: Img
        28: feMorphology
        26: Strong
        26: foaf:givenName
        26: foaf:familyName
        25: QueryParameterName
        25: QueryParameterValue
        25: Reason
        24: st1:PlaceName
        23: AccountName
        23: cc:Work
        21: Script
        21: Title
        20: RecommendDoc
        19: st1:PlaceType
        19: class="Text"
        16: Li
        13: noScript
        13: psi:contextVar
        13: contBox-x
        12: Input
        11: u51:SmartTagType
        10: Details
        10: Body
        10: ItemTemplate
        10: u46:SmartTagType
        10: u48:SmartTagType
         9: Dd
         9: foreignObject
         9: Th
         9: psi:queryVar
         9: u26:SmartTagType
         9: u40:SmartTagType
         9: u45:SmartTagType
         9: u52:SmartTagType
         9: u53:SmartTagType
         9: st1:Street
         8: u28:SmartTagType
         8: u29:SmartTagType
         8: u31:SmartTagType
         8: u42:SmartTagType
         8: u47:SmartTagType
         8: u49:SmartTagType
         8: u54:SmartTagType
         8: u55:SmartTagType
         8: Button
         8: asp:RequiredFieldValidator
         8: asp:TextBox
         7: color="#CC0000"
         7: MainOrArchivePage
         7: rdf:Description
         7: u23:SmartTagType
         7: u24:SmartTagType
         7: u27:SmartTagType
         7: u33:SmartTagType
         7: u36:SmartTagType
         7: u37:SmartTagType
         7: u43:SmartTagType
         7: Head
         7: color=#fFee00
         6: Event-Card-Open-Close-Toggle
         6: HFBusiness
         6: u1:SmartTagType
         6: u4:SmartTagType
         6: u25:SmartTagType
         6: u30:SmartTagType
         6: u34:SmartTagType
         6: u35:SmartTagType
         6: u38:SmartTagType
         6: u39:SmartTagType
         6: u41:SmartTagType
         6: u44:SmartTagType
         6: u50:SmartTagType
         6: u57:SmartTagType
         6: u58:SmartTagType
         6: u60:SmartTagType
         6: u62:SmartTagType
         6: u63:SmartTagType
         6: u64:SmartTagType
         6: u65:SmartTagType
         6: u66:SmartTagType
         6: u67:SmartTagType
         6: u68:SmartTagType
         6: u69:SmartTagType
         6: iconSm-x
         6: Select
         6: asp:Panel
         5: psi:sessionVar
         5: u32:SmartTagType
         5: u61:SmartTagType
         5: st1:PostalCode
         5: liNK
         4: InLineReplace
         4: Style
         4: psi:sortOp
         4: u7:SmartTagType
         4: u8:SmartTagType
         4: u9:SmartTagType
         4: u10:SmartTagType
         4: u11:SmartTagType
         4: u12:SmartTagType
         4: u17:SmartTagType
         4: u21:SmartTagType
         4: u22:SmartTagType
         4: u56:SmartTagType
         4: u59:SmartTagType
         4: NextMarker
         4: httpStatusCode
         4: Form
         3: Resource
         3: ListAllMyBucketsResult
         3: Buckets
         3: Ul
         3: u5:SmartTagType
         3: u6:SmartTagType
         3: u13:SmartTagType
         3: u14:SmartTagType
         3: u15:SmartTagType
         3: u16:SmartTagType
         3: u73:SmartTagType
         3: u72:SmartTagType
         3: u74:SmartTagType
         3: toggleSection
         3: Initial-scale=1.0"
         3: Ozelliklerimiz<
         3: BucketName
         3: asp:Label
         3: invalidTag
         3: String
         2: hR
         2: xmpMM:DerivedFrom
         2: (31 further entries omitted: unprintable strings of binary data misparsed as tag names)
         2: bgcolor="#FFFFFF"
         2: Valign="middle"
         2: tdOCTYPE
         2: u18:SmartTagType
         2: u19:SmartTagType
         2: u20:SmartTagType
         2: u70:SmartTagType
         2: u71:SmartTagType
         2: AHREF="http:
         2: Address
         2: socketType
         2: i:pgfRef
         2: ServerTime
         2: TraceId
         2: Transition
         2: font-face="Times
         2: Werewolf
         2: Civilian
         2: feTurbulence
         2: feDisplacementMap
         2: Area
         2: Html
         2: headHTML
         2: asp:HyperLink
         2: asp:DropDownList
         2: asp:CheckBox
Once-seen tags: AAddress, AHREF="Collections.html", AHREF="mailto:[email protected]", Basilica, BlockQuote, Bstyle="color:black;background-color:#ffff66", Bucket, CENTEeR, Content-Type:, Endpoint, Footer, GallerySection, Gblockquote, HREF="http:, Hostip, HostipLookupResultSet, Iframe, In, InstragramWidget, Left, Link, Mmeta, Numbertemplate, P,Here, Page, Rosa, Rotten, SCRILongDateAGE="JavaS0riSunday,, SetModTime, Span, Special, TABle, UdeM_menu, UnknownOperationException, Xlink, alt="El, aname="OLE_LINK2", aria-label="Instagram", aria-label="YouTube", asp:Button, asp:CustomValidator, asp:LinkButton, asp:Localize, asp:RegularExpressionValidator, asp:ValidationSummary, bDopo, clientConfig, cmLogo, com:Text, contPad, countryAbbrev, countryName, customHeaders, displayName, displayShortName, dnn:DnnCssInclude, emailProvider, fOnt, face="Verdana", font="C60000, gml:Null, gml:Point, gml:boundedBy, gml:featureMember, gml:pointProperty, google-site-verification=OgY2mih_AxAZzi7f8b33QTOGHoScolbNOE6aTlqold0, httpProtocol, incomingServer, ipLocation, link="#0000FF", mes:Error, mes:ErrorMessage, meta name="description" content="Evo, meta name="keywords" content="Bike, meta property="og:description" content="Rent, navBarCenter, ns0:City, ns0:PlaceName, ns0:PlaceType, ns0:State, onMouseover="over_effect(event,'outset')", outgoingServer, pYes,, psi:ZOTERO_COinS, psi:iktList, psi:sortOptions, quicklinkComp, skippedTag, title="Fairfield, titleB, toggleCont, togglePad (plus a few unprintable binary strings, omitted)
Found 12,917,481 tags
  of those:
    ALL UPPER: 307,499 (2.38%)
    all lower: 12,388,675 (95.906%)
   Mixed Case: 221,307 (1.713%)
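The percentages follow directly from the three counts. The sketch below recomputes them. It also shows one possible bucketing rule. classify_tag_case() is hypothetical and is not the code that produced the survey.

```php
<?php
/**
 * Hypothetical bucketing rule (not the survey's own code): a name is
 * "all lower" if lower-casing leaves it unchanged, "ALL UPPER" if
 * upper-casing leaves it unchanged, and "Mixed Case" otherwise.
 * Names without letters count as lower.
 */
function classify_tag_case( string $name ): string {
	if ( strtolower( $name ) === $name ) {
		return 'all lower';
	}
	return strtoupper( $name ) === $name ? 'ALL UPPER' : 'Mixed Case';
}

// Counts reported above; each percentage is a count over their sum.
$counts = array(
	'ALL UPPER'  => 307499,
	'all lower'  => 12388675,
	'Mixed Case' => 221307,
);
$total = array_sum( $counts ); // 12,917,481

foreach ( $counts as $label => $count ) {
	printf( "%-10s %12s  %.2f%%\n", $label, number_format( $count ), 100 * $count / $total );
}
```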

dmsnell, Sep 12 '24 20:09

@sirreal want to examine this again and consider it?

I still don’t have strong feelings about it, but the more I do with constructive uses of the HTML API, the more I like having functions like this available for serialize_token() and for tasks like closing the stack of open elements.

It’s mostly nice for the SVG elements that require mixed case, but it’s also convenient not to have to push strtolower() out to every call site, especially when that gets mixed with a stack of conditionals checking a bunch of properties of the element to determine whether it’s special and mixed-case.
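The "closing the stack of open elements" case can be illustrated with a hypothetical sketch that prints closing tags for a list of open elements, innermost first. qualified_name() and close_open_elements() are illustrative names, not HTML API methods. qualified_name() mimics the proposed get_qualified_tag_name() with an incomplete SVG table. The stack format (a namespace plus an upper-case tag name per entry) is also an assumption.

```php
<?php
// Hypothetical stand-in for the proposed get_qualified_tag_name().
function qualified_name( string $namespace, string $tag ): string {
	$lower = strtolower( $tag );
	$svg   = array( 'foreignobject' => 'foreignObject', 'clippath' => 'clipPath' );
	return 'svg' === $namespace ? ( $svg[ $lower ] ?? $lower ) : $lower;
}

// Each stack entry is [ namespace, upper-case tag name ], outermost first.
function close_open_elements( array $stack_of_open_elements ): string {
	$html = '';
	// Walk from the most recently opened element outward.
	foreach ( array_reverse( $stack_of_open_elements ) as [ $namespace, $tag ] ) {
		$html .= '</' . qualified_name( $namespace, $tag ) . '>';
	}
	return $html;
}

echo close_open_elements(
	array(
		array( 'html', 'DIV' ),
		array( 'svg', 'SVG' ),
		array( 'svg', 'FOREIGNOBJECT' ),
		array( 'html', 'P' ),
	)
), "\n";
// </p></foreignObject></svg></div>
```

With the casing handled inside one helper, the loop needs no per-namespace conditionals.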

dmsnell, Sep 02 '25 18:09

I've been going back and forth. I want to make a coherent decision here.

As implemented, this would print lower-case tag names for HTML and MathML elements, but for SVG tags it will print lower or camel case (e.g. path and foreignObject), as described in the specification.

In trunk, the method roughly corresponds to Node.nodeName for elements (and Element.tagName).

With this change, it corresponds to Element.localName. I noticed that Chrome DevTools uses this scheme for printing tag names in the Elements panel, and discovered that it uses localName. I do like that this corresponds to an existing concept, localName, rather than an arbitrary mixed-casing decision particular to the HTML API.

I'm not opposed to this, however I'm not sure what benefit there is to printing a few SVG tags in camel case. They're not treated any differently, and I suspect the vast majority of web developers would lower-case the SVG tags as well.

My big question is, why not always use strtolower( $processor->get_tag() ) for printing? It's probably closer to what developers expect.

sirreal, Sep 11 '25 10:09

what benefit there is to printing a few SVG tags in camel case … why not always use strtolower( $processor->get_tag() ) for printing?

This probably goes back to XML being case-sensitive whereas HTML is not, but here we have something that’s mostly like XML embedded within HTML. Also, the tag names in the XHTML and MathML namespaces are all lower-case, so SVG is unique in this respect.

It makes me wonder if there’s relevance here with safe SVG handling, or export to XML. None of this is particularly decisive, but I tried an experiment with SVG tag names and also with attribute names, thinking along the same lines. Below are the source and a render from Safari of proper and lower casing. We can see that when embedded it has no bearing, but when provided as an external document or when enclosed as a data URI it does impact the render.

I count 39 mixed-case tag names and 53 mixed-case attribute names. I wonder if it would be worth incorporating these. If we really don’t like them here we can put it all inside of serialize_token(), but then again I kind of like that function remaining focused on what it does.

@sirreal ~what if we thought of a new method called something like get_local_name() and get_local_attribute_names_with_prefix() which would mirror the existing two functions, but which wouldn’t auto-complete as the default thing when looking for tag or name? I would love it if we can retain the aspect of the HTML API which is that it pushes for upper-cased tag names in source code (which makes it somewhat easier to identify and search for code working with them).~ On further review, this is pretty much exactly what get_qualified_tag_name() and get_qualified_attribute_name() are for; we just aren’t returning the lower-cased variants for HTML and MathML that we might expect.

[Screenshot (2025-09-11): Safari render comparing proper and lower-cased SVG tag names]

[Screenshot (2025-09-11): Safari render comparing proper and lower-cased SVG attribute names]
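A small sketch of the XML side of this argument follows. It assumes PHP's bundled DOM extension (ext-dom) is available. Once SVG is parsed as XML, for example as a standalone document, element names are case-sensitive. A lower-cased foreignobject therefore no longer matches the foreignObject element.

```php
<?php
// Assumes ext-dom. Parsed as XML, element names are case-sensitive.
$svg_ns = 'http://www.w3.org/2000/svg';

$doc = new DOMDocument();
$doc->loadXML( '<svg xmlns="' . $svg_ns . '"><foreignObject/></svg>' );

$proper  = $doc->getElementsByTagNameNS( $svg_ns, 'foreignObject' )->length;
$lowered = $doc->getElementsByTagNameNS( $svg_ns, 'foreignobject' )->length;

printf( "foreignObject: %d, foreignobject: %d\n", $proper, $lowered );
// foreignObject: 1, foreignobject: 0
```

An HTML parser would adjust <foreignobject> inside <svg> to foreignObject. An XML consumer does no such fix-up, which is consistent with the render differences seen for external and data-URI SVG.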

dmsnell, Sep 11 '25 17:09

I now think that it could be appropriate here to only lower-case normative HTML elements. Custom elements likely should have their casing preserved. Though I don’t know what to do about unknown HTML elements that are not custom elements, like <nonexistent>. Technically that’s neither a custom element nor an HTML element. It looks like HTML, so maybe lower-case it? (leaving room for expansion of the HTML tag set in the future).
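The sketch below separates a name like my-widget from one like nonexistent. It is a simplified, hypothetical check based on the HTML spec's "valid custom element name" rule. Both kinds of name are still HTML elements, and the HTML tokenizer lower-cases all tag names as it parses them. looks_like_custom_element_name() is an illustrative name. It also skips the spec's full character-range (PCENChar) restriction.

```php
<?php
/**
 * Simplified, hypothetical check: starts with an ASCII lower-case letter,
 * contains a hyphen, has no ASCII upper-case letters, and is not one of
 * the reserved hyphenated names. PCENChar ranges are not checked.
 */
function looks_like_custom_element_name( string $name ): bool {
	static $reserved = array(
		'annotation-xml', 'color-profile', 'font-face', 'font-face-src',
		'font-face-uri', 'font-face-format', 'font-face-name', 'missing-glyph',
	);

	return 1 === preg_match( '/^[a-z][^A-Z]*$/', $name )
		&& false !== strpos( $name, '-' )
		&& ! in_array( $name, $reserved, true );
}

// get_tag() reports upper-case names, so lower-case before checking.
foreach ( array( 'MY-WIDGET', 'NONEXISTENT', 'SELECTEDCONTENT', 'FONT-FACE' ) as $tag ) {
	$name = strtolower( $tag );
	printf( "%s: %s\n", $name, looks_like_custom_element_name( $name ) ? 'custom' : 'not custom' );
}
```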

dmsnell, Sep 11 '25 17:09

Though I don’t know what to do about unknown HTML elements that are not custom elements, like <nonexistent>. Technically that’s neither a custom element nor an HTML element. It looks like HTML, so maybe lower-case it?

Wouldn't it be an HTML element? (Aren't custom elements also technically HTML elements?)

It would be handled by the "any other start tag" and "any other end tag" rules; the start-tag rule is to:

Insert an HTML element for the token.

If we inspect in the browser, the element's namespaceURI is http://www.w3.org/1999/xhtml.

Unless, of course, it's being parsed in foreign content, in which case the unknown element seems to inherit the namespace.

I think all the namespacing is handled correctly by the HTML Processor so I don't think unknown or custom elements should require special handling.

sirreal, Sep 23 '25 16:09

Wouldn't it be an HTML element?

Yes, of course, but I wasn’t talking about namespaces. I was contrasting the fact that we have distinct custom elements, which behave differently than the set of tags defined as HTML elements. We could say that something like <nonexistent> is a potential future HTML element, in the same way that <selectedcontent> was not defined but now is.

So I’m not trying to delineate in which namespace these belong but whether we should be applying the rules for custom elements to them since they are indeed custom.

In writing this it seems strange to do anything other than lowercase them. I don’t know why I thought we should handle them differently.


I’ll get back to this soon and fully review everything that was previously uncertain.

dmsnell, Sep 23 '25 18:09