simple-html-tokenizer icon indicating copy to clipboard operation
simple-html-tokenizer copied to clipboard

Throw syntax errors for invalid EndTags

Open camerondubas opened this issue 5 years ago • 6 comments

This PR is in response to this comment PR https://github.com/glimmerjs/glimmer-vm/pull/982#discussion_r340068648.

Currently Simple HTML Tokenizer allows leading whitespace as well as attributes to be defined in End Tags. The comment linked above suggests that we throw syntax errors in these cases as they are ultimately invalid End Tags

The HTML Spec seems to be a bit inconsistent when it comes to attributes in End Tags, as the 12.1.2.2 End tags section says;

... 4. After the tag name, there may be one or more ASCII whitespace. 5. Finally, end tags must be closed by a U+003E GREATER-THAN SIGN character (>).

However 12.2.5.7 End tag open state says to enter the 12.2.5.8 Tag name state when an the first ASCII Alpha character is encountered. The "tag name state" is not specific to End Tags, and allows for entering the "before attribute name state" and the "self-closing start tag state", both of which aren't really valid in End Tags.


This PR assumes that we want to prevent entering these invalid states and adds syntax errors in the following scenarios:

  • Leading whitespace before the tagname in an EndTag. i.e </ div>
  • Attributes after EndTag tagname. i.e </div foo="bar">
  • Self closing EndTags. i.e </div/>

camerondubas avatar Nov 02 '19 21:11 camerondubas

up

lifeart avatar Dec 21 '19 10:12 lifeart

Ok, after some fighting with git/eol characters on Windows I've got a updated working version.

This will now throw the error 'closing tag must only contain tagname' whenever a closing tag contains trailing or leading whitespace, which also covers attributes in closing tags since they would need to be prepended with a whitespace character.

camerondubas avatar Jan 14 '20 03:01 camerondubas

Bumping this. Hoping to get a review.

I'm happy to close this PR if it isn't relevant anymore. Thanks!

camerondubas avatar Jun 04 '20 22:06 camerondubas

Eeck, sorry @camerondubas, will try to review tomorrow :weary:

rwjblue avatar Jun 05 '20 00:06 rwjblue

Just noting here that this appears to address https://github.com/emberjs/ember.js/issues/19703 and https://github.com/glimmerjs/glimmer-vm/issues/1309.

Edit: don't know what the state is here since its been sitting around for a bit, but if there are things to address I'm happy to help out.

jfdnc avatar Oct 11 '21 17:10 jfdnc

I believe this is still pending a review. @rwjblue any chance this could get looked at again?

camerondubas avatar Oct 20 '21 23:10 camerondubas