
`Unmatched sequence` error for self-closing tags with newlines

Open mepatterson opened this issue 3 years ago • 6 comments

Given this HTML:

<img
  src="foo.jpg"
/>

... htmlbeautifier throws an `Unmatched sequence` error.

In a console, this is easy to demonstrate:

# FAILS
HtmlBeautifier.beautify("<img\n src='foo.jpg'\n/>")

# this works fine
HtmlBeautifier.beautify("<img\n src='foo.jpg'\n/>".gsub("\n",""))

mepatterson avatar Mar 28 '22 19:03 mepatterson

Note that I'm raising this because htmlbeautifier is used by https://github.com/allmarkedup/lookbook, which passes the HTML straight through, causing a crash.
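For callers that pass HTML straight through, one possible stopgap is to fall back to the unformatted input when formatting fails. The following is a minimal sketch, not how Lookbook or htmlbeautifier actually handle it: `beautify_or_passthrough` is a hypothetical helper, and it assumes the failure is raised as a `StandardError` subclass.

```ruby
# Hypothetical helper, not part of htmlbeautifier or Lookbook.
# `formatter` is any callable that takes an HTML string and returns a string.
def beautify_or_passthrough(html, formatter)
  formatter.call(html)
rescue StandardError => e
  # Assumption: the formatter signals failure with a StandardError subclass.
  warn "HTML formatting skipped: #{e.message}"
  html # return the input untouched instead of crashing the caller
end
```

With the gem loaded, this would be called as `beautify_or_passthrough(html, HtmlBeautifier.method(:beautify))`. It hides the error rather than fixing the parser, so the affected markup is simply left unformatted.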

mepatterson avatar Mar 28 '22 19:03 mepatterson

Yes, I have a similar problem when parsing HTML like <svg><path\nd="M11/>\n</svg>; removing the first \n resolves the error. After some digging, changing this line to [%r{<#{ELEMENT_CONTENT}>}om, seems to work, but I guess the [^/] part is necessary in some corner cases?

Brainor avatar Apr 14 '22 07:04 Brainor

Same problem here. For example, with an SVG, this pattern works:

# works
<svg>
  <path all-attributes-on-same-line="true" foo="bar" />
</svg>

but if the attributes are on different lines it breaks:

# doesn't work
<svg>
  <path 
    all-attributes-on-same-line="false"
    foo="bar"
  />
</svg>

Temp fix for me is to write some funky html:

# works, but not great
<svg>
  <path 
    all-attributes-on-same-line="false"
    foo="bar"
  ></path>
</svg>
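An alternative to hand-editing the markup is to join multi-line tags onto one line before handing the string to the beautifier, which is a narrower version of stripping every newline as in the original report. This is only a sketch: `collapse_tag_newlines` is a hypothetical helper, and the regex is deliberately naive (it does not account for `>` inside attribute values, comments, script contents, or multi-line ERB tags).

```ruby
# Hypothetical pre-processing helper, not part of htmlbeautifier.
# Replaces each newline (plus surrounding whitespace) that occurs inside
# a <...> tag with a single space; text between tags is left untouched.
def collapse_tag_newlines(html)
  html.gsub(/<[^<>]+>/) do |tag|
    tag.gsub(/\s*\n\s*/, " ")
  end
end

collapse_tag_newlines("<img\n src='foo.jpg'\n/>")
# => "<img src='foo.jpg' />"
```

The result could then be passed to `HtmlBeautifier.beautify`, since the reports above suggest single-line self-closing tags are handled.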

HermantNET avatar Jul 13 '22 22:07 HermantNET

Has anyone fixed this in their fork? @mepatterson I see that Lookbook extended the parser with a clever hack. I'm going to use that in my fork, because it seems prettier-plugin-erb is also broken and I'd do better fixing the Ruby.

aviflombaum avatar Feb 20 '23 17:02 aviflombaum

I ended up here because my SVG <path /> was also causing it to crash. After investigating further, it appears that this is technically invalid HTML, because path is not a void element. However, because path belongs to the SVG namespace, it is a foreign element, and an HTML parser should allow it to be self-closing:

Void elements only have a start tag; end tags must not be specified for void elements. Foreign elements must either have a start tag and an end tag, or a start tag that is marked as self-closing, in which case they must not have an end tag.

Source: https://html.spec.whatwg.org/multipage/syntax.html#elements-2


htmlbeautifier handles this with a regex that checks against a list of void elements

https://github.com/threedaymonk/htmlbeautifier/blob/3d75d9b4e09973ede8b886ff129f1e734ccbaa98/lib/htmlbeautifier/html_parser.rb#L8-L11

https://github.com/threedaymonk/htmlbeautifier/blob/3d75d9b4e09973ede8b886ff129f1e734ccbaa98/lib/htmlbeautifier/html_parser.rb#L39-L40

and then fails with an Unmatched sequence if there's no closing tag.

I believe the "correct" way to validate that would be, in pseudocode:

IF the element is self-closed
    IF it is a void element
    OR it is NOT a built-in HTML element # 👈 adding this for "foreign elements"
        format_self_closed()
    END
END

Or, the easier way, just ignore any self-closing tags, though that would technically produce invalid HTML.
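The check in the pseudocode above could look roughly like this in Ruby. This is a sketch under stated assumptions, not htmlbeautifier's code: `self_closing_allowed?` and both constants are hypothetical names, the void list follows the current HTML spec, and `KNOWN_HTML_ELEMENTS` is deliberately incomplete and only there to illustrate the idea.

```ruby
require "set"

# Void elements as listed in the HTML Living Standard.
VOID_ELEMENTS = Set.new(
  %w[area base br col embed hr img input link meta source track wbr]
).freeze

# Illustrative subset only; a real check would need the full list of
# HTML element names.
KNOWN_HTML_ELEMENTS = (
  VOID_ELEMENTS + %w[a body div head html li p script span table ul]
).freeze

# Hypothetical predicate: true when a self-closed tag with this name
# should be formatted as self-closed (void, or not a known HTML element
# and therefore treated as foreign).
def self_closing_allowed?(tag_name)
  name = tag_name.downcase
  VOID_ELEMENTS.include?(name) || !KNOWN_HTML_ELEMENTS.include?(name)
end
```

One caveat with matching on names alone: a few names, such as `a` and `script`, exist in both HTML and SVG, so a name-based check would still reject those when they appear self-closed inside an `<svg>`.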

bbugh avatar Mar 08 '24 13:03 bbugh

I also came across the SVG issue. I had to remove all inlined SVGs and load them separately as images so that htmlbeautifier doesn't crash.

nickpoorman avatar Jun 26 '24 06:06 nickpoorman