hyperx icon indicating copy to clipboard operation
hyperx copied to clipboard

Multiple self-closing children causes nested children tree

Open slaskis opened this issue 7 years ago • 7 comments

While writing some tests for #33 I found that when using multiple self closing children in a parent it nests them.

In other words this:

<div>
  <div />
  <div />
</div>

Results in a nested tree like this:

<div>
  <div>
    <div></div>
  </div>
</div>

Another find, which is also in the tests, was that the whitelisted self-closing tags (such as <br>) instead causes an exception when written as a open tag, like <br></br>. Which might be correct, but the exception message (Error: multiple root elements must be wrapped in an enclosing tag) was a bit confusing.

slaskis avatar Sep 06 '16 09:09 slaskis

Just so you know, self-closing tags should always have at least one space character between the tag name and the U+002F ("/") character. Not including a space should result in an error according to the specs from W3C. Same for <img />. See documentation on HTML 5 closing tags. Standard self-closing tags are:

  • br
  • hr
  • img
  • input
  • meta

These are never closed, you can use the forward slash / to close them or not.

By the way, <div /> is not a self-closing tag and should throw an error. Problem is jQuery does accept this, going against W3C docs. They have examples on their site of using this bad syntax, encouraging junior developers to using this broken HTML syntax: var div = $('<div/>')

Creates a correct div element, although it is incorrect HTML5.

rbiggs avatar Feb 02 '17 02:02 rbiggs

@rbiggs Ah, good to know!

/cc @dodekeract

jorgebucaran avatar Feb 02 '17 02:02 jorgebucaran

Very interesting @rbiggs! I haven't actually read the spec and was just going by intuition ;)

I do read that spec slightly different though. Regarding start tags it may have one or more space characters unless there are attributes.

And it does only mention self-closing tags for void elements, for which it's optional and "has no effect", and foreign elements (such as svg), for which it is required unless there is an end tag.

As for the normal tags my intuition says that html5 is pretty chill. But you're absolutely right that the validator complains, and it gives me this message:

Self-closing syntax (/>) used on a non-void HTML element. Ignoring the slash and treating as a start tag.

However, the way I've seen this package is less stringified html and more like a "no need to preprocess jsx". And since jsx treats self-closing tags the same as closed tag without children I expected this to work here as well.

slaskis avatar Feb 02 '17 08:02 slaskis

At the moment this results in:

<div>div<div></div></div>

So it's still a problem!

goto-bus-stop avatar Mar 14 '18 14:03 goto-bus-stop

I ran into this issue as well, I'm using hyperx to build a reactive programming framework (evolui) for a few months. I really tried to contribute, but didn't manage to get into the code, so I decided to create a similar package from the ground up. It does pretty much the same thing, but has a focus toward custom components and supports having multiple root elements.

You can find it here https://github.com/gvergnaud/vdom-tag, if anyone is interested.

gvergnaud avatar Apr 30 '18 00:04 gvergnaud

Maybe I'm just confused, but I don't think there's a bug in hyperx here. In a dev console I type:

const d = document.createElement('div')
d.innerHTML = '<div /> <div />'

this is what the console returns for d:

<div>
  <div>
    <div></div>
  </div>
</div>

And that 100% matches what hyperx produces. It seems that hyperx is faithfully applying the same parse logic that the browser runs in this case.

Am I missing something? Where's the actual broken behavior within hyperx?

mreinstein avatar Feb 05 '21 18:02 mreinstein

I've also done a similar check with <br></br>, <br>, and <br/>. In all 3 cases hyperx produces a single BR element as expected the same way the browser does.

I guess I'm just wondering if we're having a discussion about what hyperx should be doing in these cases, or if there is something actually wrong with the parser.

From my perspective, hyperx is supposed to be mirroring exactly what the browser parsers will do with html, rather than doing cleanup. e.g., as humans we know what we're after when we type:

<div>
  <div />
  <div />
</div>

but regardless of whether you feed this html string directly to the browser or hyperx, we don't get the value we might expect.

I think cleaning up malformed html is not something that hyperx should be doing. That could happen as a "cleanup step" before this markup makes it to hyperx.

Assuming I haven't missed actual brokenness, we could probably resolve this issue finally by putting a note near the top of the README.md stating something to the effect that "hyperx intends to faithfully parse html by the same rules that browsers use. Custom cleanup of human error in the formatting is outside the scope of this module."

thoughts?

mreinstein avatar Feb 05 '21 20:02 mreinstein

Given that this issue is stale and I don't believe the hyperx parser is straying from the in-browser parsers are doing, I'm going to close it. Not trying to silence anyone though. If anyone finds issue with the parsing logic, happy to re-open/re-visit.

mreinstein avatar Oct 30 '23 00:10 mreinstein