
Tables parsing issue - with <tr> inside <tr> case

Open alecpl opened this issue 4 years ago • 4 comments

When working with code generated by Microsoft Outlook, I found a case that the DOMDocument-based parser handles without problems but the HTML5 parser does not.

The minimal test case input is this:

<table id="t1">
  <tr>
    <td>
      <table id="t2">
        <tr>
        <tr>
          <td></td>
        </tr>
        </tr>
      </table>
    </td>
  </tr>
  <tr><td></td></tr>
</table>

Note the <tr> element as a child of another <tr>. This causes the HTML5 parser to output:

<table id="t1">
  <tr>
    <td>
      <table id="t2">
        <tr></tr>
        <tr>
          <td></td>
        </tr>
      </table>
    </td>
  </tr>
</table>
<tr><td></td></tr>

This is obviously invalid: the parent table is "closed" before it should be, leaving the next (here: last) tr element outside of the table.
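For anyone who wants to check serialized output for this symptom, here is a minimal sketch in Python using only the standard library. It counts <tr> start tags that appear while no <table> is open. The names `OrphanRowFinder` and `count_orphan_rows` are made up for illustration, `BROKEN_OUTPUT` is the output quoted above, and `EXPECTED_OUTPUT` is an assumption about what a correct result would look like (last row kept inside t1). It is independent of html5-php and only inspects the markup.

```python
from html.parser import HTMLParser


class OrphanRowFinder(HTMLParser):
    """Count <tr> start tags seen while no <table> element is open."""

    def __init__(self):
        super().__init__()
        self.table_depth = 0   # number of currently open <table> elements
        self.orphan_rows = 0   # <tr> start tags seen at table depth 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "tr" and self.table_depth == 0:
            self.orphan_rows += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1


def count_orphan_rows(markup):
    """Return how many <tr> tags in `markup` sit outside every table."""
    finder = OrphanRowFinder()
    finder.feed(markup)
    finder.close()
    return finder.orphan_rows


# Output reported in this issue: the last row falls outside table t1.
BROKEN_OUTPUT = """
<table id="t1"><tr><td>
  <table id="t2"><tr></tr><tr><td></td></tr></table>
</td></tr></table>
<tr><td></td></tr>
"""

# Assumed correct output: the last row stays inside table t1.
EXPECTED_OUTPUT = """
<table id="t1"><tr><td>
  <table id="t2"><tr></tr><tr><td></td></tr></table>
</td></tr>
<tr><td></td></tr></table>
"""

if __name__ == "__main__":
    print(count_orphan_rows(BROKEN_OUTPUT))
    print(count_orphan_rows(EXPECTED_OUTPUT))
```

A nonzero count means at least one row has escaped its table, which is the breakage described here.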

Reference: https://github.com/roundcube/roundcubemail/issues/7356

alecpl avatar Feb 07 '21 12:02 alecpl

Since <tr> is not a valid child of <tr>, what would be the suggested solution here? What do browsers do?

goetas avatar Feb 07 '21 13:02 goetas

Both Firefox and Chrome convert the t2 table to:

<table id="t2">
    <tbody>
        <tr></tr>
        <tr>
            <td></td>
        </tr>
    </tbody>
</table>

alecpl avatar Feb 07 '21 16:02 alecpl

Sorry, I wasn't clear. The t2 table is the same as in the HTML5 parser output. The difference in the browser is that the outer table is not broken, i.e. the second row is where it should be.

So the issue here is not the content of the inner table, but its impact on the outer table.

alecpl avatar Feb 07 '21 16:02 alecpl

Ah, I see. Indeed, that <tr><td></td></tr> outside the table is wrong.

goetas avatar Feb 08 '21 19:02 goetas