ckeditor5 icon indicating copy to clipboard operation
ckeditor5 copied to clipboard

CKEditor 5 Mangles Table Structure and Makes Tables Inaccessible / Support for Tables with Multi-Level Headers / Allow <tbody> <th> in any column

Open jameslpetersen opened this issue 2 years ago • 19 comments

šŸ“ Provide detailed reproduction steps (if any)

  1. Paste the HTML from tables 1, 2, 3, and 4 (below) into source view of the reference implementation of CKEditor.
  2. Switch out of source view and back into it.

Table 1

This table has a rowgroup header with a colspan.

<table>
    <thead>
    <tr>
        <th scope="col">Col Header 1</th>
        <th scope="col">Col Header 2</th>
        <th scope="col">Col Header 3</th>
    </tr>
    </thead>
    <tbody>
    <tr>
        <th scope="rowgroup" colspan="3">Rowgroup Header 1</th>
    </tr>
    <tr>
        <th scope="row">Row Header 1</th>
        <td>Data 1,1</td>
        <td>Data 2,1</td>
    </tr>
    </tbody>
</table>

Table 2

This table has two rowgroup headers that have colspans.

<table>
    <thead>
    <tr>
        <th scope="col">Col Header 1</th>
        <th scope="col">Col Header 2</th>
        <th scope="col">Col Header 3</th>
    </tr>
    </thead>
    <tbody>
    <tr>
        <th scope="rowgroup" colspan="3">Rowgroup Header 1</th>
    </tr>
    <tr>
        <th scope="row">Row Header 1</th>
        <td>Data 1,1</td>
        <td>Data 2,1</td>
    </tr>
    </tbody>
    <tbody>
    <tr>
        <th scope="rowgroup" colspan="3">Rowgroup Header 2</th>
    </tr>
    <tr>
        <th scope="row">Row Header 2</th>
        <td>Data 1,2</td>
        <td>Data 2,2</td>
    </tr>
    <tr>
        <th scope="row">Row Header 3</th>
        <td>Data 1,3</td>
        <td>Data 2,3</td>
    </tr>
    </tbody>
</table>

Table 3

Similar to previous examples, but we've removed the colspan on the rowgroup and replaced with empty td elements. We've also added some row headers to the second column and a tbody block that doesn't have any th with a scope set to rowgroup.

<table>
    <thead>
    <tr>
        <th scope="col">Col Header 1</th>
        <th scope="col">Col Header 2</th>
        <th scope="col">Col Header 3</th>
    </tr>
    </thead>
    <tbody>
    <tr>
        <th scope="rowgroup">Rowgroup Header 1</th>
        <td></td>
        <td></td>
    </tr>
    <tr>
        <th scope="row">Row Header 1</th>
        <th scope="row">Row Subheader 1</th>
        <td>Data 1,1</td>
    </tr>
    </tbody>
    <tbody>
    <tr>
        <th scope="row">Row Header 2</th>
        <th scope="row">Row Subheader 2</th>
        <td>Data 1,2</td>
    </tr>
    <tr>
        <th scope="row">Row Header 3</th>
        <th scope="row">Row Subheader 3</th>
        <td>Data 1,3</td>
    </tr>
    </tbody>
    <tbody>
    <tr>
        <th scope="rowgroup">Rowgroup Header 2</th>
        <td></td>
        <td></td>
    </tr>
    <tr>
        <th scope="row">Row Header 4</th>
        <th scope="row">Row Subheader 4</th>
        <td>Data 1,4</td>
    </tr>
    </tbody>
</table>

Table 4

This table is an example from the HTML spec.

<table>
    <colgroup> <col>
    <colgroup> <col> <col> <col>
    <thead>
    <tr> <th> <th>2008 <th>2007 <th>2006
    <tbody>
    <tr> <th scope=rowgroup> Research and development
        <td> $ 1,109 <td> $ 782 <td> $ 712
    <tr> <th scope=row> Percentage of net sales
        <td> 3.4% <td> 3.3% <td> 3.7%
    <tbody>
    <tr> <th scope=rowgroup> Selling, general, and administrative
        <td> $ 3,761 <td> $ 2,963 <td> $ 2,433
    <tr> <th scope=row> Percentage of net sales
        <td> 11.6% <td> 12.3% <td> 12.6%
</table>

āœ”ļø Expected result

CKEditor does not alter the table markup.

āŒ Actual result

Table 1

CKEditor has moved the parent row for "Rowgroup Header 1" into the thead element, which makes the table inaccessible to screen readers.

<table>
    <thead>
    <tr>
        <th scope="col">
            Col Header 1
        </th>
        <th scope="col">
            Col Header 2
        </th>
        <th scope="col">
            Col Header 3
        </th>
    </tr>
    <tr>
        <th colspan="3" scope="rowgroup">
            Rowgroup Header 1
        </th>
    </tr>
    </thead>
    <tbody>
    <tr>
        <th scope="row">
            Row Header 1
        </th>
        <td>
            Data 1,1
        </td>
        <td>
            Data 2,1
        </td>
    </tr>
    </tbody>
</table>

Table 2

CKEditor has moved all the parent rows for rowgroup headers into the thead element. It has also put all the rows after the thead block into one tbody block. This makes the table inaccessible to screen readers and is a loss of data, since we can no longer reconstruct the original table due to loss of the tbody blocks.

<table>
    <thead>
    <tr>
        <th scope="col">
            Col Header 1
        </th>
        <th scope="col">
            Col Header 2
        </th>
        <th scope="col">
            Col Header 3
        </th>
    </tr>
    <tr>
        <th colspan="3" scope="rowgroup">
            Rowgroup Header 1
        </th>
    </tr>
    <tr>
        <th colspan="3" scope="rowgroup">
            Rowgroup Header 2
        </th>
    </tr>
    </thead>
    <tbody>
    <tr>
        <th scope="row">
            Row Header 1
        </th>
        <td>
            Data 1,1
        </td>
        <td>
            Data 2,1
        </td>
    </tr>
    <tr>
        <th scope="row">
            Row Header 2
        </th>
        <td>
            Data 1,2
        </td>
        <td>
            Data 2,2
        </td>
    </tr>
    <tr>
        <th scope="row">
            Row Header 3
        </th>
        <td>
            Data 1,3
        </td>
        <td>
            Data 2,3
        </td>
    </tr>
    </tbody>
</table>

Table 3

CKEditor has not moved all the parent rows for rowgroup headers into the thead element, but has put all the rows after the thead block into one tbody block. This causes data loss, since we can no longer reconstruct the original table due to loss of the tbody blocks, and makes the table inaccessible to screen readers. CKEditor has also turned row headers in the second column to td elements with a scope attribute, which is invalid, and causes further accessibility issues.

<table>
    <thead>
    <tr>
        <th scope="col">
            Col Header 1
        </th>
        <th scope="col">
            Col Header 2
        </th>
        <th scope="col">
            Col Header 3
        </th>
    </tr>
    </thead>
    <tbody>
    <tr>
        <th scope="rowgroup">
            Rowgroup Header 1
        </th>
        <td>
            &nbsp;
        </td>
        <td>
            &nbsp;
        </td>
    </tr>
    <tr>
        <th scope="row">
            Row Header 1
        </th>
        <td scope="row">
            Row Subheader 1
        </td>
        <td>
            Data 1,1
        </td>
    </tr>
    <tr>
        <th scope="row">
            Row Header 2
        </th>
        <td scope="row">
            Row Subheader 2
        </td>
        <td>
            Data 1,2
        </td>
    </tr>
    <tr>
        <th scope="row">
            Row Header 3
        </th>
        <td scope="row">
            Row Subheader 3
        </td>
        <td>
            Data 1,3
        </td>
    </tr>
    <tr>
        <th scope="rowgroup">
            Rowgroup Header 2
        </th>
        <td>
            &nbsp;
        </td>
        <td>
            &nbsp;
        </td>
    </tr>
    <tr>
        <th scope="row">
            Row Header 4
        </th>
        <td scope="row">
            Row Subheader 4
        </td>
        <td>
            Data 1,4
        </td>
    </tr>
    </tbody>
</table>

Table 4

CKEditor has added inline styles to col elements (strange), put all the rows after the thead block into one tbody block (again, potential data loss and makes the table inaccessible to screen readers). I'm not sure what the practical implications of this next quirk are, but I want to note it. CKEditor has added a non-breaking space character to an empty th tag. I don't think non-breaking spaces are ASCII whitespace. This means that the header cell is non-empty and user agents (including screen readers) should add it to the header list for the cells beneath it, according to the spec.

<table class="ck-table-resized">
    <colgroup><col style="width:25%;"><col style="width:25%;"><col style="width:25%;"><col style="width:25%;"></colgroup><colgroup><col><col><col></colgroup>
    <thead>
    <tr>
        <th>
            &nbsp;
        </th>
        <th>
            2008
        </th>
        <th>
            2007
        </th>
        <th>
            2006
        </th>
    </tr>
    </thead>
    <tbody>
    <tr>
        <th scope="rowgroup">
            Research and development
        </th>
        <td>
            $ 1,109
        </td>
        <td>
            $ 782
        </td>
        <td>
            $ 712
        </td>
    </tr>
    <tr>
        <th scope="row">
            Percentage of net sales
        </th>
        <td>
            3.4%
        </td>
        <td>
            3.3%
        </td>
        <td>
            3.7%
        </td>
    </tr>
    <tr>
        <th scope="rowgroup">
            Selling, general, and administrative
        </th>
        <td>
            $ 3,761
        </td>
        <td>
            $ 2,963
        </td>
        <td>
            $ 2,433
        </td>
    </tr>
    <tr>
        <th scope="row">
            Percentage of net sales
        </th>
        <td>
            11.6%
        </td>
        <td>
            12.3%
        </td>
        <td>
            12.6%
        </td>
    </tr>
    </tbody>
</table>

ā“ Possible solution

Add some kind of data attribute that tells CKEditor to not parse the table. Or just follow the table parsing algorithm from the HTML5 spec.

šŸ“ƒ Other details

  • Browser: Any
  • OS: Any
  • First affected CKEditor version: Unknown, but this is happening in the reference implementation.
  • Installed CKEditor plugins: None

If you'd like to see this fixed sooner, add a šŸ‘ reaction to this post.

jameslpetersen avatar Aug 31 '23 15:08 jameslpetersen

Thank you for the report, we will analyze it. One comment, data change is not data loss, as far as I can see, we don't have in the examples the data loss where the data vanishes. We need to standardize the incoming HTML to a common format to make some features work, like colgroup for column resize.

Witoso avatar Sep 11 '23 07:09 Witoso

Thanks for the response, Witek.

My main reason for mentioning data loss is that this bug seems to merge multiple tbody tags into a single tbody tag. Depending on the source code of the original table, it may be impossible to determine the original number and/or location of those tbody tags. And if the table has one or more th tags whose scope attribute equals rowgroup, then the tbody tags convey semantic information about the table to users of screen readers. So, the bug may erase semantic data about a table that is necessary for screen reader users to properly interpret the table.

For example, given the following table, the loss of the second tbody tag prevents you from determining how many of the final rows were in the first tbody tag (or even if there was a second tbody tag). And the loss of the second tbody tag changes the meaning of the table for screen reader users.

<table>
    <thead>
    <tr>
        <th scope="col">Col 1</th>
        <th scope="col">Col 2</th>
    </tr>
    </thead>
    <tbody>
    <tr>
        <th scope="rowgroup">Rowgroup Header</th>
        <td></td>
    </tr>
    <tr>
        <th scope="row">Row Header 1</th>
        <td>Data 1</td>
    </tr>
    <tr>
        <th scope="row">Row Header 2</th>
        <td>Data 2</td>
    </tr>
    </tbody>
    <tbody>
    <tr>
        <th scope="row">Row Header 3</th>
        <td>Data 3</td>
    </tr>
    <tr>
        <th scope="row">Row Header 4</th>
        <td>Data 4</td>
    </tr>
    </tbody>
</table>

jameslpetersen avatar Sep 11 '23 14:09 jameslpetersen

Tracking this on drupal.org at https://www.drupal.org/project/drupal/issues/3384400.

wimleers avatar Nov 02 '23 10:11 wimleers

We have a client with over 1000 complex data tables that would all suffer what I would consider data loss as a result of this bug(s) / lack of full support for HTML tables.

To borrow the quote on the related Drupal.org issue from @phil-s

I'd agree that some HTML markup could be classed as metadata for most intents and purposes, but table markup is so integral to the semantic meaning of that content that I don't think you can consider it as anything other than "data".

There are a couple different issues detailed here, while investigating I wanted to explore a potential avenue specifically for the issue of tbody elements being condensed into 1.

I've added a draft PR with a PoC - the idea is to introduce a rowGroup attribute to the tableRow model.

I've added a manual test with the markup from the table 3 example - while it gives me the expected output for tbody elements, I haven't looked into the td / th issue.

Its draft (no doc changes or automated tests at this stage) but I'm very keen to get feedback on the approach before I go further as I'm not experienced with the CKEditor 5 framework / ecosystem so there's probably more elegant approaches.

For td / th conversion - there are a lot more moving parts here, I was thinking if a role attribute of heading || data could be added to the tableCell model - then when upcasting the original HTML can dictate if this is a heading or not. There seem to be lots of moving parts here that would be impacted, but I think the current logic of th being based on an entire heading row / heading column is problematic.

As it stands this issue is a blocker for use to adopt CKEditor5 so keen to help find solutions where we can.

Side note - your contribution docs are awesome - thank you for those!

ericgsmith avatar Nov 16 '23 22:11 ericgsmith

@niegowski could you take a look when you have time?

Witoso avatar Nov 17 '23 08:11 Witoso

I'd just like to add that while the examples provided are about CKEditor mishandling existing markup, it seems to me the CKEditor doesn't even create its own tables with the basic markup required for accessibility. For example, if a table has two headers in two different directions, like a header row as well as a header column, accessibility guidelines require the "scope" attribute in order to properly determine the direction of the headers, but CKEditor does not do this: https://www.w3.org/WAI/tutorials/tables/two-headers

chrisgross avatar Jan 08 '24 21:01 chrisgross

The table bugs here are really critical. I’m responding to an RFP for a city government website proposal, and one question completely disqualifies Drupal 10 with CKE5 as inconformant:

To comply with the U.S. Access Board Web-based Intranet and Internet Information and Applications (1194.22) provisions, the solution provides the following:

  • The solution allows markup to be used to associate data cells and header cells for data tables that have two or more logical levels of row or column headers.

If you copy/paste the table example from Curriculum for Web Content Accessibility Guidelines 1.0 (created in year 2000!) into the CKEditor 5 demo page (via the Source mode button) it is broken.

It appears that several things are happening:

  • multiple <tbody> sections are collapsed into a single one.
  • any <tr> in <tbody> consisting of only <th> tags (no <td> tags) are considered by CKEditor to be a header, and thus pulled out of order up into the <thead> section.
Screen Shot 2024-02-13 at 6 17 55 PM Screen Shot 2024-02-13 at 6 18 05 PM

jameswilson avatar Feb 13 '24 23:02 jameswilson

Regarding this one in the previous comment:

any <tr> in <tbody> consisting of only <th> tags (no <td> tags) are considered by CKEditor to be a header, and thus pulled out of order up into the <thead> section.

It is worth clarifying that this has not been called out as a distinct issue from the "rowgroup" bug mentioned in the issue summary above. They're related for sure, but this issue where rows from table body get pulled up to thead happens with a row containing only th tags (one or more) and NO td tags, irrespective of whether "rowgroup" attr is present or not.

jameswilson avatar Feb 14 '24 16:02 jameswilson

I'm late to this conversation, but wanted to chime in that we have a vendor who recently upgraded to CKEditor 5, and this has mangled all our tables as well. On a separate note, I wish CKEditor 5 wouldn't wrap the table in an auto-padded element, since this prevents us from setting our tables to 100% width.

Thank you for the report, we will analyze it. One comment, data change is not data loss, as far as I can see, we don't have in the examples the data loss where the data vanishes. We need to standardize the incoming HTML to a common format to make some features work, like colgroup for column resize.

Data change can absolutely be data loss. If someone shuffled all the primary IDs in your SQL table, would you not consider that data loss?

dliwustl avatar Feb 29 '24 16:02 dliwustl

@dliwustl

Data change can absolutely be data loss. If someone shuffled all the primary IDs in your SQL table, would you not consider that data loss?

I'm gonna remember this quote! šŸ˜„

wimleers avatar Mar 01 '24 07:03 wimleers

šŸŽ šŸ‘‰ šŸŠ 😊

On a separate note, I wish CKEditor 5 wouldn't wrap the table in an auto-padded element, since this prevents us from setting our tables to 100% width.

Just to make sure, you mean:

.ck-content .table {
    margin: 0.9em auto;
    display: table;
}

AFAIK, this could be easily configurable via changing CSS, and table.tableProperties.defaultProperties

Witoso avatar Mar 01 '24 08:03 Witoso

JFYI, on the ck/14911-investigation branch there's a prepared manual test with all provided examples and live editors.

Witoso avatar Mar 01 '24 11:03 Witoso

On a separate note, I wish CKEditor 5 wouldn't wrap the table in an auto-padded element, since this prevents us from setting our tables to 100% width.

Just to make sure, you mean:

.ck-content .table {
    margin: 0.9em auto;
    display: table;
}

AFAIK, this could be easily configurable via changing CSS, and table.tableProperties.defaultProperties

Thanks for the reply! This is related to a vendor's implementation, so I'll pass this along. I had found and suggested editing that style before, but the link is very helpful. 😊

Might I ask one more related question (and then I promise to stop hijacking)? In the vendor's implementation, the table selection handle still appears when the editor element is in read-only mode. It appears when the page first loads, then disappears if you select the table and unselect it. Is that working as intended? The selection handle is useful in edit mode, but not in read-only mode.

dliwustl avatar Mar 01 '24 22:03 dliwustl

I'd just like to add that while the examples provided are about CKEditor mishandling existing markup, it seems to me the CKEditor doesn't even create its own tables with the basic markup required for accessibility. For example, if a table has two headers in two different directions, like a header row as well as a header column, accessibility guidelines require the "scope" attribute in order to properly determine the direction of the headers, but CKEditor does not do this: https://www.w3.org/WAI/tutorials/tables/two-headers

There's a separate issue (opened in 2018!) about this: #3175

A-Kun avatar Apr 17 '24 02:04 A-Kun

Table 1, Table 2 issues seems to be related to these issues:

https://github.com/ckeditor/ckeditor5/issues/3172#issuecomment-430926553 https://github.com/ckeditor/ckeditor5/issues/13609

... and this change:

https://github.com/ckeditor/ckeditor5/pull/13600/files#diff-828bca0c515d3787c953019584be105f220d6e4fa4b583f00fe0961d6f3462d9R255

It looks like scope attribute is ignored.

Mati365 avatar Jul 15 '24 11:07 Mati365

One related problem we found: If any row has a <th> in the first column, then all first-column cells in all other rows are converted to <th>. If any such cell used to be <td colspan="...">, spanning the full table width, the above bug is triggered, and that row moves into <thead>.

The solution for all of this should be:

  • The data model should allow for all kinds of tables which have their <td> and <th> in various places, as long as it is valid html.
  • Any "clean-up" should be opt-in.

donquixote avatar Jan 20 '25 18:01 donquixote

I agree with @donquixote cleanup should be "optional", should be an opt-in not an opt-out. By default cleanup should be disabled!

olstjos avatar Apr 01 '25 16:04 olstjos

ok turns out we were able to disable the insertTable plugin to help reduce the table mangling issue, will look into this further

Image

olstjos avatar Apr 02 '25 02:04 olstjos

I've revisited this for a client that is still having to use workaround to avoid significant data loss and issues with the table plugin.

I would be really helpful to get an understanding from CKEditor team where this sits in terms of a priority or likelihood of happening. It feels like while this has been the most important upstream issue on Drupal's prioritized list for a long time, outside of the manual tests added its hard to gauge here how the CKsource team view this issue.

This feels like it could end up being a fairly substantial change to get right - so at this point its really about getting an understanding of if the answer is finding better workarounds, or holding out hope (and availability) to help get this fixed in CKEditor.

From my perspective, this is not an unsolvable issue but I can appreciate the complexity of it.

I'll also highlight that the general html support / source editing plugin doesn't have these issues - these issues are due to the model of the table plugin itself, and those model decisions were likely taken to make other functionality easy. And our clients want the best of both worlds, we want the ease of using the table plugin UI when adding new rows and cells to tables, but we also need to support the ability to manually edit the table markup to improve the accessibility and semantics.

From what I can see:

  • The table plugin model works with the concept of table, tableRow and tableCell elements
  • td and th elements are both upcast to tableCell which is where a lot of semantic data is lost
  • tableCell is downcast to either td or th based on an attribute of headingRows or headingColumns on the table - which is an incorrect and limited view on how tables can be structured
  • There is no internal model for thead, tfoot or tbody elements when upcasting so more semantic data is lost here
  • thead is assumed to contain all the "header rows" when downcasting and this is where rows can get rearranged and placed in the wrong groups
  • tbody is assumed to a singular instance when downcast and gets the leftovers / non heading rows which is why they are condensed into a single group

There is so much functionality in the table plugin that depends on the current internal model - so from the outside I do appreciate the complexity here.

I made a PoC a while ago https://github.com/ckeditor/ckeditor5/pull/15366/files which attempted to see if this could be resolved by introducing a rowGroup property on the tableRow and a role property on the tableCell - that was so long ago I can't even remember a lot of the thinking - but I guess what I was trying to illustrate was there might be ways we can "ease" into supporting this markup without going down the route of needing massive overhaul of the model. But I don't think this issue can really progress without some design input into the model changes before rushing into coding.

ericgsmith avatar Jul 09 '25 21:07 ericgsmith