mammoth.js
Accessibility issue: tables are not outputting <thead> and <th> tags
In Microsoft Word, you can specify whether a table has a header row (<thead>) and a header first column (<th> cells) through this toolbar:
However, Mammoth ignores these settings when it processes a Word document. Instead, it strips out all table headings and simply outputs a table of basic table cells.
For the following table, I used the settings shown in the image above.
This is the output I was expecting from Mammoth:
<table>
<thead>
<tr>
<th>
<p>Name</p>
</th>
<th>
<p>Number</p>
</th>
<th>
<p>Year</p>
</th>
</tr>
</thead>
<tbody>
<tr>
<th>
<p>Thing</p>
</th>
<td>
<p>123</p>
</td>
<td>
<p>2017</p>
</td>
</tr>
<tr>
<th>
<p>Other thing</p>
</th>
<td>
<p>458</p>
</td>
<td>
<p>2016</p>
</td>
</tr>
</tbody>
</table>
This is the markup I got though:
<table>
<tbody>
<tr>
<td>
<p>Name</p>
</td>
<td>
<p>Number</p>
</td>
<td>
<p>Year</p>
</td>
</tr>
<tr>
<td>
<p>Thing</p>
</td>
<td>
<p>123</p>
</td>
<td>
<p>2017</p>
</td>
</tr>
<tr>
<td>
<p>Other thing</p>
</td>
<td>
<p>458</p>
</td>
<td>
<p>2016</p>
</td>
</tr>
</tbody>
</table>
Mammoth version: v1.4.2
OS: Windows 10
node.js version: 6.9.4
Could you provide a minimal example document?
On Mon, 24 Jul 2017 17:31:20 -0700 Daniel Tonon [email protected] wrote:
Here is a minimal example Word document: mammoth-table-issue.docx
To support this, it looks like w:tbl/w:tblPr/w:tblLook/@w:firstRow and w:tbl/w:tblPr/w:tblLook/@w:firstColumn need to be read.
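As a rough illustration of reading those two attributes, here is a minimal sketch. It is not Mammoth's code: `readTableLook` is a hypothetical helper, and it assumes the attributes of the `w:tblLook` element are already available as a plain object keyed by qualified name. It only looks at the explicit `w:firstRow` and `w:firstColumn` attributes, so a document that encodes the table look in some other way would not be picked up.

```javascript
// Sketch only: `attributes` is assumed to be a plain object holding the
// attributes of a w:tblLook element, keyed by qualified name.
function readTableLook(attributes) {
    attributes = attributes || {};

    // OOXML on/off values may be written as "1"/"0", "true"/"false" or "on"/"off".
    // Anything else, including a missing attribute, is treated as off here.
    function isOn(name) {
        var value = attributes[name];
        return value === "1" || value === "true" || value === "on";
    }

    return {
        firstRow: isOn("w:firstRow"),
        firstColumn: isOn("w:firstColumn")
    };
}

// Usage with the attributes of a table that has both options ticked in Word.
var look = readTableLook({"w:firstRow": "1", "w:lastRow": "0", "w:firstColumn": "1"});
console.log(look); // { firstRow: true, firstColumn: true }
```

Returning two plain booleans keeps the reader side independent of how the HTML conversion later decides between td and th.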
It's also worth noting that thead and th tags should be created if you mark rows as being repeated header rows.
I'm just wondering, is this bug likely to be fixed by the 1st of September?
My company has a site going live in a few months and it depends on this bug being fixed for it to pass accessibility.
Adding support should be reasonably straightforward, but I'm not sure when I'll get time to work on this (since it's just a side-project). In other words, I wouldn't rely on it.
I'm planning on doing the fix myself as a pull request.
Can you help point me in the right direction so I know where to apply the fix?
There are two main places you'd need to look at. One is the code that parses the document in lib/docx/body-reader.js. The existing code that handles table headers is probably a good feature to look at for a rough idea of how to implement this. For header rows, you probably want to reuse the same property, i.e. isHeader on table rows, plus add a property to handle header columns. You then need to update the conversion to HTML in lib/document-to-html.js. Header rows will already be handled by the existing code, but you'd need to add support for header columns.
Each module should be covered by tests. The test directory structure should mirror the directory structure of the code under test, so hopefully they're reasonably straightforward to navigate around. Again, looking for the existing support for table headers is probably a good place to start.
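The HTML side of that change can be sketched in isolation. The code below is not taken from lib/document-to-html.js and does not use Mammoth's internal document model: the table shape, the `hasHeaderColumn` flag and the helper names are all hypothetical, chosen only to show leading `isHeader` rows going into thead and the first cell of each body row becoming a th.

```javascript
// Sketch only: this table shape is hypothetical, not Mammoth's document model.
//   table.rows[i].isHeader  -> leading rows with this flag go into <thead>
//   table.hasHeaderColumn   -> hypothetical flag: first cell of each body row is a <th>
//   row.cells[j]            -> plain text of the cell
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;");
}

function rowToHtml(row, isHeaderCell) {
    var cells = row.cells.map(function(text, index) {
        var tag = isHeaderCell(index) ? "th" : "td";
        return "<" + tag + "><p>" + escapeHtml(text) + "</p></" + tag + ">";
    });
    return "<tr>" + cells.join("") + "</tr>";
}

function tableToHtml(table) {
    // Only the leading run of header rows is moved into <thead>.
    var splitIndex = 0;
    while (splitIndex < table.rows.length && table.rows[splitIndex].isHeader) {
        splitIndex++;
    }

    var head = table.rows.slice(0, splitIndex).map(function(row) {
        return rowToHtml(row, function() { return true; });
    }).join("");
    var body = table.rows.slice(splitIndex).map(function(row) {
        return rowToHtml(row, function(index) {
            return table.hasHeaderColumn && index === 0;
        });
    }).join("");

    return "<table>" +
        (head ? "<thead>" + head + "</thead>" : "") +
        (body ? "<tbody>" + body + "</tbody>" : "") +
        "</table>";
}

// Usage with the table from the original report.
console.log(tableToHtml({
    hasHeaderColumn: true,
    rows: [
        {isHeader: true, cells: ["Name", "Number", "Year"]},
        {isHeader: false, cells: ["Thing", "123", "2017"]},
        {isHeader: false, cells: ["Other thing", "458", "2016"]}
    ]
}));
```

Apart from whitespace, the usage example should produce the markup the report asked for: a thead of th cells, then body rows that each start with a th.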
@mwilliamson, I'm facing exactly the same issue in the Python implementation. How can I go about getting a fix for it there? Is the JS code similar enough that I could port it, or would the approach be different?
Thanks! Grant
P.S.: Great library by the way! Thanks for implementing it.
The Python implementation is fairly similar to the JavaScript implementation, but it's worth noting that they (should!) have the same level of support for tables.