tagpdf support

Open u-fischer opened this issue 3 years ago • 8 comments

We are working on a project to enhance LaTeX so that it can produce tagged pdf. https://www.latex-project.org/news/2020/11/30/tagged-pdf-FS-study/

For a tabular, this means that one needs to add commands quite similar to HTML table commands to cells and rows.

So to successfully tag a tabular, one needs at least

  • places to inject tagging code at the beginning and end of each cell and each row (at the end even if the row is not fully filled)
  • a way to identify header rows and header columns as the code is different there.
  • a way to mark decorative elements like lines as "artifacts".

The code for the cells and rows should ideally have access to data like the current row/column number.

It would be nice if tabularray would add suitable hooks for this.
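
As a rough illustration of the kind of hooks meant here, the following minimal sketch uses LaTeX's generic hook management (`\NewHook`, `\UseHook`, `\AddToHook`, available since LaTeX 2020-10-01). The hook names `demo/cell/before` and `demo/cell/after`, the counter `democol` and the macro `\democell` are invented for this sketch and are not part of tabularray. The injected code only writes to the log instead of doing real tagging.

```latex
\documentclass{article}

% Hypothetical hook names, invented for this sketch (not tabularray's API).
\NewHook{demo/cell/before}
\NewHook{demo/cell/after}

% Stand-in for "the current column number" that hook code may want to read.
\newcounter{democol}

% A toy "cell" macro that runs the hooks around its content.
\newcommand\democell[1]{%
  \stepcounter{democol}%
  \UseHook{demo/cell/before}#1\UseHook{demo/cell/after}}

% Code a tagging package could inject; here it only writes to the log.
\AddToHook{demo/cell/before}{\typeout{begin cell, column \thedemocol}}
\AddToHook{demo/cell/after}{\typeout{end cell, column \thedemocol}}

\begin{document}
\democell{Alpha} \democell{Beta}
\end{document}
```

A table package could run hooks like these inside its table builder, including at row boundaries, and expose the row number and header status in a similar way.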

u-fischer avatar Jul 01 '21 07:07 u-fischer

Sorry, I know little about this at the moment. I have given you write access to this repository. Please feel free to add anything you want.

lvjr avatar Jul 01 '21 10:07 lvjr

Thanks for the invitation. I'm sorry I don't have the time to think about it right now; within the project, handling tabulars is scheduled for a later phase for good reason, as it is not trivial.

But I think it is important that your code considers not only whether the visual appearance is right but also how the structure of the table is encoded. This matters if one wants to copy and paste a table or export it to HTML, or if people want to define layouts in a CSS-like manner, e.g. "make all header cells bolder".
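
tabularray's existing selectors already come close to the CSS-like case on the visual side. A small sketch, assuming the first row is the header: `row{1}` selects by position, so this styles the row but, as far as I can tell, does not by itself record that it is a header.

```latex
\documentclass{article}
\usepackage{tabularray}

\begin{document}
% row{1} picks the first row by position and sets its font.
\begin{tblr}{colspec = {ll}, row{1} = {font=\bfseries}}
  Name  & Value \\
  Alpha & 1     \\
  Beta  & 2     \\
\end{tblr}
\end{document}
```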

u-fischer avatar Jul 01 '21 10:07 u-fischer

Yes, it is useful. I will leave this issue open and hope to come back to it one day.

lvjr avatar Jul 01 '21 10:07 lvjr

Here is a very simple example (it needs a current tagpdf, 0.9). It marks up a one-column table with a header row and two body rows. I think it gives an impression of the code we need to inject (in practice it is even more, as I left out a few details such as attributes).

If you compile this and then upload the PDF at https://ngpdf.com/loadFile, you can check the HTML, which will look something like this:

<!DOCTYPE html>
<html><head>
<title>test-utf8</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
</head>
<body lang="en-US">
 <div data-pdf-se-type="Document">
  <table data-pdf-se-type="Table">
   <thead data-pdf-se-type="THead">
    <tr data-pdf-se-type="TR">
     <th data-pdf-se-type="TH">Header</th>
    </tr>
   </thead>
   <tr data-pdf-se-type="TR">
    <td data-pdf-se-type="TD">row1</td>
   </tr>
   <tr data-pdf-se-type="TR">
    <td data-pdf-se-type="TD">row2</td>
   </tr>
  </table>
 </div>
</body></html>

\RequirePackage{pdfmanagement-testphase}
\DeclareDocumentMetadata{uncompress}
\documentclass{article}
\usepackage{tagpdf,array}
\tagpdfsetup{activate}

\begin{document}

\tagstructbegin{tag=Table}
\begin{tabular}{l}
\tagstructbegin{tag=THead}%
\tagstructbegin{tag=TR}%
\tagstructbegin{tag=TH}%
\tagmcbegin{tag=TH}%
Header
\tagmcend
\tagstructend
\tagstructend
\tagstructend
\\
\tagstructbegin{tag=TR}%
\tagstructbegin{tag=TD}%
\tagmcbegin{tag=TD}%
row1
\tagmcend
\tagstructend
\tagstructend
\\
\tagstructbegin{tag=TR}%
\tagstructbegin{tag=TD}%
\tagmcbegin{tag=TD}%
row2
\tagmcend
\tagstructend
\tagstructend
\end{tabular}
\tagstructend

\end{document}
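
The repeated tagging code could be wrapped in small helper macros, which is roughly what hooks would have to do automatically. This is only a sketch with the same preamble as the example (so it assumes the same tagpdf 0.9 setup); `\TaggedCell` and `\TaggedRow` are hypothetical names, not provided by tagpdf or tabularray, and `\TaggedRow` only handles one-column tables.

```latex
\RequirePackage{pdfmanagement-testphase}
\DeclareDocumentMetadata{uncompress}
\documentclass{article}
\usepackage{tagpdf,array}
\tagpdfsetup{activate}

% Hypothetical helpers (not provided by tagpdf or tabularray).
% \TaggedCell{<TH or TD>}{<text>}: one structure element holding
% one marked-content chunk.
\newcommand\TaggedCell[2]{%
  \tagstructbegin{tag=#1}%
  \tagmcbegin{tag=#1}%
  #2%
  \tagmcend
  \tagstructend}
% \TaggedRow{<TH or TD>}{<text>}: a TR around a single cell.
\newcommand\TaggedRow[2]{%
  \tagstructbegin{tag=TR}%
  \TaggedCell{#1}{#2}%
  \tagstructend}

\begin{document}

\tagstructbegin{tag=Table}
\begin{tabular}{l}
\tagstructbegin{tag=THead}%
\TaggedRow{TH}{Header}%
\tagstructend
\\
\TaggedRow{TD}{row1}\\
\TaggedRow{TD}{row2}%
\end{tabular}
\tagstructend

\end{document}
```

With more than one column, the TR structure has to be opened in the first cell and closed in the last one, which is why access to the row/column position matters.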

u-fischer avatar Jul 02 '21 16:07 u-fischer

Yes, it is very interesting.

lvjr avatar Jul 03 '21 02:07 lvjr

I will close this issue; further comments can be left in issue #197.

lvjr avatar Nov 30 '22 14:11 lvjr

I have decided to reopen this issue to record experiments with tagpdf here.

lvjr avatar Feb 11 '23 01:02 lvjr

With the newly added public hooks and variables (#197) in trial/tabularray.sty, we can now correctly tag <table>, <tr> and <td> in the above commit.

<!DOCTYPE html>
<html><head>
<title>test-tagpdf-01</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body lang="en-US">
 <div data-pdf-se-type="Document" id="ID.001">
  <p data-pdf-se-type="P" id="ID.002"><span id="page-0" role="doc-pagebreak"></span>Some text.</p>
  <table data-pdf-se-type="Table" id="ID.003">
   <tbody><tr data-pdf-se-type="TR" id="ID.004">
    <td data-pdf-se-type="TD" id="ID.005"><p data-pdf-se-type="P" id="ID.006">Alpha</p></td>
    <td data-pdf-se-type="TD" id="ID.007"><p data-pdf-se-type="P" id="ID.008">Beta</p></td>
    <td data-pdf-se-type="TD" id="ID.009"><p data-pdf-se-type="P" id="ID.010">Gamma</p></td>
    <td data-pdf-se-type="TD" id="ID.011"><p data-pdf-se-type="P" id="ID.012">Delta</p></td>
   </tr>
   <tr data-pdf-se-type="TR" id="ID.013">
    <td data-pdf-se-type="TD" id="ID.014"><p data-pdf-se-type="P" id="ID.015">Epsilon</p></td>
    <td data-pdf-se-type="TD" id="ID.016"><p data-pdf-se-type="P" id="ID.017">Zeta</p></td>
    <td data-pdf-se-type="TD" id="ID.018"><p data-pdf-se-type="P" id="ID.019">Eta</p></td>
    <td data-pdf-se-type="TD" id="ID.020"><p data-pdf-se-type="P" id="ID.021">Theta</p></td>
   </tr>
   <tr data-pdf-se-type="TR" id="ID.022">
    <td data-pdf-se-type="TD" id="ID.023"><p data-pdf-se-type="P" id="ID.024">Iota</p></td>
    <td data-pdf-se-type="TD" id="ID.025"><p data-pdf-se-type="P" id="ID.026">Kappa</p></td>
    <td data-pdf-se-type="TD" id="ID.027"><p data-pdf-se-type="P" id="ID.028">Lambda</p></td>
    <td data-pdf-se-type="TD" id="ID.029"><p data-pdf-se-type="P" id="ID.030">Mu</p></td>
   </tr>
  </tbody></table>
  <p data-pdf-se-type="P" id="ID.031">More text.</p>
 </div>
</body></html>

lvjr avatar Feb 11 '23 03:02 lvjr