Support tabularray as table format in LaTeX output

Open lvjr opened this issue 3 years ago • 20 comments

I have written a modern LaTeX3 package, tabularray, for typesetting tabulars and arrays. It would be nice if pandoc could support this package.

Similar to HTML+CSS, this package lets you completely separate the styles from the contents of tables, and the styles can be set entirely with key-value options. It should therefore be easier to write a converter for it.

image image

Also, it is feature complete. It has built-in support for table colors, dashed lines, multiline cells, rowspan/colspan, X columns as in the tabularx and tabu packages, three-part tables, and long tables. Long tables also work in two-column documents, and you can easily turn a short table into a long table just by adding the long option. (The rowhead and rowfoot keys in the following example are similar to thead and tfoot in HTML.)

\begin{tblr}[long]{
 rowhead = 2, rowfoot = 1,
}
 Head    & Head  & Head    \\
 Head    & Head  & Head    \\
 Alpha   & Beta  & Gamma   \\
 Epsilon & Zeta  & Eta    \\
 Iota    & Kappa & Lambda \\
 Nu      & Xi    & Omicron \\
 Rho     & Sigma & Tau     \\
 Phi     & Chi   & Psi     \\
 Foot    & Foot  & Foot    \\
\end{tblr}

Finally, there are h and f columns for vertically aligning cell text at the row top and bottom, which most LaTeX table packages don't support, but which HTML/CSS and Microsoft Word support natively.

image
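To make the h and f columns concrete, here is a minimal sketch of a standalone document. The 3cm column widths and the cell texts are arbitrary choices for illustration, not taken from the package documentation.

```latex
\documentclass{article}
\usepackage{tabularray}
\begin{document}
% Q[h]: cell text sits at the top ("head") of the row
% Q[m]: cell text is centered vertically
% Q[f]: cell text sits at the bottom ("foot") of the row
% The last column holds a tall cell so that the difference is visible.
\begin{tblr}{
  hlines, vlines,
  colspec = {Q[h,3cm] Q[m,3cm] Q[f,3cm] Q[l]},
}
  at the top & in the middle & at the bottom & {one\\two\\three\\four} \\
  at the top & in the middle & at the bottom & {one\\two\\three\\four} \\
\end{tblr}
\end{document}
```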

lvjr avatar Aug 05 '21 02:08 lvjr

This package makes it much easier for users to customise the format of tables, globally or individually. And I think it might be easier for Pandoc as well.

For example, suppose we have the following table in markdown:

  Right     Left     Center     Default
-------     ------ ----------   -------
     12     12        12            12
    123     123       123          123
      1     1          1             1

For the above table, Pandoc currently generates the LaTeX code shown below.

\begin{longtable}[]{@{}rlcl@{}}
\toprule()
Right & Left & Center & Default \\
\midrule()
\endhead
12 & 12 & 12 & 12 \\
123 & 123 & 123 & 123 \\
1 & 1 & 1 & 1 \\
\bottomrule()
\end{longtable}

If instead we use tabularray, the LaTeX code that Pandoc generates would be much simpler.

\begin{tblr}[]{@{}rlcl@{}}
Right & Left & Center & Default \\
12 & 12 & 12 & 12 \\
123 & 123 & 123 & 123 \\
1 & 1 & 1 & 1 \\
\end{tblr}

Also, somewhere in the header Pandoc would put something like this, to set the default table format.

\SetTblrInner{
  hline{1} = {1pt, solid},
  hline{2} = {solid},
  hline{Z} = {solid},
}
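Put together, the idea could look like the following standalone document. This is only a sketch of what the generated output might be, not what pandoc emits today; the column specification is simplified to rlcl for the illustration.

```latex
\documentclass{article}
\usepackage{tabularray}
% Default table format, set once in the preamble:
% a thick top rule, a rule under the header row, and a rule after the last row (Z).
\SetTblrInner{
  hline{1} = {1pt, solid},
  hline{2} = {solid},
  hline{Z} = {solid},
}
\begin{document}
% The table body itself carries no rule commands at all.
\begin{tblr}{rlcl}
Right & Left & Center & Default \\
12 & 12 & 12 & 12 \\
123 & 123 & 123 & 123 \\
1 & 1 & 1 & 1 \\
\end{tblr}
\end{document}
```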

The really cool thing is that users could change the formatting of each table by inserting the appropriate formatting command as raw LaTeX. You would still use Markdown syntax for the table itself. For example, this table would have borders around all cells.

\SetTblrInner{
  hlines, vlines,
}

  Right     Left     Center     Default
-------     ------ ----------   -------
     12     12        12            12
    123     123       123          123
      1     1          1             1

mhwombat avatar Jan 28 '23 23:01 mhwombat

@lvjr / @mhwombat — what, if any, are the downsides to this package? Would it just be the requirement of a modern TeX install?

iandol avatar Jan 29 '23 10:01 iandol

@iandol tabularray is slower than longtable, but it needs only one compilation to get the result. It requires at least TeX Live 2020.

lvjr avatar Jan 29 '23 11:01 lvjr

I think it would be best to have a Pandoc command-line option / defaults file entry to choose between the table packages. For academic and traditional publishing, booktabs is still the best choice, and I would make it the default setting. It enforces good design principles, and it's simple to use (because a lot of decisions are made on your behalf).

In cases where you need more precise control (perhaps to apply corporate brand design standards, or just for creativity), tabularray would be the better option.
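Until such an option exists, a rough approximation at the template level might look like the fragment below. It is a sketch for a custom pandoc LaTeX template, not a standalone document, and the variable name tabularray is hypothetical: pandoc has no such built-in option. It only switches which packages are loaded; the writer would still emit longtable code, so the tabularray branch would also need some aliasing of longtable.

```latex
% Fragment of a custom pandoc LaTeX template (not a standalone document).
% `tabularray` is a hypothetical, user-defined template variable.
$if(tables)$
$if(tabularray)$
% Precise control over table styling
\usepackage{tabularray}
% ... aliasing of longtable to a tabularray environment would go here ...
$else$
% Traditional publishing default
\usepackage{longtable,booktabs,array}
$endif$
$endif$
```

With a template like this saved as custom.latex (an assumed file name), the variable could be set on the command line with pandoc --template=custom.latex -V tabularray=true.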

mhwombat avatar Jan 29 '23 12:01 mhwombat

I would love to see this functionality. tabularray is great for customising table output. The attached example demonstrates how tabularray creates a well-formatted long table in two-column mode; to my knowledge, this would be impossible with pandoc's default, longtable.

test2_1.pdf image
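For readers without access to the attachment, a minimal sketch of a long table in a two-column document could look like this. The caption text and the rows are placeholders, and a real example would need many more rows before the table actually breaks across columns.

```latex
\documentclass[twocolumn]{article}
\usepackage{tabularray}
\begin{document}
% longtblr is tabularray's long-table environment;
% rowhead = 1 repeats the first row at the top of each continuation.
\begin{longtblr}[
  caption = {A long table in a two-column document},
]{
  colspec = {lll},
  rowhead = 1,
  hline{1,2,Z} = {solid},
}
  Head    & Head  & Head    \\
  Alpha   & Beta  & Gamma   \\
  Epsilon & Zeta  & Eta     \\
  Iota    & Kappa & Lambda  \\
  Nu      & Xi    & Omicron \\
  Rho     & Sigma & Tau     \\
  Phi     & Chi   & Psi     \\
\end{longtblr}
\end{document}
```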

jgunstone avatar May 12 '23 16:05 jgunstone

Really hope it will be supported soon.

zwz avatar May 17 '23 09:05 zwz

@lvjr have you written a Lua filter that does the transformation from a Pandoc table to LaTeX? I would be interested if so. Thanks

fsoedjede avatar Jul 20 '23 16:07 fsoedjede

EDIT: ⚠️ As far as I can tell, this should work for pandoc versions 2.x.x (though it has only been tested on 2.9.2.1). For versions 3.x.x, see my updated version below.

You don't actually need a filter. Just add the following to the preamble of your LaTeX template:

$if(tables)$
\usepackage{tabularray}
\let\longtable\longtblr
\let\endlongtable\endlongtblr
\let\endhead\empty
\UseTblrLibrary{booktabs}
\NewTblrTheme{headless}{
    \DefTblrTemplate{contfoot-text}{default}{}
    \DefTblrTemplate{conthead-text}{default}{}
    \DefTblrTemplate{caption}{default}{}
    \DefTblrTemplate{conthead}{default}{}
    \DefTblrTemplate{capcont}{default}{}
}
\def\tabularnewline{\\}
\SetTblrOuter[longtblr]{
    expand=\tabularnewline,
    entry=none,
    label=none,
    theme=headless,
}
$endif$

This aliases pandoc's preferred longtable environment to longtblr from tabularray.

It has to use the booktabs library to provide \toprule, \midrule, and \bottomrule. I'd prefer to define such rules using \SetTblrInner, but unfortunately LaTeX has no easy way to remove \bottomrule without leaving a token behind. Something like \let\bottomrule\empty won't work, because the leftover \empty appends an additional empty row to each table.

Applying the headless theme removes the caption that normally comes with longtblr.

The main drawbacks of this are:

  1. The template still doesn't get to decide whether there should be a bottom rule (although one could probably try to set its width to 0)
  2. This won't handle multi-page tables with captions and labels
  3. Always using longtblr allows page breaks even in tables with very few rows
  4. EDIT: This won't handle table captions (in fact, it will crash if there are any table captions in the document). Also see this comment below

Here is an example of how to apply styling to longtblr (as opposed to tblr). This should highlight headers, use zebra colors, and make all columns variable width.

\UseTblrLibrary{varwidth}
\SetTblrInner[longtblr]{
  row{1}   = {gray!20,valign=h},
  row{Z}   = {valign=f},
  row{odd} = {gray!5},
  hspan    = minimal,
  columns  = {co=1,valign=t},
}

JakeI avatar Jul 22 '23 21:07 JakeI

I just discovered tabularray and hope that Pandoc supports it by default. Best.

maikol-solis avatar Oct 10 '23 14:10 maikol-solis

Same here, tabularray seems to solve lots of table-related issues, e.g. long-living ones like #1023

jankap avatar Nov 27 '23 18:11 jankap

@JakeI nice snippet, thanks!

I tried to add this to the preamble, but the $if(tables)$/$endif$ lines don't seem to be allowed there. I removed them; I'm not sure whether that's correct.

The aliasing seems to work, but Pandoc adds some \noalign commands:

| test | haha |
| ---- | ---- |
| 1    | 1    |
| 2    | 3    |

results in

\begin{longtable}[]{@{}ll@{}}
\toprule\noalign{}
test & haha \\
\midrule\noalign{}
\endhead
\bottomrule\noalign{}
\endlastfoot
1 & 1 \\
2 & 3 \\
\end{longtable}

and the error is

! Misplaced \noalign.
\l_tmpa_tl ->\noalign {} test
l.902 \end {longtable}
?

When I remove all \noalign{} commands, including the whole \bottomrule\noalign{} line (it looks like \bottomrule has an implicit \noalign), and add \let\endlastfoot\empty to the preamble, it seems to work.

This compiles:

\begin{longtable}[]{@{}ll@{}}
\toprule
test & haha \\
\midrule
\endhead
\endlastfoot
1 & 1 \\
2 & 3 \\
\end{longtable}

Any idea what's going on with \noalign and \bottomrule? Right now, no Markdown -> LaTeX tables work :)

I'm using TeX Live 2023.

jankap avatar Nov 27 '23 19:11 jankap

Any idea whats going on with \noalign and \bottomrule?

@jankap Yes, I wrote my snippet for pandoc version 2.9.2.1 (that seems to be the latest version available through my package manager, although it's 3 years old at this point). I should have mentioned the version number in my original description, sorry about that.

Pandoc adds some \noalign commands:

@jankap Yes, I can reproduce this in pandoc version 3.1.9 (built from the current main branch). Apparently the LaTeX output changed, and we have those \noalign commands now. The revision history for version 3.0 mentions the change in its section about the LaTeX writer. I tried updating my original code for this version, and here is what I came up with:

$if(tables)$
\usepackage{tabularray}
\let\longtable\longtblr
\let\endlongtable\endlongtblr
\NewTblrTheme{headless}{
    \DefTblrTemplate{contfoot-text}{default}{}
    \DefTblrTemplate{conthead-text}{default}{}
    \DefTblrTemplate{caption}{default}{}
    \DefTblrTemplate{conthead}{default}{}
    \DefTblrTemplate{capcont}{default}{}
}
\SetTblrOuter[longtblr]{
    entry=none,
    label=none,
    theme=headless,
}
\let\noalign\empty
\def\endlastfoot{\hspace{-2.5mm}} % compensate whitespace produced by left over \empty tokens
\let\endhead\empty
\def\toprule{\hspace{-1mm}} % compensate whitespace produced by left over \empty tokens
\let\midrule\empty
\let\bottomrule\empty
% add some optional styling
\UseTblrLibrary{varwidth}
\SetTblrInner[longtblr]{
  row{1}     = {gray!20,valign=h},
  hline{1,Z} = {0.3mm},
  hline{2}   = {0.1mm},
  row{Z}     = {valign=f},
  row{odd}   = {gray!5},
  hspan      = minimal,
  columns    = {co=1,valign=t},
}
$endif$

However, this still relies on a really dirty hack. Redefining \noalign etc. as \empty still leaves tokens behind, and I cannot find an easy way to remove them, so LaTeX still inserts some whitespace before the actual cell content. The hack uses negative space to compensate for the unwanted space. Visually this looks fine, but it is not exactly a clean solution.

I tried to add this to the preamble, but the ifs seem not to be allowed there. I removed them - not sure if that's correct or not.

@jankap did you by any chance add this to a LaTeX document class or package that gets imported by your actual template? If so, simply dropping the $if(tables)$ and $endif$ was probably the right thing to do, and things should work normally. To me, a pandoc template is a variation on the default template (run pandoc -D latex to print it). Pandoc templates are specified using the --template path/to/template.latex command-line argument. $if(tables)$...$endif$ should work in pandoc templates. In fact, the default template already includes an $if(tables)$ section, which you would replace with the snippet above. In any case, the purpose of the if-condition is to prevent LaTeX from loading table-related packages in documents that don't actually contain any tables. Other than slightly increased LaTeX run times, I see no harm in removing it.

JakeI avatar Nov 28 '23 12:11 JakeI

@JakeI thank you very much!

Right now, that snippet does not work, but I can't find any missing brackets. The table is generated by Pandoc.

image

\documentclass{article}

\usepackage{cite}
\usepackage{amssymb,amsfonts}

\usepackage{tabularray}
\UseTblrLibrary{booktabs}
\UseTblrLibrary{amsmath}
% \usepackage{subcaption}
% \usepackage{caption}
\usepackage{longtable}

\usepackage{algorithmic}
\usepackage{graphicx}
\usepackage{textcomp}
\usepackage{xcolor}
\def\BibTeX{{\rm B\kern-.05em{\sc i\kern-.025em b}\kern-.08em
    T\kern-.1667em\lower.7ex\hbox{E}\kern-.125emX}}

\usepackage{siunitx}


% % see https://github.com/lvjr/tabularray/discussions/396
\DefTblrTemplate{caption-sep}{default}{\par}
\DefTblrTemplate{caption}{default}{%
  \makebox[\tablewidth]{\parbox{\columnwidth}{%
    \UseTblrAlign{caption}%
    \UseTblrTemplate{caption-tag}{default}%
    \UseTblrTemplate{caption-sep}{default}%
    \UseTblrFont{caption-text}%
    \UseTblrTemplate{caption-text}{default}%
  }}%
}
\SetTblrStyle{caption}{halign=c}
\SetTblrStyle{caption-text}{font=\scshape}
\SetTblrStyle{note}{indent=\tabcolsep}

% added by Jan, see https://github.com/lvjr/tabularray/issues/268
% \DefTblrTemplate{firsthead}{default}{\addtocounter{table}{-1}\captionof{table}{\InsertTblrText{caption}}}

%% see https://github.com/jgm/pandoc/issues/7475#issuecomment-1829790171
% if we want to enable pandoc tables again...
% $if(tables)$
\usepackage{tabularray}
\let\longtable\longtblr
\let\endlongtable\endlongtblr
\NewTblrTheme{headless}{
    \DefTblrTemplate{contfoot-text}{default}{}
    \DefTblrTemplate{conthead-text}{default}{}
    \DefTblrTemplate{caption}{default}{}
    \DefTblrTemplate{conthead}{default}{}
    \DefTblrTemplate{capcont}{default}{}
}
\SetTblrOuter[longtblr]{
    entry=none,
    label=none,
    theme=headless,
}
\let\noalign\empty
\def\endlastfoot{\hspace{-2.5mm}} % compensate whitespace produced by left over \empty tokens
\let\endhead\empty
\def\toprule{\hspace{-1mm}} % compensate whitespace produced by left over \empty tokens
\let\midrule\empty
\let\bottomrule\empty
% add some optional styling
\UseTblrLibrary{varwidth}
\SetTblrInner[longtblr]{
  row{1}     = {gray!20,valign=h},
  hline{1,Z} = {0.3mm},
  hline{2}   = {0.1mm},
  row{Z}     = {valign=f},
  row{odd}   = {gray!5},
  hspan      = minimal,
  columns    = {co=1,valign=t},
}
% $endif$

\definecolor{abstractbg}{rgb}{0.89804,0.94510,0.83137}
\setlength{\fboxrule}{0pt}
\setlength{\fboxsep}{0pt}
\begin{document}
\title{Title}

\maketitle

% pandoc test (output from a md file)
\begin{longtable}[]{@{}ll@{}}
  \toprule\noalign{}
  test & haha \\
  \midrule\noalign{}
  \endhead
  \bottomrule\noalign{}
  \endlastfoot
  1 & 1 \\
  2 & 3 \\
  \end{longtable}
\end{document}

jankap avatar Nov 28 '23 14:11 jankap

@jankap that's odd, I cannot reproduce this. If I copy your example code and run pdflatex, xelatex, or lualatex on it, the expected PDF is produced. You seem to be using some sort of graphical development environment. Maybe you could try running LaTeX directly? Maybe delete any .aux files etc. and run things from scratch?

I was using pdfTeX 3.141592653-2.6-1.40.22 (TeX Live 2022/dev/Debian)

JakeI avatar Nov 28 '23 16:11 JakeI

@JakeI very interesting. I can reproduce the error with pdflatex and lualatex.

I'm using a (self-built) Docker image. Do you have Docker? You could try it on your own, e.g. by using https://hub.docker.com/r/pandoc/latex.

Copy the LaTeX source below into a local file "test.tex". Then, from that folder, run: docker run -it --rm -v ".:/data" --entrypoint '/bin/sh' pandoc/latex:3.1 -c "tlmgr install amsmath booktabs tabularray siunitx cite varwidth xcolor ninecolors && pdflatex test.tex". The error is still there (and the pandoc container uses TeX Live 2022!):

LaTeX Warning: No \author given.

! Missing } inserted.
<inserted text>
                }
l.96   \end
           {longtable}
?

test.tex:

\documentclass{article}

\usepackage{cite}
\usepackage{amssymb,amsfonts}

\usepackage{tabularray}
\UseTblrLibrary{booktabs}
\UseTblrLibrary{amsmath}
% \usepackage{subcaption}
% \usepackage{caption}
\usepackage{longtable}

\usepackage{graphicx}
\usepackage{textcomp}
\usepackage{xcolor}
\def\BibTeX{{\rm B\kern-.05em{\sc i\kern-.025em b}\kern-.08em
    T\kern-.1667em\lower.7ex\hbox{E}\kern-.125emX}}

\usepackage{siunitx}


% % see https://github.com/lvjr/tabularray/discussions/396
\DefTblrTemplate{caption-sep}{default}{\par}
\DefTblrTemplate{caption}{default}{%
  \makebox[\tablewidth]{\parbox{\columnwidth}{%
    \UseTblrAlign{caption}%
    \UseTblrTemplate{caption-tag}{default}%
    \UseTblrTemplate{caption-sep}{default}%
    \UseTblrFont{caption-text}%
    \UseTblrTemplate{caption-text}{default}%
  }}%
}
\SetTblrStyle{caption}{halign=c}
\SetTblrStyle{caption-text}{font=\scshape}
\SetTblrStyle{note}{indent=\tabcolsep}

% added by Jan, see https://github.com/lvjr/tabularray/issues/268
% \DefTblrTemplate{firsthead}{default}{\addtocounter{table}{-1}\captionof{table}{\InsertTblrText{caption}}}

%% see https://github.com/jgm/pandoc/issues/7475#issuecomment-1829790171
% if we want to enable pandoc tables again...
% $if(tables)$
\usepackage{tabularray}
\let\longtable\longtblr
\let\endlongtable\endlongtblr
\NewTblrTheme{headless}{
    \DefTblrTemplate{contfoot-text}{default}{}
    \DefTblrTemplate{conthead-text}{default}{}
    \DefTblrTemplate{caption}{default}{}
    \DefTblrTemplate{conthead}{default}{}
    \DefTblrTemplate{capcont}{default}{}
}
\SetTblrOuter[longtblr]{
    entry=none,
    label=none,
    theme=headless,
}
\let\noalign\empty
\def\endlastfoot{\hspace{-2.5mm}} % compensate whitespace produced by left over \empty tokens
\let\endhead\empty
\def\toprule{\hspace{-1mm}} % compensate whitespace produced by left over \empty tokens
\let\midrule\empty
\let\bottomrule\empty
% add some optional styling
\UseTblrLibrary{varwidth}
\SetTblrInner[longtblr]{
  row{1}     = {gray!20,valign=h},
  hline{1,Z} = {0.3mm},
  hline{2}   = {0.1mm},
  row{Z}     = {valign=f},
  row{odd}   = {gray!5},
  hspan      = minimal,
  columns    = {co=1,valign=t},
}
% $endif$

\definecolor{abstractbg}{rgb}{0.89804,0.94510,0.83137}
\setlength{\fboxrule}{0pt}
\setlength{\fboxsep}{0pt}
\begin{document}
\title{Title}

\maketitle

% pandoc test (output from a md file)
\begin{longtable}[]{@{}ll@{}}
  \toprule\noalign{}
  test & haha \\
  \midrule\noalign{}
  \endhead
  \bottomrule\noalign{}
  \endlastfoot
  1 & 1 \\
  2 & 3 \\
  \end{longtable}
\end{document}

Some version info for the container:

PS D:\OneDrive\Promotion> docker run -it --rm -v ".:/data"  --entrypoint '/bin/sh' pandoc/latex:3.1 -c "pdflatex -v"  
pdfTeX 3.141592653-2.6-1.40.24 (TeX Live 2022)                                           
kpathsea version 6.3.4
Copyright 2022 Han The Thanh (pdfTeX) et al.
There is NO warranty.  Redistribution of this software is
covered by the terms of both the pdfTeX copyright and
the Lesser GNU General Public License.
For more information about these matters, see the file
named COPYING and the pdfTeX source.
Primary author of pdfTeX: Han The Thanh (pdfTeX) et al.
Compiled with libpng 1.6.37; using libpng 1.6.37
Compiled with zlib 1.2.11; using zlib 1.2.11
Compiled with xpdf version 4.03
PS D:\OneDrive\Promotion> 

jankap avatar Dec 02 '23 18:12 jankap

@jankap OK, so using Docker I was able to reproduce this. This "! Missing } inserted" error appears to show up only when a particular combination of LaTeX version and package versions is installed.

The fix is removing the \UseTblrLibrary{booktabs} line. That's a good idea anyway, because the updated snippet redefines \toprule, \midrule, and \bottomrule itself, and the only job of the booktabs library is providing those.

JakeI avatar Dec 06 '23 15:12 JakeI

@JakeI Impressive find, thank you very much, I've tried lots of things and couldn't solve it :) Unfortunately, I'm using \cmidrule a lot for sub-tables. That's why I had to load booktabs. I'm looking for alternatives in tabularray though.
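One possible alternative, sketched here under the assumption that a plain partial rule is enough: tabularray provides its own \cline, which does not need the booktabs library. It does not reproduce the trimmed ends that \cmidrule(lr) gives, and the rule widths below are arbitrary.

```latex
\documentclass{article}
\usepackage{tabularray}
\begin{document}
\begin{tblr}{
  colspec = {lcc},
  hline{1,Z} = {0.08em},  % heavier top and bottom rules
}
  % \SetCell[c=2]{c} merges columns 2-3; the spanned cell stays in the source, empty.
  Group & \SetCell[c=2]{c} Values & \\
  \cline{2-3}             % partial rule under the spanning header only
        & A & B \\
  \hline
  x     & 1 & 2 \\
  y     & 3 & 4 \\
\end{tblr}
\end{document}
```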

jankap avatar Dec 10 '23 17:12 jankap

Thank you @JakeI for the useful code snippet. However, a table with a caption results in an error. Do you have any idea how to make it support captioned tables?

zwz avatar Jan 20 '24 07:01 zwz

I agree with @mhwombat, options would be much better, and booktabs should be the default — it's clean, simple, doesn't break anything, works in twocolumn, and overall has a great design philosophy.

(Currently desperately trying to find a workaround to get longtable to work in twocolumn.)

tytyvillus avatar Mar 02 '24 12:03 tytyvillus

Hello,

I have created a very basic Lua Filter for Pandoc that generates a table with Tabularray. There's a demo repository here https://github.com/yuki/pandoc-filter-tabularray

The main feature is that I parse the caption, and if it contains the key-value pair "tablename=XXX", the generated table uses XXX as its environment name. So, if the Markdown file has:

| Head 1 | Head 2  | Head 3 | 
|:-------|:-------:|-------:|
| Alpha  | Beta    | Gamma  | 
| Delta  | Epsilon | Zeta   |
| Eta    | Theta   | Iota |

Table: Table content {tablename=yukitblr}

The output is:

\begin{yukitblr}[caption={Table content }]{X[l]X[c]X[r]}
Head 1 & Head 2 & Head 3 \\ 
Alpha & Beta & Gamma \\ 
Delta & Epsilon & Zeta \\ 
Eta & Theta & Iota \\ 
\end{yukitblr}

Of course, for this to work, the LaTeX template must define the environment with \NewTblrEnviron{yukitblr}. In the repository I have added a very basic LaTeX template with this custom environment.

I know that it's not the best option, but AFAIK there's no "good way" to do this, and it is being discussed in this thread: https://github.com/jgm/pandoc/issues/6317. So yes, this is a hack that works for me :smile:

The filter also accepts HTML tables and generates tabularray tables from them, but right now the alignment is not working. There's an example of this in the repository as well.
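As an illustration only (this is not the template from the repository), a custom environment of this kind could be defined roughly as follows. The styling keys and the choice to make it a long table are assumptions made for the sketch.

```latex
\documentclass{article}
\usepackage{tabularray}
% Define the custom environment named in the filter output.
\NewTblrEnviron{yukitblr}
% Assumption: treat it as a long table so that the caption option is typeset.
\SetTblrOuter[yukitblr]{long}
% Assumption: some default styling for every yukitblr table.
\SetTblrInner[yukitblr]{
  hline{1,Z} = {0.08em},
  hline{2}   = {0.05em},
  row{1}     = {font=\bfseries},
}
\begin{document}
\begin{yukitblr}[caption={Table content}]{X[l]X[c]X[r]}
Head 1 & Head 2 & Head 3 \\
Alpha & Beta & Gamma \\
Delta & Epsilon & Zeta \\
Eta & Theta & Iota \\
\end{yukitblr}
\end{document}
```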

I have used Tabularray in a very basic way in the last three years to create custom table-environments. So in order to convert my tables into Markdown (or HTML), this filter does the job for me.

PS: I'm not a Lua programmer, so the code can be very ugly and maybe not working in all scenarios.

yuki avatar Apr 20 '24 18:04 yuki