tabularray icon indicating copy to clipboard operation
tabularray copied to clipboard

Unexpected braces stripping when splitting table into lines and lines into cells

Open muzimuzhi opened this issue 1 year ago • 1 comments

Tabularray uses \seq_set_split:Nnn to split the table content into lines and lines into cells.

In splitting a token list into items, \seq_set_split:Nnn <seq var> {<delimiter>} {<token list>} first trims spaces from both sides of an item, then removes at most one pair of braces.

This documented behavior of \seq_set_split:Nnn may strip more pairs of braces from tabularray table cells than expected, especially when there is only one column. See https://github.com/lvjr/tabularray/issues/90#issuecomment-2161621946 for an example from @mikkelee.

Example below shows that

  • in a single-column table, at most four pairs of braces are stripped from a table cell (line {{{{{e}}}}} -> cell text {e}) and
  • in a multi-column table, at most three pairs of braces are stripped (line 1 & {{{{d}}}} -> cell text 1 and {d}).
\documentclass{article}
\usepackage{tabularray}

\ExplSyntaxOn
\makeatletter

\cs_new_protected:Nn \__my_debug:n
  {
    \iow_log:e { !~ [debug]~ #1 }
  }

\cs_gset_protected:Npn \__tblr_split_one_line:nn #1 #2
  {
    \seq_set_split:Nnn \l_tmpa_seq { & } { #2 }
    \__my_debug:n {
      line(#1):~
      raw~<\tl_to_str:n {#2}>,~
      processed~<\seq_use:Nn \l_tmpa_seq {,}>
    }
    \int_set:Nn \c@rownum {#1}
    \int_zero:N \c@colnum
    \seq_map_inline:Nn \l_tmpa_seq
      {
        \tl_set:Nn \l_tmpa_tl { ##1 }
        \__tblr_remove_braces:N \l_tmpa_tl
        \__tblr_trim_par_space_tokens:N \l_tmpa_tl
        \int_incr:N \c@colnum
        \__tblr_extract_table_commands:N \l_tmpa_tl
        \__tblr_trim_par_space_tokens:N \l_tmpa_tl
        \__tblr_spec_gput:neV { text } { [#1][\int_use:N \c@colnum] } \l_tmpa_tl
        \__my_debug:n
          {
            \@spaces cell(#1,\int_use:N \c@colnum):~
            raw~<\tl_to_str:n {##1}>,~
            processed~<\tl_to_str:N \l_tmpa_tl>
          }
      }
    %% Decrease row count by 1 if the last row has only one empty cell text
    %% We need to do it here since the > or < column type may add text to cells
    \bool_lazy_all:nTF
      {
        { \int_compare_p:nNn {#1} = {\c@rowcount} }
        { \int_compare_p:nNn {\c@colnum} = {1} }
        { \tl_if_empty_p:N \l_tmpa_tl }
      }
      { \int_decr:N \c@rowcount }
      {
        \__tblr_prop_gput:nnx
          {row} { [#1] / cell-number } { \int_use:N \c@colnum }
        \int_compare:nT { \c@colnum > \c@colcount }
          {
            \int_set_eq:NN \c@colcount \c@colnum
          }
      }
  }
\makeatother

\NewColumnType {Z} { Q[r, cmd=\test_tblr_cell:n] }

\cs_new_protected:Nn \test_tblr_cell:n
  { #1 ~ is ~ \tl_if_head_is_group:nTF { #1 } { braced } { unbraced } }

\ExplSyntaxOff

\begin{document}

\begin{tblr}{Z}
  a \\
  {b} \\
  {{c}} \\
  {{{d}}} \\
  {{{{e}}}}
\end{tblr}

\begin{tblr}{ZZ}
  1 & a \\
  2 & {b} \\
  3 & {{c}} \\
  4 & {{{d}}}
\end{tblr}

\end{document}

debug log

! [debug] line(1): raw <a>, processed <a>
! [debug]     cell(1,1): raw <a>, processed <a>
! [debug] line(2): raw <b>, processed <b>
! [debug]     cell(2,1): raw <b>, processed <b>
! [debug] line(3): raw <{c}>, processed <c>
! [debug]     cell(3,1): raw <c>, processed <c>
! [debug] line(4): raw <{{d}}>, processed <{d}>
! [debug]     cell(4,1): raw <{d}>, processed <d>
! [debug] line(5): raw <{{{e}}}>, processed <{{e}}>
! [debug]     cell(5,1): raw <{{e}}>, processed <{e}>

! [debug] line(1): raw <1 & a>, processed <1,a>
! [debug]     cell(1,1): raw <1>, processed <1>
! [debug]     cell(1,2): raw <a>, processed <a>
! [debug] line(2): raw <2 & {b}>, processed <2,b>
! [debug]     cell(2,1): raw <2>, processed <2>
! [debug]     cell(2,2): raw <b>, processed <b>
! [debug] line(3): raw <3 & {{c}}>, processed <3,{c}>
! [debug]     cell(3,1): raw <3>, processed <3>
! [debug]     cell(3,2): raw <{c}>, processed <c>
! [debug] line(4): raw <4 & {{{d}}}>, processed <4,{{d}}>
! [debug]     cell(4,1): raw <4>, processed <4>
! [debug]     cell(4,2): raw <{{d}}>, processed <{d}>

PDF output image

muzimuzhi avatar Jun 11 '24 22:06 muzimuzhi

Thanks! I can verify that my local document (more complex than the minimal test case) works with 3/4 braces

mikkelee avatar Jun 12 '24 06:06 mikkelee

@muzimuzhi Do you have any plan to fix this issue before next release (which will contain breaking changes)?

lvjr avatar Nov 19 '24 02:11 lvjr

muzimuzhi/tabularray-dev@523d07f (feat: keep braces when splitting, 2024-11-16) did half the work. It introduced\zutil_seq_set_split_keep_braces:Nnn which works like \seq_set_split:Nnn but keeps braces, and used it to split table into rows then rows into cells.

This way, escaping siunitx S column needs only one pair of braces.

The remaining work is to avoid brace stripping happened in \__tblr_extract_table_commands:N. It currently converts {b} to b and {b}b to bb.

\documentclass{article}
\usepackage{tabularray}

\begin{document}
\SetTblrTracing{+text}

\begin{tblr}{}
  a & \\
  {b} & {b}b \\
  {{c}} \\
\end{tblr}
\end{document}
> The spec list text_1 contains the pairs:.
>  {[1][1]}  =>  {a}.
>  {[1][2]}  =>  {}.
>  {[2][1]}  =>  {b}.
>  {[2][2]}  =>  {bb}.
>  {[3][1]}  =>  {c}.
>  {[4][1]}  =>  {}.

BTW, is the next release in Sept 2025, as listed in the Changelog wiki?


I'm not a big fan of some of the designs in tabularray thus decided to do experiments in my own copy, the muzimuzhi/tabularray-dev repo. Sorry.

muzimuzhi avatar Nov 19 '24 06:11 muzimuzhi

@muzimuzhi Never mind. People should spend their limited spare time on things they could enjoy. I asked just to avoid potential waste of time after I saw zutil.sty.

Then I will fix this issue early next year, or in the third breaking release (in around 2028).

lvjr avatar Nov 19 '24 07:11 lvjr

BTW, is the next release in Sept 2025, as listed in the Changelog wiki?

I choose to use wiki page because the schedule can easily be changed. :-)

lvjr avatar Nov 19 '24 07:11 lvjr

Either way is acceptable, I just wanted to check if the next-release info in wiki is not outdated. Your question in https://github.com/lvjr/tabularray/issues/501#issuecomment-2484557411 gave me the feeling that the next release will happen in next 1-2 weeks, much more recent than Sept 2025.

muzimuzhi avatar Nov 19 '24 07:11 muzimuzhi

A breaking release needs to be the first release in the year, and be published after TeX Live release in the same year. For issue #525, I want to leave stable tabularray releases in historic TeX Live archives.

lvjr avatar Nov 19 '24 07:11 lvjr

muzimuzhi/tabularray-dev@523d07f (feat: keep braces when splitting, 2024-11-16) did half the work.

[...]

The remaining work is to avoid brace stripping happened in \__tblr_extract_table_commands:N. It currently converts {b} to b and {b}b to bb.

Done in muzimuzhi/tabularray-dev@f111ace (fix: keep braces when extracting table commands (#6), 2024-12-23). Check tblr-split.lvt and tblr-split.tlg to see the new behavior of table splitting.

As https://github.com/muzimuzhi/tabularray-dev is not a fork of this repo (due to GitHub limitations and for tracking its own issues), incorporation of these patches may need some hand-work.

For muzimuzhi/tabularray-dev@523d07f, uses of \zutil_seq_set_split_keep_braces:Nnn are necessary.

For muzimuzhi/tabularray-dev@f111ace, the use of \zutil_tl_trim_leading_spaces:n can be safely replaced by \tl_trim_spaces:n and muzimuzhi/tabularray-dev12e1e0a (shorten internal tl names, 2024-11-24) can be easily reverted (which was part of https://github.com/muzimuzhi/tabularray-dev/pull/6).

muzimuzhi avatar Dec 22 '24 21:12 muzimuzhi

Thank you. I have fixed this bug in branch "issue501".

PS, I cannot understand the necessity of using private functions and variables in l3kernel to write a new split function.

lvjr avatar Dec 25 '24 04:12 lvjr

I have fixed this bug in branch "issue501".

Perhaps only close this issue when its fix is included in the default branch?

PS, I cannot understand the necessity of using private functions and variables in l3kernel to write a new split function.

No necessity. Probably because I thought that's "safe".

Update: I realized it's my deliberate decision to make \zutil_seq_(g)set_split_keep_braces:Nnn easier to fail once relevant implementation in l3seq changes. I want my experimental codes to (only) work with latest l3kernel. Or, at first I wanted to share common internals with \seq_set_split... as far as possible. But then minor refactors make their implementations different bit by bit...

Update 2: After recent changes, \zutil_seq_(g)set_split_keep_braces:Nnn only uses \s__seq and \__seq_item:n used in sequence internal representation.

muzimuzhi avatar Dec 25 '24 04:12 muzimuzhi