
Problem with UTF-8 characters, PDF output, v1.1 regression

Open • cswingle opened this issue 3 months ago • 5 comments

Prework

Description

Using gt within a Quarto document with PDF output and pdf-engine: xelatex, I get errors and/or incorrect table formatting with gt 1.1 when the cell content includes accented or other non-ASCII characters. The new version changes characters like ’ to \textquoteright without adding a trailing space or {}, which causes errors with names such as O’Shell. And é is converted to \\'e, which TeX interprets as a table line break followed by 'e instead of an accented e. Part of the reason for using pdf-engine: xelatex is to avoid any UTF-8 translations.

This is a regression from v1.0, which does not have any of these errors with the same document.

Reproducible example

  • [x] Post a minimal reproducible example so the maintainer can troubleshoot the problems you identify. A reproducible example is:
    • [x] Runnable: post enough R code and data so any onlooker can create the error on their own computer.
    • [x] Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
    • [x] Readable: format your code according to the tidyverse style guide.
---
title: "GT Test"
engine: knitr
execute:
  message: false
  warning: false
  echo: true
  eval: true
format:
  pdf:
    pdf-engine: xelatex
    keep-tex: true
---

```{r}
library(tibble)
library(gt)

sessionInfo()

tribble(
  ~name, ~finish_time,
  "Chloé Laplantine", "4:37:12",
  "Trent O’Shell", "5:15:46",
) |>
  gt()
```

Expected result

I expect the table to render, with the original UTF-8 characters. For example, here's what the TeX looks like for the table when using gt 1.0:

\begin{table}
\fontsize{12.0pt}{14.4pt}\selectfont
\begin{tabular*}{\linewidth}{@{\extracolsep{\fill}}lr}
\toprule
name & finish\_time \\ 
\midrule\addlinespace[2.5pt]
Chloé Laplantine & 4:37:12 \\ 
Trent O’Shell & 5:15:46 \\ 
\bottomrule
\end{tabular*}
\end{table}

With gt 1.1, the TeX looks like the following. The \\ causes a row break after Chlo, before the 'e that was meant to be the accented character, and TeX quits at \textquoterightShell, since that is not a defined TeX command.

\begin{table}
\fontsize{12.0pt}{14.0pt}\selectfont
\begin{tabular*}{\linewidth}{@{\extracolsep{\fill}}lr}
\toprule
name & finish\_time \\ 
\midrule\addlinespace[2.5pt]
Chlo\\'e Laplantine & 4:37:12 \\ 
Trent O\textquoterightShell & 5:15:46 \\ 
\bottomrule
\end{tabular*}
\end{table}
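
The two failures can be reproduced outside gt with plain string substitution. The sketch below is illustrative only and is not gt's internal conversion code: it assumes a naive find-and-replace, and it uses base R's gsub() with fixed = TRUE so that backslashes in the replacement are taken literally. The variable names are made up for the example.

```r
# Illustrative sketch only: a naive Unicode-to-LaTeX substitution, not gt's internals.
quote_name  <- "Trent O\u2019Shell"     # U+2019 RIGHT SINGLE QUOTATION MARK
accent_name <- "Chlo\u00e9 Laplantine"  # U+00E9 LATIN SMALL LETTER E WITH ACUTE

# Bare command: TeX reads the letters that follow as part of the command name,
# so it looks for an undefined \textquoterightShell.
bare <- gsub("\u2019", "\\textquoteright", quote_name, fixed = TRUE)

# An empty group ends the command name, so "Shell" stays ordinary text.
braced <- gsub("\u2019", "\\textquoteright{}", quote_name, fixed = TRUE)

# Two backslashes in the output form \\ (a row break in tabular) followed by 'e.
double_slash <- gsub("\u00e9", "\\\\'e", accent_name, fixed = TRUE)

# One backslash gives the accent command \'e; the braces keep it self-contained.
single_slash <- gsub("\u00e9", "{\\'e}", accent_name, fixed = TRUE)

# Print the four TeX fragments for comparison.
cat(bare, braced, double_slash, single_slash, sep = "\n")
```

The first and third lines printed match the broken gt 1.1 output above; the second and fourth show one way the replacement could be written so that TeX parses it as intended.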

Session info

R version 4.5.0 (2025-04-11)
Platform: x86_64-pc-linux-gnu
Running under: Debian GNU/Linux 13 (trixie)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.29.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_DK.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/Anchorage
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] gt_1.1.0     tibble_3.3.0

loaded via a namespace (and not attached):
 [1] digest_0.6.37     R6_2.6.1          fastmap_1.2.0     tidyselect_1.2.1  magrittr_2.0.4    glue_1.8.0        pkgconfig_2.0.3   htmltools_0.5.8.1 dplyr_1.1.4       generics_0.1.4    lifecycle_1.0.4   xml2_1.4.0        cli_3.6.5        
[14] vctrs_0.6.5       compiler_4.5.0    pillar_1.11.1     rlang_1.1.6       fs_1.6.6 

cswingle, Sep 24 '25 19:09

Thanks for the report! @thebioengineer would you consider reverting https://github.com/rstudio/gt/pull/1996 ? That should fix the problem.

rich-iannone, Sep 29 '25 14:09

Presumably there's a use case for #1996: replacing Unicode characters with LaTeX equivalents for TeX engines that don't understand UTF-8. Except for the space issue also mentioned in #2041, I'd guess it's a good merge for that scenario. Maybe there needs to be another tab_option for users who don't need Unicode replacement, unless there's a way for gt in a Quarto chunk to know that the TeX engine supports UTF-8 out of the box, as xelatex does?

cswingle, Sep 29 '25 17:09
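
As a rough illustration of the engine-based idea in the comment above, the decision could come down to a lookup on the engine name. The helper below is hypothetical and not part of gt, and how gt would learn the engine name from inside a Quarto chunk is left open here. The only fact it relies on is that XeLaTeX and LuaLaTeX read UTF-8 input natively.

```r
# Hypothetical helper, not part of gt: decide whether Unicode characters
# should be rewritten as LaTeX commands, given the name of the PDF engine.
needs_unicode_replacement <- function(pdf_engine) {
  # XeLaTeX and LuaLaTeX accept UTF-8 input directly.
  utf8_native <- c("xelatex", "lualatex")
  !(tolower(pdf_engine) %in% utf8_native)
}

needs_unicode_replacement("xelatex")   # engine used in the report
needs_unicode_replacement("pdflatex")  # engine the replacement targets
```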

I agree that an option to toggle the replacement of Unicode with LaTeX equivalents would be nice. I had this problem with LaTeX not converting some ≥ values, which is what led me down this path.

I've gone through the Unicode conversion table and corrected some entries that had too many "\" characters, and added curly braces around the others. It's in PR #2042, but I will continue to work on this.

thebioengineer, Sep 29 '25 18:09

@thebioengineer I also agree that an option would be good for this functionality. Would you consider making Unicode replacement opt-in?

rich-iannone, Sep 30 '25 13:09
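
One possible shape for an opt-in switch is sketched below, assuming a logical argument that defaults to leaving UTF-8 untouched. The function name escape_cell, the argument replace_unicode, and the three-entry lookup table are all made up for illustration; none of them is gt's API or its actual conversion table.

```r
# Hypothetical sketch of an opt-in switch; names and table are illustrative,
# not gt's API.
unicode_chars  <- c("\u2019", "\u00e9", "\u2265")
latex_commands <- c("\\textquoteright{}", "{\\'e}", "\\ensuremath{\\geq}")

escape_cell <- function(x, replace_unicode = FALSE) {
  # Default: leave UTF-8 text alone, as engines such as xelatex expect.
  if (!replace_unicode) {
    return(x)
  }
  # Opt-in: substitute each character literally with its LaTeX command.
  for (i in seq_along(unicode_chars)) {
    x <- gsub(unicode_chars[i], latex_commands[i], x, fixed = TRUE)
  }
  x
}

cell <- "Chlo\u00e9 \u2265 Trent O\u2019Shell"
escape_cell(cell)                          # unchanged UTF-8
escape_cell(cell, replace_unicode = TRUE)  # LaTeX commands substituted
```

With the default, the xelatex case from the original report gets its characters back unchanged, and users of engines that need the translation can turn it on explicitly.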

I think so. Let's break it into two PRs to make it a little simpler to resolve, and maybe discuss how best to do the Unicode replacement.

thebioengineer, Sep 30 '25 14:09