Problem with UTF-8 characters, PDF output, v1.1 regression
Prework
- [x] Read and agree to the code of conduct and contributing guidelines.
- [x] If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
Description
Using gt within a Quarto document rendered to PDF with pdf-engine: xelatex, I get errors and/or incorrect table formatting with gt 1.1 when the cell content includes non-ASCII characters. The new version converts characters like ’ to \textquoteright without adding a trailing space or {}, so the macro runs into the letters that follow it and TeX fails on names. And é is converted to \\'e, which TeX interprets as a table line break (\\) followed by 'e instead of an accented e. Part of the reason for using pdf-engine: xelatex is to avoid any UTF-8 translations.
This is a regression from v1.0, which does not have any of these errors with the same document.
Reproducible example
- [x] Post a minimal reproducible example so the maintainer can troubleshoot the problems you identify. A reproducible example is:
- [x] Runnable: post enough R code and data so any onlooker can create the error on their own computer.
- [x] Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
- [x] Readable: format your code according to the tidyverse style guide.
---
title: "GT Test"
engine: knitr
execute:
  message: false
  warning: false
  echo: true
  eval: true
format:
  pdf:
    pdf-engine: xelatex
    keep-tex: true
---
```{r}
library(tibble)
library(gt)
sessionInfo()
tribble(
  ~name, ~finish_time,
  "Chloé Laplantine", "4:37:12",
  "Trent O’Shell", "5:15:46",
) |>
  gt()
```
Expected result
I expect the table to render, with the original UTF-8 characters. For example, here's what the TeX looks like for the table when using gt 1.0:
\begin{table}
\fontsize{12.0pt}{14.4pt}\selectfont
\begin{tabular*}{\linewidth}{@{\extracolsep{\fill}}lr}
\toprule
name & finish\_time \\
\midrule\addlinespace[2.5pt]
Chloé Laplantine & 4:37:12 \\
Trent O’Shell & 5:15:46 \\
\bottomrule
\end{tabular*}
\end{table}
With gt 1.1, the TeX looks like the following. The \\ in Chlo\\'e ends the table row right after Chlo instead of producing an accented e, and TeX quits at \textquoterightShell (since that's not a TeX command).
\begin{table}
\fontsize{12.0pt}{14.0pt}\selectfont
\begin{tabular*}{\linewidth}{@{\extracolsep{\fill}}lr}
\toprule
name & finish\_time \\
\midrule\addlinespace[2.5pt]
Chlo\\'e Laplantine & 4:37:12 \\
Trent O\textquoterightShell & 5:15:46 \\
\bottomrule
\end{tabular*}
\end{table}
Session info
R version 4.5.0 (2025-04-11)
Platform: x86_64-pc-linux-gnu
Running under: Debian GNU/Linux 13 (trixie)
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.29.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_DK.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/Anchorage
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gt_1.1.0 tibble_3.3.0
loaded via a namespace (and not attached):
[1] digest_0.6.37 R6_2.6.1 fastmap_1.2.0 tidyselect_1.2.1 magrittr_2.0.4 glue_1.8.0 pkgconfig_2.0.3 htmltools_0.5.8.1 dplyr_1.1.4 generics_0.1.4 lifecycle_1.0.4 xml2_1.4.0 cli_3.6.5
[14] vctrs_0.6.5 compiler_4.5.0 pillar_1.11.1 rlang_1.1.6 fs_1.6.6
Thanks for the report! @thebioengineer would you consider reverting https://github.com/rstudio/gt/pull/1996 ? That should fix the problem.
Presumably there's a use case for #1996, which is to replace Unicode characters with LaTeX equivalents for TeX engines that don't understand UTF-8. I'd guess that, apart from the space issue also mentioned in #2041, it's a good merge for that scenario. Maybe there needs to be another tab_option for users who don't need Unicode replacement, unless there's a way for gt in a Quarto chunk to know that the TeX engine supports UTF-8 out of the box, like xelatex does?
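As a rough sketch of that idea, the decision could be a small predicate on the engine name with an explicit override. Everything here is hypothetical: `should_replace_unicode()` and its `replace_unicode` argument are not part of gt, and how gt would actually learn the engine from Quarto is left open (the engine is simply passed in as a string).

```r
# Hypothetical helper, not part of gt: decide whether Unicode characters
# should be rewritten as LaTeX macros for a given TeX engine.
should_replace_unicode <- function(engine, replace_unicode = NULL) {
  # An explicit user choice (TRUE or FALSE) always wins.
  if (!is.null(replace_unicode)) {
    return(isTRUE(replace_unicode))
  }
  # Otherwise decide by engine: XeLaTeX and LuaLaTeX read UTF-8 natively,
  # so replacement is skipped for them.
  !(tolower(engine) %in% c("xelatex", "lualatex"))
}

should_replace_unicode("xelatex")                          # FALSE
should_replace_unicode("pdflatex")                         # TRUE
should_replace_unicode("xelatex", replace_unicode = TRUE)  # TRUE
```

Defaulting on the engine is only one possible design; a strictly opt-in variant would return `isTRUE(replace_unicode)` unconditionally.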
I agree an option to toggle the replacement of Unicode characters with LaTeX equivalents would be nice. I had this problem with LaTeX not wanting to convert some ≥ values, which is what led me down this path.
I've gone through the Unicode conversion table, corrected some entries that had too many backslashes, and added curly braces around the others. It's in PR #2042, but I will continue to work on this.
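To illustrate why the braces matter, here is a minimal base R sketch. The `from`/`to` vectors and `unicode_to_latex()` are made-up names for illustration and are not gt's internal conversion table or API. The point is only that a macro terminated with `{}` (or an accent written with a single backslash and a braced argument) cannot run into the text that follows it.

```r
# Hypothetical lookup table (NOT gt's internal table): Unicode characters
# and brace-terminated LaTeX equivalents.
from <- c("\u2019", "\u00e9", "\u2265")
to <- c(
  "\\textquoteright{}",   # right single quote; {} ends the macro name
  "\\'{e}",               # e-acute; one backslash, braced argument
  "\\ensuremath{\\geq}"   # greater-than-or-equal sign
)

# Replace each mapped character with its LaTeX equivalent.
# fixed = TRUE makes both pattern and replacement literal strings.
unicode_to_latex <- function(x) {
  for (i in seq_along(from)) {
    x <- gsub(from[i], to[i], x, fixed = TRUE)
  }
  x
}

writeLines(unicode_to_latex("Trent O\u2019Shell"))
# Trent O\textquoteright{}Shell
writeLines(unicode_to_latex("Chlo\u00e9 Laplantine"))
# Chlo\'{e} Laplantine
```

Without the `{}`, the first result would read `\textquoterightShell`, which TeX parses as a single undefined control sequence.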
@thebioengineer I also agree that an option would be good here for this functionality. Would you consider that option being an opt-in one for Unicode replacement?
I think so. Let's break it into 2 PRs to make it a little simpler to resolve, and maybe discuss how best to do the Unicode replacement.