tibble instantiation fails with or alters non-English characters in column names

Open twest820 opened this issue 2 years ago • 0 comments

It appears dplyr 1.0.7 still has limitations with UTF-8 encoded .R files. The failures occur when running code through either RStudio 2021.09.0 or RGui 4.1.2. Some simple examples:

tibble(ΔC = 1)
Error: unexpected symbol in "tibble(\u394C"
tibble(`ΔC` = 1)
Error: \uxxxx sequences not supported inside backticks (line 1)

In practice, this produces much less clear parsing failures, such as:

Error: unexpected '\\' in: "<multiline code snippet without any \ not including the problematic column name>"
Error: unexpected ')' in "<multiline code snippet without any ) not including the problematic column name>"

The workaround is to identify and replace the problematic characters, but this seems unnecessarily restrictive. In general, I wouldn't expect the keyboard used to type R code to affect whether the code can run. It's also curious to me that the use of backticks influences whether a character is valid in a column name.
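For the "identify" half of that workaround, something like the following might help. It is only a minimal base R sketch: `find_non_ascii` is a hypothetical helper, not a tibble or dplyr function, and it assumes the script lines are already in memory as a UTF-8 character vector. It reports which lines contain characters outside ASCII and which characters those are.

```r
# Hypothetical helper (not part of tibble or dplyr): list the lines of a
# script that contain non-ASCII characters, together with those characters.
find_non_ascii <- function(lines) {
  lines <- enc2utf8(lines)
  found <- lapply(lines, function(line) {
    code_points <- utf8ToInt(line)
    # Keep only code points above the ASCII range, one string per character.
    unique(intToUtf8(code_points[code_points > 127L], multiple = TRUE))
  })
  keep <- lengths(found) > 0
  data.frame(
    line = which(keep),
    characters = vapply(found[keep], paste, character(1), collapse = " "),
    stringsAsFactors = FALSE
  )
}

# Example input written with \u escapes so this snippet itself stays ASCII.
code <- c("library(tibble)", "tibble(\u0394C = 1, \u03b413C = 2)", "x <- 1")
find_non_ascii(code)
```

For a real file, the input could come from something like `readLines("script.R", encoding = "UTF-8")`, where `script.R` is a placeholder path.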

Additionally, I've encountered case-sensitive behavior where, for example, δ is semi-supported but Δ is not. This seems like one of those maybe-feature, maybe-bug things, as it introduces cases where column names that appear distinct to the programmer aren't actually distinct when the code is interpreted.

tibble(δ13C = 1) # can reference this column with either $δ13C or $d13C
# A tibble: 1 x 1
   d13C
  <dbl>
1     1
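To tell whether a name really holds δ or has been turned into d, the code points can be inspected directly. This is a rough sketch under a few assumptions: it uses a base `data.frame` rather than a tibble so it runs without extra packages, and it assigns the name from a string with a `\u` escape after construction, which keeps the source ASCII and so avoids the parsing step shown above. I have not confirmed that this is how tibble itself handles such names.

```r
# Assign the column name from an escaped string instead of typing the
# character as a bare symbol, so the parser only ever sees ASCII.
df <- data.frame(x = 1)
names(df) <- "\u03b413C"

# Code points of the stored name: 948 is the Greek small delta (U+03B4),
# whereas a transliterated "d" would show up as 100.
name_code_points <- utf8ToInt(enc2utf8(names(df)))
first_is_delta <- name_code_points[1] == 948L
first_is_delta
```

The same check could be applied to `names()` of a tibble built the usual way, to see what was actually stored versus what is printed.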

Am I perhaps hitting use cases missed in tidyverse/dplyr#2471? It seems there may be some legacy code page translation going on, which might shelter, for example, Greek users from being unable to execute their code.

twest820, Dec 27 '21 17:12