crayon doesn't mark encoding on UTF-8 strings in some cases
For example:
library(crayon)
text <- "你好"
crayon::white(text)
crayon::white(crayon::white(text))
I see:
> crayon::white(text)
[1] "\033[37m你好\033[39m"
> crayon::white(crayon::white(text))
[1] "\033[37m\033[37mä½ å¥½\033[37m\033[39m"
Note that the text 你好 in the second example is no longer encoded correctly.
> Encoding(crayon::white(text))
[1] "UTF-8"
> Encoding(crayon::white(crayon::white(text)))
[1] "unknown"
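One possible reading of the garbled output, assuming a CP1252 native locale like the one in this report: once the UTF-8 mark is lost, the six UTF-8 bytes of the string are displayed as six single-byte characters. The sketch below reproduces that effect with base R only. The names `utf8` and `mojibake` are illustrative, and `iconv()` merely stands in for whatever reinterpretation actually happens when the string is printed.

```r
# "\u4f60\u597d" is "你好", written with escapes so the snippet does not
# depend on the encoding of the source file.
utf8 <- "\u4f60\u597d"
charToRaw(utf8)  # the six UTF-8 bytes: e4 bd a0 e5 a5 bd

# Reinterpret those bytes as Latin-1 (Latin-1 and CP1252 agree for these
# particular byte values) and convert the result to UTF-8.
mojibake <- iconv(utf8, from = "latin1", to = "UTF-8")

# Each original byte has become one character, so the code points of the
# garbled string equal the byte values of the original.
utf8ToInt(mojibake)
as.integer(charToRaw(utf8))
```

If this reading is right, the characters in the garbled output correspond one-to-one to the bytes of the original UTF-8 string.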
Simply marking the encoding doesn't seem to be sufficient, though:
> white <- crayon::white(crayon::white(text))
> Encoding(white) <- "UTF-8"
> white
[1] "\033[37m\033[37m\xe4� 好\033[37m\033[39m"
so there might be something a little more fundamental going on.
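Since re-marking the result does not recover the text, the bytes themselves may have been altered rather than just the mark. A minimal way to check is to compare the declared encoding and raw bytes before and after styling. `describe_string()` below is a hypothetical helper written for this purpose, not part of crayon or base R.

```r
# Hypothetical helper: report the declared encoding, UTF-8 validity, and raw
# bytes of a single string.
describe_string <- function(x) {
  stopifnot(is.character(x), length(x) == 1L)
  list(
    encoding   = Encoding(x),
    valid_utf8 = validUTF8(x),
    bytes      = charToRaw(x)
  )
}

# Baseline: the unstyled input ("\u4f60\u597d" is "你好").
info <- describe_string("\u4f60\u597d")
info$encoding
info$bytes

# With crayon installed, the same helper could be applied to the styled value:
# describe_string(crayon::white(crayon::white("\u4f60\u597d")))
```

If the bytes between the escape sequences differ from the baseline, the string was re-encoded somewhere along the way, and changing the mark alone cannot repair it.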
This works as expected with crayon 1.4.2, so it appears to be a regression.
> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 10 x64 (build 22581)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] crayon_1.5.1
loaded via a namespace (and not attached):
[1] compiler_4.1.3 tools_4.1.3
It might be related to some recent changes re: gsub(..., useBytes = TRUE):
> text1 <- "你好" # no quotes
> text2 <- "'你好'" # has quotes
> gsub("'", "", text1, useBytes = TRUE)
[1] "你好"
> gsub("'", "", text2, useBytes = TRUE)
[1] "ä½ å¥½"
but in this case marking the encoding after the fact seems sufficient:
> t2 <- gsub("'", "", text2, useBytes = TRUE)
> Encoding(t2) <- "UTF-8"
> t2
[1] "你好"
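The post-hoc marking above could be wrapped in a small helper. This is only a sketch of the workaround, not how crayon handles it: `gsub_keep_encoding()` is a hypothetical name, and copying the marks back is only safe under the assumption that the pattern and replacement are ASCII (or share the input's encoding), so that the output bytes are still valid in that encoding.

```r
# Hypothetical wrapper: run gsub() bytewise, then copy the input's encoding
# marks back onto the result.
gsub_keep_encoding <- function(pattern, replacement, x, ...) {
  x <- as.character(x)
  out <- gsub(pattern, replacement, x, useBytes = TRUE, ...)
  # `Encoding<-` needs a value of positive length; pure-ASCII elements
  # stay unmarked regardless of the value assigned.
  if (length(out) > 0L) Encoding(out) <- Encoding(x)
  out
}

text2 <- "'\u4f60\u597d'"  # "'你好'", written with escapes
t2 <- gsub_keep_encoding("'", "", text2)
Encoding(t2)
charToRaw(t2)
```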
The issue no longer occurs with R 4.2.0:
> library(crayon)
> text <- "你好"
> crayon::white(text)
[1] "\033[37m你好\033[39m"
> crayon::white(crayon::white(text))
[1] "\033[37m\033[37m你好\033[37m\033[39m"
and
> text1 <- "你好" # no quotes
> text2 <- "'你好'" # has quotes
> gsub("'", "", text1, useBytes = TRUE)
[1] "你好"
> gsub("'", "", text2, useBytes = TRUE)
[1] "你好"
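A plausible explanation for the difference, offered as an assumption rather than something verified here: R 4.2.0 uses UTF-8 as the native encoding on recent Windows builds, so a string with an "unknown" mark is already read as UTF-8. Whether a session is in that situation can be checked with `l10n_info()`. The variable name below is illustrative.

```r
# TRUE when the session's native encoding is UTF-8; in that case strings
# whose encoding is "unknown" are interpreted as UTF-8 anyway.
native_is_utf8 <- isTRUE(l10n_info()[["UTF-8"]])
native_is_utf8

# For context when comparing sessions.
getRversion()
```

In a session where this is `FALSE`, such as the CP1252 one reported above, a dropped encoding mark changes how the bytes are displayed.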
I'm not sure whether supporting older versions of R on Windows is a priority.
I think this is fixed in dev crayon.