
crayon doesn't mark encoding on UTF-8 strings in some cases

Open kevinushey opened this issue 3 years ago • 2 comments

For example:

library(crayon)
text <- "你好"
crayon::white(text)
crayon::white(crayon::white(text))

I see:

> crayon::white(text)
[1] "\033[37m你好\033[39m"
> crayon::white(crayon::white(text))
[1] "\033[37m\033[37mä½ å¥½\033[37m\033[39m"

Note that the text 你好 in the second example is no longer encoded correctly.

> Encoding(crayon::white(text))
[1] "UTF-8"
> Encoding(crayon::white(crayon::white(text)))
[1] "unknown"

Simply marking the encoding doesn't seem to be sufficient, though:

> white <- crayon::white(crayon::white(text))
> Encoding(white) <- "UTF-8"
> white
[1] "\033[37m\033[37m\xe4� 好\033[37m\033[39m"

so there might be something a little more fundamental going on.
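
One way to narrow this down is to check whether the original UTF-8 bytes survive in the result, or whether they were re-encoded along the way. A minimal sketch, assuming a hypothetical helper `utf8_bytes_intact()` (not part of crayon) that does a byte-wise fixed match; the two inputs below are simulated, not actual crayon output:

```r
# Hypothetical helper, not part of crayon: TRUE if the UTF-8 bytes of
# `original` occur unchanged somewhere inside `result`.
utf8_bytes_intact <- function(original, result) {
  grepl(enc2utf8(original), result, fixed = TRUE, useBytes = TRUE)
}

text <- "\u4f60\u597d"  # the same string as above, written with escapes

# Simulated "mark lost" case: identical bytes, encoding mark removed.
unmarked <- paste0("\033[37m", text, "\033[39m")
Encoding(unmarked) <- "unknown"

# Simulated "re-encoded" case: the UTF-8 bytes are misread as latin1
# and converted to UTF-8 a second time.
reencoded <- iconv(enc2utf8(text), from = "latin1", to = "UTF-8")

intact_unmarked  <- utf8_bytes_intact(text, unmarked)
intact_reencoded <- utf8_bytes_intact(text, reencoded)
```

Running `utf8_bytes_intact(text, crayon::white(crayon::white(text)))` on the affected setup would then indicate whether only the mark is missing or the bytes themselves were changed.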

This works as expected with crayon 1.4.2, so it appears to be a regression.


> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 10 x64 (build 22581)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] crayon_1.5.1

loaded via a namespace (and not attached):
[1] compiler_4.1.3 tools_4.1.3   

kevinushey, Mar 30 '22 21:03

It might be related to some recent changes re: gsub(..., useBytes = TRUE):

> text1 <- "你好"     # no quotes
> text2 <- "'你好'"   # has quotes
> gsub("'", "", text1, useBytes = TRUE)
[1] "你好"
> gsub("'", "", text2, useBytes = TRUE)
[1] "ä½ å¥½"

In this case, though, marking the encoding after the fact seems sufficient:

> t2 <- gsub("'", "", text2, useBytes = TRUE)
> Encoding(t2) <- "UTF-8"
> t2
[1] "你好"
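
That workaround could be wrapped up roughly as follows. This is only a sketch: `gsub_utf8()` is a hypothetical name, not something crayon provides, and it assumes the pattern and replacement are ASCII or already UTF-8, so that the resulting bytes are still valid UTF-8.

```r
# Hypothetical wrapper, not crayon's actual fix: run gsub() byte-wise,
# then restore the UTF-8 mark on the result.
# Assumes `pattern` and `replacement` are ASCII or UTF-8.
gsub_utf8 <- function(pattern, replacement, x, ...) {
  x <- enc2utf8(x)                      # make sure the input bytes are UTF-8
  out <- gsub(pattern, replacement, x, ..., useBytes = TRUE)
  Encoding(out) <- "UTF-8"              # no effect on pure-ASCII elements
  out
}

text2 <- "'\u4f60\u597d'"               # the quoted string from above
t2 <- gsub_utf8("'", "", text2)
Encoding(t2)
```

Given the crayon example earlier, re-marking alone may not cover every code path, so this only addresses the plain `gsub()` case shown here.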

kevinushey, Mar 30 '22 21:03

The issue no longer occurs with R 4.2.0:

> library(crayon)
> text <- "你好"
> crayon::white(text)
[1] "\033[37m你好\033[39m"
> crayon::white(crayon::white(text))
[1] "\033[37m\033[37m你好\033[37m\033[39m"

and

> text1 <- "你好"     # no quotes
> text2 <- "'你好'"   # has quotes
> gsub("'", "", text1, useBytes = TRUE)
[1] "你好"
> gsub("'", "", text2, useBytes = TRUE)
[1] "你好"

I'm not sure whether supporting older versions of R on Windows is a priority.
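
If it were worth supporting, a workaround could be limited to the affected setups with a guard along these lines. This is a sketch under an assumption: the condition (R older than 4.2.0 with a non-UTF-8 native encoding) is inferred from the two sessions above and is not a confirmed description of the affected configurations. The function name is hypothetical.

```r
# Hypothetical check, an assumption rather than a confirmed rule:
# R older than 4.2.0 running with a native encoding that is not UTF-8.
needs_encoding_workaround <- function() {
  getRversion() < "4.2.0" && !isTRUE(l10n_info()[["UTF-8"]])
}

affected <- needs_encoding_workaround()
```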

kevinushey, May 31 '22 18:05

I think this is fixed in dev crayon.

gaborcsardi, Sep 28 '22 10:09