
Warning with the UCRT build on Windows

Open jimhester opened this issue 3 years ago • 2 comments

Quitting from lines 141-166 (locales.Rmd)
Error: processing vignette 'locales.Rmd' failed with diagnostics:
translating strings with "bytes" encoding is not allowed

jimhester avatar Nov 29 '21 15:11 jimhester

https://github.com/tidyverse/readr/commit/e72a281f517025174ddd3aaf6feb805ec022ba5b

DavisVaughan avatar Nov 29 '21 23:11 DavisVaughan

I am not sure whether this is related to the issue, but I run R on a Windows machine, and when I try to reproduce the example from the vignette I get this:

library(stringi)
#> Warning: package 'stringi' was built under R version 4.1.2
x <- "Émigré cause célèbre déjà vu.\n"
y <- stri_conv(x, "UTF-8", "latin1")
#> Warning in stri_conv(x, "UTF-8", "latin1"): input data \xffffffc9 in the current
#> source encoding could not be converted to Unicode
#> Warning in stri_conv(x, "UTF-8", "latin1"): input data \xffffffe9 in the current
#> source encoding could not be converted to Unicode

#> Warning in stri_conv(x, "UTF-8", "latin1"): input data \xffffffe9 in the current
#> source encoding could not be converted to Unicode
#> Warning in stri_conv(x, "UTF-8", "latin1"): input data \xffffffe8 in the current
#> source encoding could not be converted to Unicode
#> Warning in stri_conv(x, "UTF-8", "latin1"): input data \xffffffe9 in the current
#> source encoding could not be converted to Unicode
#> Warning in stri_conv(x, "UTF-8", "latin1"): input data \xffffffe0 in the current
#> source encoding could not be converted to Unicode
#> Warning in stri_conv(x, "UTF-8", "latin1"): the Unicode code point \U0000fffd
#> cannot be converted to destination encoding

#> Warning in stri_conv(x, "UTF-8", "latin1"): the Unicode code point \U0000fffd
#> cannot be converted to destination encoding

#> Warning in stri_conv(x, "UTF-8", "latin1"): the Unicode code point \U0000fffd
#> cannot be converted to destination encoding

#> Warning in stri_conv(x, "UTF-8", "latin1"): the Unicode code point \U0000fffd
#> cannot be converted to destination encoding

#> Warning in stri_conv(x, "UTF-8", "latin1"): the Unicode code point \U0000fffd
#> cannot be converted to destination encoding

#> Warning in stri_conv(x, "UTF-8", "latin1"): the Unicode code point \U0000fffd
#> cannot be converted to destination encoding

# These strings look like they're identical:
x
#> [1] "Émigré cause célèbre déjà vu.\n"
y
#> [1] "\032migr\032 cause c\032l\032bre d\032j\032 vu.\n"
identical(x, y)
#> [1] FALSE

# But they have different encodings:
Encoding(x)
#> [1] "latin1"
Encoding(y)
#> [1] "unknown"

Created on 2022-02-18 by the reprex package (v2.0.1)

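The example behaves as the vignette describes if I replace the line y <- stri_conv(x, "UTF-8", "latin1") with y <- stri_conv(x, from = "latin1", to = "UTF-8"). That fits the output above: Encoding(x) is "latin1" in my session, so x is not UTF-8 to begin with, and declaring "UTF-8" as the source encoding makes the conversion fail on every accented character.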
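For what it's worth, here is a minimal sketch of how the example could be made independent of the session's native encoding. It is only an illustration under some assumptions, not what the vignette does: the string is written with Unicode escapes and passed through enc2utf8() so that it is UTF-8 in memory, and base R's iconv() is swapped in for stri_conv() so the snippet needs no extra packages.

```r
# Unicode escapes avoid depending on how the script or console is encoded;
# enc2utf8() then ensures the in-memory bytes are UTF-8.
x <- enc2utf8("\u00c9migr\u00e9 cause c\u00e9l\u00e8bre d\u00e9j\u00e0 vu.\n")

# Base R stand-in for stri_conv(x, "UTF-8", "latin1"); the declared source
# encoding now matches the bytes actually stored in x.
y <- iconv(x, from = "UTF-8", to = "latin1")

# Inspect the declared encodings of both strings.
Encoding(x)
Encoding(y)

# The six accented characters take two bytes each in UTF-8 and one byte
# each in latin1, so the raw lengths differ even though the text is the same.
length(charToRaw(x))
length(charToRaw(y))
```

The point of the sketch is that the "from" argument has to describe the bytes that are really in x. With a plain string literal on a CP1252 Windows session, that was not UTF-8 on my machine.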

Some details about my machine from sessionInfo():

R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=Dutch_Belgium.1252 
[2] LC_CTYPE=Dutch_Belgium.1252   
[3] LC_MONETARY=Dutch_Belgium.1252
[4] LC_NUMERIC=C                  
[5] LC_TIME=Dutch_Belgium.1252

damianooldoni avatar Feb 18 '22 18:02 damianooldoni

I assume that this has been fixed.

hadley avatar Jul 31 '23 22:07 hadley