fs
`dir_ls()` trips over non-ASCII file names when native encoding isn't UTF-8
Discovered while studying https://github.com/tidyverse/readr/issues/1345.
`dir_map()` (on the C/C++ side) seems to assume libuv is giving it UTF-8 paths, but that's not true on Windows (where I made this reprex).
library(fs)
library(testthat)
twd <- path_temp(pattern = "dir-ls-reprex")
dir_create(twd)
owd <- setwd(twd)
cat("blah\n", file = "äçé")
(native <- list.files())
#> [1] "äçé"
(from_fs <- dir_ls())
#> äçé
# this mojibake is erroneously marked as UTF-8
Encoding(from_fs)
#> [1] "UTF-8"
# here are the bytes I'm expecting
(correct <- iconv(native, to = "UTF-8"))
#> [1] "äçé"
local({
local_edition(3)
expect_equal(
charToRaw(correct),
charToRaw(from_fs)
)
})
#> Error: charToRaw(correct) (`actual`) not equal to charToRaw(from_fs) (`expected`).
#>
#> `actual`: "c3" "a4" "c3" "a7" "c3" "a9" and 2 more...
#> `expected`: "c3" "83" "c2" "a4" "c3" "83" "c2" "a7" "c3" "83" ...
setwd(owd)
dir_delete(twd)
#> Error: [ENOENT] Failed to remove 'C:/Users/jenny/AppData/Local/Temp/RtmpeEIdP0/dir-ls-reprex/äçé': no such file or directory
Created on 2022-01-03 by the reprex package (v2.0.1)
As a meta point, I think the `dir_ls()` bug is also why I can't use fs to delete the temp directory in this reprex.
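The byte dump in the failed expectation looks like double encoding: each byte of the correct UTF-8 string (`c3 a4` for `ä`) seems to have been read as one Latin-1/CP1252 character and then re-encoded to UTF-8 (`c3` becomes `c3 83`, `a4` becomes `c2 a4`). A minimal base-R sketch, assuming exactly one such round of re-encoding, can reverse it. `undo_double_utf8()` is a hypothetical helper, not part of fs, and the example rebuilds the mojibake in memory instead of calling `dir_ls()`.

```r
# Hypothetical helper (not part of fs): undo one round of accidental
# Latin-1 -> UTF-8 re-encoding.
undo_double_utf8 <- function(x) {
  # Converting the mojibake back to Latin-1 recovers the original bytes ...
  bytes <- iconv(x, from = "UTF-8", to = "latin1")
  # ... and those bytes are really UTF-8, so re-declare them as such.
  Encoding(bytes) <- "UTF-8"
  bytes
}

# Rebuild the mangled name from the reprex without touching the file system.
good <- "\u00e4\u00e7\u00e9"            # "äçé": bytes c3 a4 c3 a7 c3 a9
mangled <- rawToChar(charToRaw(good))
Encoding(mangled) <- "latin1"           # misread each byte as a Latin-1 char
mangled <- enc2utf8(mangled)            # bytes c3 83 c2 a4 c3 83 c2 a7 c3 83 c2 a9

fixed <- undo_double_utf8(mangled)
identical(charToRaw(fixed), charToRaw(good))
#> [1] TRUE
```

If the mangling actually went through CP1252 rather than Latin-1, characters whose UTF-8 bytes fall in 0x80-0x9F may not convert back and come out as `NA`, so this is a diagnostic aid rather than a fix.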
Yes, I bumped into this too (version 1.5.2). Version 1.5.1 does not work either, but 1.5.0 does not have this problem.
The same problem exists for me when working with Polish characters in file names (Windows 10, fs v1.5.2). Fortunately, it works with v1.5.0.
Probably not surprising, but the same goes for `dir_info()`:
library(fs)
fs::file_touch("bär")
dir()
#> [1] "bär" "well-mice_reprex.R"
#> [3] "well-mice_reprex.spin.R" "well-mice_reprex.spin.Rmd"
fs::dir_info()
#> # A tibble: 4 x 18
#> path type size permissions modification_time user group device_id
#> <fs::path> <fct> <fs:> <fs::perms> <dttm> <chr> <chr> <dbl>
#> 1 bär <NA> NA --- NA <NA> <NA> NA
#> 2 ~ce_reprex.R file 227 rw- 2022-02-03 20:06:41 <NA> <NA> 2.22e9
#> 3 ~prex.spin.R file 227 rw- 2022-02-03 20:06:43 <NA> <NA> 2.22e9
#> 4 ~ex.spin.Rmd file 855 rw- 2022-02-03 20:06:43 <NA> <NA> 2.22e9
#> # ... with 10 more variables: hard_links <dbl>, special_device_id <dbl>,
#> # inode <dbl>, block_size <dbl>, blocks <dbl>, flags <int>, generation <dbl>,
#> # access_time <dttm>, change_time <dttm>, birth_time <dttm>
Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 4.1.2 (2021-11-01)
#> os Windows 10 x64 (build 19043)
#> system x86_64, mingw32
#> ui RTerm
#> language en
#> collate German_Germany.1252
#> ctype German_Germany.1252
#> tz Europe/Berlin
#> date 2022-02-03
#> pandoc 2.17.1.1 @ C:/PROGRA~1/Pandoc/ (via rmarkdown)
#>
#> - Packages -------------------------------------------------------------------
#> package * version date (UTC) lib source
#> backports 1.4.1 2021-12-13 [1] CRAN (R 4.1.2)
#> cli 3.1.1 2022-01-20 [1] CRAN (R 4.1.2)
#> crayon 1.4.2 2021-10-29 [1] CRAN (R 4.1.1)
#> digest 0.6.29 2021-12-01 [1] CRAN (R 4.1.2)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0)
#> fansi 1.0.2 2022-01-14 [1] CRAN (R 4.1.2)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0)
#> fs * 1.5.2.9000 2022-02-03 [1] Github (r-lib/fs@6d1182f)
#> glue 1.6.1 2022-01-22 [1] CRAN (R 4.1.2)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.1.0)
#> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.1)
#> knitr 1.37 2021-12-16 [1] CRAN (R 4.1.2)
#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.1)
#> magrittr 2.0.2 2022-01-26 [1] CRAN (R 4.1.2)
#> pillar 1.6.5 2022-01-25 [1] CRAN (R 4.1.2)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
#> R.cache 0.15.0 2021-04-30 [1] CRAN (R 4.1.0)
#> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.1.0)
#> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.1.0)
#> R.utils 2.11.0 2021-09-26 [1] CRAN (R 4.1.1)
#> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.1.0)
#> rlang 1.0.0 2022-01-26 [1] CRAN (R 4.1.2)
#> rmarkdown 2.11 2021-09-14 [1] CRAN (R 4.1.1)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.2)
#> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.1.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.0)
#> styler 1.6.2 2021-09-23 [1] CRAN (R 4.1.1)
#> tibble 3.1.6 2021-11-07 [1] CRAN (R 4.1.2)
#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
#> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
#> withr 2.4.3 2021-11-30 [1] CRAN (R 4.1.2)
#> xfun 0.29 2021-12-14 [1] CRAN (R 4.1.2)
#> yaml 2.2.2 2022-01-25 [1] CRAN (R 4.1.2)
#>
#> [1] C:/Users/Daniel.AK-HAMBURG/Documents/R/win-library/4.1
#> [2] C:/Program Files/R/R-4.1.2/library
#>
#> ------------------------------------------------------------------------------
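Until `dir_ls()` and `dir_info()` handle these names, one stopgap is to fall back to base R, whose `list.files()` returned the correct name in the reprexes above. A minimal sketch: `base_dir_info()` is hypothetical, returns only a few of `dir_info()`'s columns as a plain data frame, and assumes (not verified on the affected setups) that `file.info()` handles the same non-ASCII names correctly.

```r
# Hypothetical fallback (not part of fs): a reduced dir_info() built on
# base R's list.files() and file.info().
base_dir_info <- function(path = ".") {
  files <- list.files(path, full.names = TRUE)
  info <- file.info(files, extra_cols = FALSE)
  data.frame(
    path = files,
    # Only directories vs. everything else; symlinks are not distinguished.
    type = ifelse(info$isdir, "directory", "file"),
    size = info$size,
    modification_time = info$mtime,
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Usage: the base-R equivalent of the fs::dir_info() call above.
# base_dir_info(".")
```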
I ran into the same issue today with some folders named after the months in German. March is called "März", and `dir_ls()` throws an error:
Error: [ENOENT] Failed to search directory 'C:/some_folder/year_2022/month_März': no such file or directory
I just updated to version 1.5.2 and didn't have those issues before.
As there is already plenty of code explaining the issue, I decided not to provide any more. Sorry!
Sorry for the off-topic question: I'm having the same issues, but I can't compile v1.5.0 so that my code works. Any suggestions on how to get it done? Win10 machine here.
Or someone "simply" fixes this bug ;)
You can install it from the versioned repository
# Install fs 1.5.0, because the later versions (1.5.1 and 1.5.2) are buggy
install.packages("fs", repos = "https://packagemanager.rstudio.com/all/2021-11-30+Y3JhbiwyOjQ1MjYyMTU7Q0UyMzFCQTg")
That link seems to be invalid; use this instead:
devtools::install_version('fs', '1.5.0')
I was unable to use devtools to install version 1.5.0: I first needed to remove the fs package, but devtools itself needs fs in order to call `devtools::install_version()`.
I was able to install version 1.5.0 using:
install.packages("https://cran.r-project.org/src/contrib/Archive/fs/fs_1.5.0.tar.gz", repos = NULL, type = "source")
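The archive install above can be wrapped in a small helper that uses only base R, which sidesteps the devtools-needs-fs problem. A sketch under these assumptions: network access to CRAN, Rtools on Windows (fs contains compiled C/C++ code, which may be why building 1.5.0 from source failed for some), and a fresh R session in which fs is not loaded. `fs_archive_url()` and `install_fs_version()` are hypothetical helpers.

```r
# Hypothetical helpers (not part of fs or devtools). Only base R is used,
# so nothing here requires fs to be installed or loaded.
fs_archive_url <- function(version) {
  # CRAN keeps superseded source tarballs under src/contrib/Archive/<pkg>/
  sprintf("https://cran.r-project.org/src/contrib/Archive/fs/fs_%s.tar.gz", version)
}

install_fs_version <- function(version = "1.5.0", lib = .libPaths()[1]) {
  installed <- tryCatch(
    as.character(utils::packageVersion("fs", lib.loc = lib)),
    error = function(e) NA_character_  # fs is not installed in `lib`
  )
  if (identical(installed, version)) {
    message("fs ", version, " is already installed in ", lib)
    return(invisible(FALSE))
  }
  # Builds from source: on Windows this needs Rtools for fs's C/C++ code.
  utils::install.packages(fs_archive_url(version), lib = lib,
                          repos = NULL, type = "source")
  invisible(TRUE)
}

# Usage, in a fresh session: install_fs_version("1.5.0")
```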
Are there any plans for this to be addressed? Unfortunately, I don't have the skills to work with the C++ code; otherwise I'd give it a go myself.
Me again: I can't get it installed on R 4.3.0. Do you know a way, or have these issues been solved in the meantime with 1.6.2?
FWIW, this seems to be "fixed" for the UCRT builds of R 4.3 (and probably 4.2?) on Windows, because they use UTF-8 as the native encoding.
library(fs)
library(testthat)
twd <- path_temp(pattern = "dir-ls-reprex")
dir_create(twd)
owd <- setwd(twd)
cat("blah\n", file = "äçé")
(native <- list.files())
#> [1] "äçé"
(from_fs <- dir_ls())
#> äçé
Encoding(from_fs)
#> [1] "UTF-8"
(correct <- iconv(native, to = "UTF-8"))
#> [1] "äçé"
local({
local_edition(3)
expect_equal(
charToRaw(correct),
charToRaw(from_fs)
)
})
# dir_info()
fs::file_touch("bär")
dir()
#> [1] "äçé" "bär"
fs::dir_info()
#> # A tibble: 2 × 18
#> path type size permissions modification_time user group device_id
#> <fs::path> <fct> <fs::b> <fs::perms> <dttm> <chr> <chr> <dbl>
#> 1 bär file 0 rw- 2023-06-02 14:55:22 <NA> <NA> 2.22e9
#> 2 äçé file 6 rw- 2023-06-02 14:55:22 <NA> <NA> 2.22e9
#> # ℹ 10 more variables: hard_links <dbl>, special_device_id <dbl>, inode <dbl>,
#> # block_size <dbl>, blocks <dbl>, flags <int>, generation <dbl>,
#> # access_time <dttm>, change_time <dttm>, birth_time <dttm>
setwd(owd)
dir_delete(twd)
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.3.0 (2023-04-21 ucrt)
#> os Windows 10 x64 (build 19044)
#> system x86_64, mingw32
#> ui RTerm
#> language en
#> collate German_Germany.utf8
#> ctype German_Germany.utf8
#> tz Europe/Berlin
#> date 2023-06-02
#> pandoc 3.1.2 @ C:/PROGRA~1/Pandoc/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> brio 1.1.3 2021-11-30 [1] CRAN (R 4.3.0)
#> cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.0)
#> crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.0)
#> digest 0.6.31 2022-12-11 [1] CRAN (R 4.3.0)
#> evaluate 0.21 2023-05-05 [1] CRAN (R 4.3.0)
#> fansi 1.0.4 2023-01-22 [1] CRAN (R 4.3.0)
#> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0)
#> fs * 1.6.2 2023-04-25 [1] CRAN (R 4.3.0)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.0)
#> htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.3.0)
#> knitr 1.43 2023-05-25 [1] CRAN (R 4.3.0)
#> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
#> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)
#> purrr 1.0.1 2023-01-10 [1] CRAN (R 4.3.0)
#> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.3.0)
#> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.3.0)
#> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.3.0)
#> R.utils 2.12.2 2022-11-11 [1] CRAN (R 4.3.0)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
#> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.3.0)
#> rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.0)
#> rmarkdown 2.21 2023-03-26 [1] CRAN (R 4.3.0)
#> rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.3.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
#> styler 1.10.0 2023-05-24 [1] CRAN (R 4.3.0)
#> testthat * 3.1.8 2023-05-04 [1] CRAN (R 4.3.0)
#> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
#> utf8 1.2.3 2023-01-31 [1] CRAN (R 4.3.0)
#> vctrs 0.6.2 2023-04-19 [1] CRAN (R 4.3.0)
#> waldo 0.5.1 2023-05-08 [1] CRAN (R 4.3.0)
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.3.0)
#> xfun 0.39 2023-04-20 [1] CRAN (R 4.3.0)
#> yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.0)
#>
#> [1] C:/Users/Daniel/AppData/Local/R/win-library/4.3
#> [2] C:/Program Files/R/R-4.3.0/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
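The two session infos in this thread differ in `ctype` (`German_Germany.1252` with fs 1.5.2.9000, `German_Germany.utf8` with fs 1.6.2 on the UCRT build of R 4.3.0), which fits the native-encoding explanation. A minimal sketch for checking a session up front with base R's `l10n_info()`; `native_is_utf8()` is a hypothetical helper, and the message text reflects reports in this thread rather than documented fs behaviour.

```r
# Hypothetical helper (not part of fs): TRUE when the session's native
# encoding is UTF-8, e.g. a *.utf8 locale on the UCRT build of R 4.3.
native_is_utf8 <- function() {
  isTRUE(l10n_info()[["UTF-8"]])
}

if (native_is_utf8()) {
  message("Native encoding is UTF-8; the dir_ls() reprex was reported to pass in such a session.")
} else {
  message(
    "Native encoding is not UTF-8 (LC_CTYPE = ", Sys.getlocale("LC_CTYPE"), "). ",
    "fs 1.5.1 and 1.5.2 were reported to mis-encode non-ASCII paths in this setting."
  )
}
```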
Perfect - thanks a lot.