
`dir_ls()` trips over non-ascii file names when native encoding isn't UTF-8

Status: Open • jennybc opened this issue 3 years ago • 12 comments

Discovered while studying https://github.com/tidyverse/readr/issues/1345.

dir_map() (on the C/C++ side) seems to assume libuv is giving it UTF-8 paths, but that's not true on Windows (where I made this reprex).

library(fs)
library(testthat)

twd <- path_temp(pattern = "dir-ls-reprex")
dir_create(twd)
owd <- setwd(twd)

cat("blah\n", file = "äçé")

(native <- list.files())
#> [1] "äçé"
(from_fs <- dir_ls())
#> äçé

# this mojibake is erroneously marked as UTF-8
Encoding(from_fs)
#> [1] "UTF-8"

# here are the bytes I'm expecting
(correct <- iconv(native, to = "UTF-8"))
#> [1] "äçé"

local({
  local_edition(3)
  expect_equal(
    charToRaw(correct),
    charToRaw(from_fs)
  )
})
#> Error: charToRaw(correct) (`actual`) not equal to charToRaw(from_fs) (`expected`).
#> 
#> `actual`:   "c3" "a4" "c3" "a7" "c3" "a9"                     and 2 more...
#> `expected`: "c3" "83" "c2" "a4" "c3" "83" "c2" "a7" "c3" "83" ...

setwd(owd)
dir_delete(twd)
#> Error: [ENOENT] Failed to remove 'C:/Users/jenny/AppData/Local/Temp/RtmpeEIdP0/dir-ls-reprex/äçé': no such file or directory

Created on 2022-01-03 by the reprex package (v2.0.1)
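
The byte sequence reported as `expected` is consistent with a double conversion: the file name's correct UTF-8 bytes were read as if they were in the native Windows-1252 code page and then converted to UTF-8 a second time. The base-R sketch below reproduces that pattern without fs. Treating the native code page as Windows-1252 (`"CP1252"`) is an assumption that matches a Western European Windows locale; it is not something fs documents.

```r
# "äçé" written with \u escapes, so its bytes are UTF-8 on any platform.
correct <- "\u00e4\u00e7\u00e9"
charToRaw(correct)
#> [1] c3 a4 c3 a7 c3 a9

# Misread those UTF-8 bytes as Windows-1252 and convert them to UTF-8 again.
# iconv() ignores the string's UTF-8 mark and uses `from` as given.
double_encoded <- iconv(correct, from = "CP1252", to = "UTF-8")
charToRaw(double_encoded)
#> [1] c3 83 c2 a4 c3 83 c2 a7 c3 83 c2 a9
```

The leading bytes of `double_encoded` match the `expected` line of the failing comparison, which is truncated after the tenth byte.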

For some meta points, I think the dir_ls() bug is also why I can't use fs to delete the temp directory in this reprex.
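
A possible string-level stopgap is to undo the extra conversion. The helper `undo_double_utf8()` below is hypothetical and not part of fs; it assumes the mangling is exactly one spurious native-to-UTF-8 step and that you know which code page was wrongly assumed (for example `"CP1250"` rather than `"CP1252"` for Polish names). It cannot help `dir_delete()` itself, which appears to list the directory contents internally (its error message shows the mangled name).

```r
# Hypothetical helper, not an fs function: reverse one extra
# "native -> UTF-8" conversion. `native` is the code page that was
# wrongly assumed; "" means the session's native encoding.
undo_double_utf8 <- function(x, native = "") {
  out <- iconv(as.character(x), from = "UTF-8", to = native)
  # The recovered bytes are UTF-8, so declare them as such.
  Encoding(out) <- "UTF-8"
  out
}

# Simulate the mangled name, then repair it.
mangled <- iconv("\u00e4\u00e7\u00e9", from = "CP1252", to = "UTF-8")
fixed <- undo_double_utf8(mangled, native = "CP1252")
identical(charToRaw(fixed), charToRaw("\u00e4\u00e7\u00e9"))
#> [1] TRUE
```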

jennybc, Jan 04 '22 02:01

Yes, I bumped into this too (version 1.5.2). Version 1.5.1 does not work either, but 1.5.0 does not have this problem.

billy34, Jan 18 '22 17:01

The same problem exists for me when working with Polish characters in file names [Windows 10, fs package v1.5.2]. Fortunately, it works with v1.5.0.

hbaniecki, Jan 18 '22 21:01

Probably not surprising, but the same goes for dir_info():

library(fs)
fs::file_touch("bär")
dir()
#> [1] "bär"                       "well-mice_reprex.R"       
#> [3] "well-mice_reprex.spin.R"   "well-mice_reprex.spin.Rmd"
fs::dir_info()
#> # A tibble: 4 x 18
#>   path         type   size permissions modification_time   user  group device_id
#>   <fs::path>   <fct> <fs:> <fs::perms> <dttm>              <chr> <chr>     <dbl>
#> 1 bär         <NA>     NA ---         NA                  <NA>  <NA>    NA     
#> 2 ~ce_reprex.R file    227 rw-         2022-02-03 20:06:41 <NA>  <NA>     2.22e9
#> 3 ~prex.spin.R file    227 rw-         2022-02-03 20:06:43 <NA>  <NA>     2.22e9
#> 4 ~ex.spin.Rmd file    855 rw-         2022-02-03 20:06:43 <NA>  <NA>     2.22e9
#> # ... with 10 more variables: hard_links <dbl>, special_device_id <dbl>,
#> #   inode <dbl>, block_size <dbl>, blocks <dbl>, flags <int>, generation <dbl>,
#> #   access_time <dttm>, change_time <dttm>, birth_time <dttm>
Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       Windows 10 x64 (build 19043)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language en
#>  collate  German_Germany.1252
#>  ctype    German_Germany.1252
#>  tz       Europe/Berlin
#>  date     2022-02-03
#>  pandoc   2.17.1.1 @ C:/PROGRA~1/Pandoc/ (via rmarkdown)
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version    date (UTC) lib source
#>  backports     1.4.1      2021-12-13 [1] CRAN (R 4.1.2)
#>  cli           3.1.1      2022-01-20 [1] CRAN (R 4.1.2)
#>  crayon        1.4.2      2021-10-29 [1] CRAN (R 4.1.1)
#>  digest        0.6.29     2021-12-01 [1] CRAN (R 4.1.2)
#>  ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 4.1.0)
#>  fansi         1.0.2      2022-01-14 [1] CRAN (R 4.1.2)
#>  fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.1.0)
#>  fs          * 1.5.2.9000 2022-02-03 [1] Github (r-lib/fs@6d1182f)
#>  glue          1.6.1      2022-01-22 [1] CRAN (R 4.1.2)
#>  highr         0.9        2021-04-16 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.2      2021-08-25 [1] CRAN (R 4.1.1)
#>  knitr         1.37       2021-12-16 [1] CRAN (R 4.1.2)
#>  lifecycle     1.0.1      2021-09-24 [1] CRAN (R 4.1.1)
#>  magrittr      2.0.2      2022-01-26 [1] CRAN (R 4.1.2)
#>  pillar        1.6.5      2022-01-25 [1] CRAN (R 4.1.2)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.1.0)
#>  purrr         0.3.4      2020-04-17 [1] CRAN (R 4.1.0)
#>  R.cache       0.15.0     2021-04-30 [1] CRAN (R 4.1.0)
#>  R.methodsS3   1.8.1      2020-08-26 [1] CRAN (R 4.1.0)
#>  R.oo          1.24.0     2020-08-26 [1] CRAN (R 4.1.0)
#>  R.utils       2.11.0     2021-09-26 [1] CRAN (R 4.1.1)
#>  reprex        2.0.1      2021-08-05 [1] CRAN (R 4.1.0)
#>  rlang         1.0.0      2022-01-26 [1] CRAN (R 4.1.2)
#>  rmarkdown     2.11       2021-09-14 [1] CRAN (R 4.1.1)
#>  rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.1.0)
#>  sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.1.2)
#>  stringi       1.7.6      2021-11-29 [1] CRAN (R 4.1.2)
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 4.1.0)
#>  styler        1.6.2      2021-09-23 [1] CRAN (R 4.1.1)
#>  tibble        3.1.6      2021-11-07 [1] CRAN (R 4.1.2)
#>  utf8          1.2.2      2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs         0.3.8      2021-04-29 [1] CRAN (R 4.1.0)
#>  withr         2.4.3      2021-11-30 [1] CRAN (R 4.1.2)
#>  xfun          0.29       2021-12-14 [1] CRAN (R 4.1.2)
#>  yaml          2.2.2      2022-01-25 [1] CRAN (R 4.1.2)
#> 
#>  [1] C:/Users/Daniel.AK-HAMBURG/Documents/R/win-library/4.1
#>  [2] C:/Program Files/R/R-4.1.2/library
#> 
#> ------------------------------------------------------------------------------

dpprdan, Feb 03 '22 19:02

I ran into the same issue today with some folders named after the months in German. March is "März", and dir_ls() throws an error: Error: [ENOENT] Failed to search directory 'C:/some_folder/year_2022/month_März': no such file or directory

I just updated to version 1.5.2 and didn't have those issues before.

As there is already plenty of code demonstrating the issue, I decided not to provide any more. Sorry!

FlorianMyronStork, Mar 07 '22 16:03

Sorry for going off-topic: I'm having the same issues, but I can't compile v1.5.0 so that my code works. Any suggestions on how to get it done? Win10 machine here.

Or someone "simply" fixes this bug ;)

matk111, Apr 13 '22 15:04

You can install it from the versioned repository:

# Install version 1.5.0 of the fs package, because the following versions (1.5.1 and 1.5.2) are buggy
install.packages("fs", repos = "https://packagemanager.rstudio.com/all/2021-11-30+Y3JhbiwyOjQ1MjYyMTU7Q0UyMzFCQTg")

billy34, Apr 13 '22 15:04

The versioned-repository link seems to be invalid; use this instead:

devtools::install_version('fs', '1.5.0')

s609078902, May 16 '22 01:05

I was unable to use devtools to install version 1.5.0: I first needed to remove the installed fs package, but devtools itself needs fs in order to call devtools::install_version().

I was able to install version 1.5.0 using:

install.packages("https://cran.r-project.org/src/contrib/Archive/fs/fs_1.5.0.tar.gz", repos = NULL, type = "source")

Mikea0228, Aug 04 '22 15:08

Are there any plans to address this? Unfortunately, I don't have the skills to work with the C++ code; otherwise I'd give it a go myself.

Mikea0228, Mar 14 '23 17:03

Me again: I can't get it installed on R 4.3.0. Do you know a way, or have these issues been solved in the meantime with 1.6.2?

matk111, Jun 02 '23 12:06

FWIW, this seems to be "fixed" in the UCRT builds of R 4.3 (and probably 4.2?) on Windows, because they use UTF-8 as the native encoding.
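
To check whether a given R session is in the affected situation, base R's `l10n_info()` reports whether the native encoding is UTF-8. The wrapper name `native_is_utf8()` is hypothetical and used only for illustration, and the message text reflects only the fs versions reported in this thread.

```r
# Hypothetical helper: TRUE when the session's native encoding is UTF-8,
# as in the UCRT builds of R >= 4.2 on recent Windows versions.
native_is_utf8 <- function() {
  isTRUE(l10n_info()[["UTF-8"]])
}

if (!native_is_utf8()) {
  # Sys.getlocale("LC_CTYPE") shows the active code page, e.g. "German_Germany.1252".
  message(
    "Native encoding is not UTF-8 (", Sys.getlocale("LC_CTYPE"), "); ",
    "fs 1.5.1 and 1.5.2 were reported to mangle non-ASCII file names in this setup."
  )
}
```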

library(fs)
library(testthat)

twd <- path_temp(pattern = "dir-ls-reprex")
dir_create(twd)
owd <- setwd(twd)

cat("blah\n", file = "äçé")

(native <- list.files())
#> [1] "äçé"
(from_fs <- dir_ls())
#> äçé
Encoding(from_fs)
#> [1] "UTF-8"
(correct <- iconv(native, to = "UTF-8"))
#> [1] "äçé"
local({
  local_edition(3)
  expect_equal(
    charToRaw(correct),
    charToRaw(from_fs)
  )
})

# dir_info()
fs::file_touch("bär")
dir()
#> [1] "äçé" "bär"
fs::dir_info()
#> # A tibble: 2 × 18
#>   path       type     size permissions modification_time   user  group device_id
#>   <fs::path> <fct> <fs::b> <fs::perms> <dttm>              <chr> <chr>     <dbl>
#> 1 bär        file        0 rw-         2023-06-02 14:55:22 <NA>  <NA>     2.22e9
#> 2 äçé        file        6 rw-         2023-06-02 14:55:22 <NA>  <NA>     2.22e9
#> # ℹ 10 more variables: hard_links <dbl>, special_device_id <dbl>, inode <dbl>,
#> #   block_size <dbl>, blocks <dbl>, flags <int>, generation <dbl>,
#> #   access_time <dttm>, change_time <dttm>, birth_time <dttm>

setwd(owd)
dir_delete(twd)
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.0 (2023-04-21 ucrt)
#>  os       Windows 10 x64 (build 19044)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language en
#>  collate  German_Germany.utf8
#>  ctype    German_Germany.utf8
#>  tz       Europe/Berlin
#>  date     2023-06-02
#>  pandoc   3.1.2 @ C:/PROGRA~1/Pandoc/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  brio          1.1.3   2021-11-30 [1] CRAN (R 4.3.0)
#>  cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
#>  crayon        1.5.2   2022-09-29 [1] CRAN (R 4.3.0)
#>  digest        0.6.31  2022-12-11 [1] CRAN (R 4.3.0)
#>  evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
#>  fansi         1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
#>  fs          * 1.6.2   2023-04-25 [1] CRAN (R 4.3.0)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
#>  htmltools     0.5.5   2023-03-23 [1] CRAN (R 4.3.0)
#>  knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
#>  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.3.0)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.3.0)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.3.0)
#>  rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
#>  rmarkdown     2.21    2023-03-26 [1] CRAN (R 4.3.0)
#>  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.3.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
#>  styler        1.10.0  2023-05-24 [1] CRAN (R 4.3.0)
#>  testthat    * 3.1.8   2023-05-04 [1] CRAN (R 4.3.0)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
#>  utf8          1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
#>  vctrs         0.6.2   2023-04-19 [1] CRAN (R 4.3.0)
#>  waldo         0.5.1   2023-05-08 [1] CRAN (R 4.3.0)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
#>  xfun          0.39    2023-04-20 [1] CRAN (R 4.3.0)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
#> 
#>  [1] C:/Users/Daniel/AppData/Local/R/win-library/4.3
#>  [2] C:/Program Files/R/R-4.3.0/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

dpprdan, Jun 02 '23 13:06

Perfect - thanks a lot.

matk111, Jun 02 '23 14:06