`read_csv()` reads wrong column when using `col_select` and `name_repair = "minimal"` in a file with duplicated column names

Open · lucasnanni opened this issue 3 years ago · 1 comment

Given a CSV file with duplicated column names, when I call `read_csv()` with `name_repair = "minimal"` and a `col_select` that targets the second occurrence of the repeated name, the first occurrence is read instead.

In the reprex below, I create a CSV table with only two columns, both named `x`. With `name_repair = "minimal"` and `col_select = 2`, the first column is read instead of the second. Without `name_repair = "minimal"`, the second column is read correctly.

tab <- I(
"x,x
a,1
b,2
c,3"
)

readr::read_csv(tab, col_select = 2, name_repair = "minimal")
#> Rows: 3 Columns: 1
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): x
#> dbl (1): x
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 3 × 1
#>   x    
#>   <chr>
#> 1 a    
#> 2 b    
#> 3 c

readr::read_csv(tab, col_select = 2)
#> New names:
#> Rows: 3 Columns: 1
#> ── Column specification
#> ──────────────────────────────────────────────────────── Delimiter: "," dbl
#> (1): x...2
#> ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
#> Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> • `x` -> `x...1`
#> • `x` -> `x...2`
#> # A tibble: 3 × 1
#>   x...2
#>   <dbl>
#> 1     1
#> 2     2
#> 3     3

Created on 2022-12-07 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.2 Patched (2022-11-10 r83330)
#>  os       Ubuntu 22.04.1 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    pt_BR.UTF-8
#>  tz       America/Sao_Paulo
#>  date     2022-12-07
#>  pandoc   2.9.2.1 @ /usr/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  bit           4.0.4   2020-08-04 [2] CRAN (R 4.2.1)
#>  bit64         4.0.5   2020-08-30 [2] CRAN (R 4.2.1)
#>  cli           3.4.1   2022-09-23 [2] CRAN (R 4.2.1)
#>  crayon        1.5.2   2022-09-29 [2] CRAN (R 4.2.1)
#>  digest        0.6.30  2022-10-18 [2] CRAN (R 4.2.1)
#>  ellipsis      0.3.2   2021-04-29 [2] CRAN (R 4.2.1)
#>  evaluate      0.18    2022-11-07 [2] CRAN (R 4.2.2)
#>  fansi         1.0.3   2022-03-24 [2] CRAN (R 4.2.1)
#>  fastmap       1.1.0   2021-01-25 [2] CRAN (R 4.2.1)
#>  fs            1.5.2   2021-12-08 [2] CRAN (R 4.2.1)
#>  glue          1.6.2   2022-02-24 [2] CRAN (R 4.2.1)
#>  highr         0.9     2021-04-16 [2] CRAN (R 4.2.1)
#>  hms           1.1.2   2022-08-19 [2] CRAN (R 4.2.1)
#>  htmltools     0.5.3   2022-07-18 [2] CRAN (R 4.2.1)
#>  knitr         1.40    2022-08-24 [2] CRAN (R 4.2.1)
#>  lifecycle     1.0.3   2022-10-07 [2] CRAN (R 4.2.1)
#>  magrittr      2.0.3   2022-03-30 [2] CRAN (R 4.2.1)
#>  pillar        1.8.1   2022-08-19 [2] CRAN (R 4.2.1)
#>  pkgconfig     2.0.3   2019-09-22 [2] CRAN (R 4.2.1)
#>  purrr         0.3.5   2022-10-06 [2] CRAN (R 4.2.1)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.2.1)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.2.1)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.2.1)
#>  R.utils       2.12.1  2022-10-30 [1] CRAN (R 4.2.1)
#>  R6            2.5.1   2021-08-19 [2] CRAN (R 4.2.1)
#>  readr         2.1.3   2022-10-01 [1] CRAN (R 4.2.2)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.2.2)
#>  rlang         1.0.6   2022-09-24 [2] CRAN (R 4.2.1)
#>  rmarkdown     2.18    2022-11-09 [2] CRAN (R 4.2.2)
#>  sessioninfo   1.2.2   2021-12-06 [2] CRAN (R 4.2.1)
#>  stringi       1.7.8   2022-07-11 [2] CRAN (R 4.2.1)
#>  stringr       1.4.1   2022-08-20 [2] CRAN (R 4.2.1)
#>  styler        1.8.0   2022-10-22 [1] CRAN (R 4.2.1)
#>  tibble        3.1.8   2022-07-22 [2] CRAN (R 4.2.1)
#>  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.2.1)
#>  tzdb          0.3.0   2022-03-28 [2] CRAN (R 4.2.1)
#>  utf8          1.2.2   2021-07-24 [2] CRAN (R 4.2.1)
#>  vctrs         0.5.1   2022-11-16 [2] CRAN (R 4.2.2)
#>  vroom         1.6.0   2022-09-30 [2] CRAN (R 4.2.1)
#>  withr         2.5.0   2022-03-03 [2] CRAN (R 4.2.1)
#>  xfun          0.34    2022-10-18 [2] CRAN (R 4.2.1)
#>  yaml          2.3.6   2022-10-18 [2] CRAN (R 4.2.1)
#> 
#>  [1] /home/nanni/R/x86_64-pc-linux-gnu-library/4.2
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

lucasnanni, Dec 08 '22 00:12

Somewhat more minimal reprex:

tab <- "
x,x
a,1
b,2
c,3"

readr::read_csv(tab, col_select = 2, name_repair = "minimal", col_types = list())
#> # A tibble: 3 × 1
#>   x    
#>   <chr>
#> 1 a    
#> 2 b    
#> 3 c

readr::read_csv(tab, col_select = 2, col_types = list())
#> New names:
#> • `x` -> `x...1`
#> • `x` -> `x...2`
#> # A tibble: 3 × 1
#>   x...2
#>   <dbl>
#> 1     1
#> 2     2
#> 3     3

Created on 2023-07-31 with reprex v2.0.2

hadley, Jul 31 '23 22:07
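
A possible workaround, sketched here as an assumption rather than a fix confirmed in this thread: since the wrong column only appears when `col_select` is combined with `name_repair = "minimal"`, one option is to read every column with the names left untouched and then pick the column by position afterwards. This assumes the file is small enough that reading all columns is acceptable; the variable names `all_cols` and `second` are illustrative only.

```r
library(readr)

# Same two-column input as in the reprexes above; both columns are named "x".
tab <- I("x,x\na,1\nb,2\nc,3")

# Read all columns without col_select, keeping the duplicated names as-is.
all_cols <- read_csv(tab, name_repair = "minimal", show_col_types = FALSE)

# Select the second column by position, not by its (ambiguous) name.
second <- all_cols[[2]]
```

Alternatively, as the second call in each reprex shows, leaving `name_repair` at its default makes `col_select = 2` return the intended column, at the cost of the column being renamed to `x...2`.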