readr
`read_csv()` reads wrong column when using `col_select` and `name_repair = "minimal"` in a file with duplicated column names
Given a CSV file with duplicated column names, when I call `read_csv()` with `name_repair = "minimal"` and `col_select` set to the second occurrence of the repeated column, the first occurrence is read instead.
In the reprex below, I create a CSV table with only two columns, both named `x`. With `name_repair = "minimal"` and `col_select = 2`, the first column is read instead of the second. Without `name_repair = "minimal"`, the second column is read correctly.
tab <- I(
"x,x
a,1
b,2
c,3"
)
readr::read_csv(tab, col_select = 2, name_repair = "minimal")
#> Rows: 3 Columns: 1
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): x
#> dbl (1): x
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 3 × 1
#> x
#> <chr>
#> 1 a
#> 2 b
#> 3 c
readr::read_csv(tab, col_select = 2)
#> New names:
#> Rows: 3 Columns: 1
#> ── Column specification
#> ──────────────────────────────────────────────────────── Delimiter: "," dbl
#> (1): x...2
#> ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
#> Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> • `x` -> `x...1`
#> • `x` -> `x...2`
#> # A tibble: 3 × 1
#> x...2
#> <dbl>
#> 1 1
#> 2 2
#> 3 3
Created on 2022-12-07 with reprex v2.0.2
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.2.2 Patched (2022-11-10 r83330)
#> os Ubuntu 22.04.1 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype pt_BR.UTF-8
#> tz America/Sao_Paulo
#> date 2022-12-07
#> pandoc 2.9.2.1 @ /usr/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> bit 4.0.4 2020-08-04 [2] CRAN (R 4.2.1)
#> bit64 4.0.5 2020-08-30 [2] CRAN (R 4.2.1)
#> cli 3.4.1 2022-09-23 [2] CRAN (R 4.2.1)
#> crayon 1.5.2 2022-09-29 [2] CRAN (R 4.2.1)
#> digest 0.6.30 2022-10-18 [2] CRAN (R 4.2.1)
#> ellipsis 0.3.2 2021-04-29 [2] CRAN (R 4.2.1)
#> evaluate 0.18 2022-11-07 [2] CRAN (R 4.2.2)
#> fansi 1.0.3 2022-03-24 [2] CRAN (R 4.2.1)
#> fastmap 1.1.0 2021-01-25 [2] CRAN (R 4.2.1)
#> fs 1.5.2 2021-12-08 [2] CRAN (R 4.2.1)
#> glue 1.6.2 2022-02-24 [2] CRAN (R 4.2.1)
#> highr 0.9 2021-04-16 [2] CRAN (R 4.2.1)
#> hms 1.1.2 2022-08-19 [2] CRAN (R 4.2.1)
#> htmltools 0.5.3 2022-07-18 [2] CRAN (R 4.2.1)
#> knitr 1.40 2022-08-24 [2] CRAN (R 4.2.1)
#> lifecycle 1.0.3 2022-10-07 [2] CRAN (R 4.2.1)
#> magrittr 2.0.3 2022-03-30 [2] CRAN (R 4.2.1)
#> pillar 1.8.1 2022-08-19 [2] CRAN (R 4.2.1)
#> pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.2.1)
#> purrr 0.3.5 2022-10-06 [2] CRAN (R 4.2.1)
#> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.2.1)
#> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.2.1)
#> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.2.1)
#> R.utils 2.12.1 2022-10-30 [1] CRAN (R 4.2.1)
#> R6 2.5.1 2021-08-19 [2] CRAN (R 4.2.1)
#> readr 2.1.3 2022-10-01 [1] CRAN (R 4.2.2)
#> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.2.2)
#> rlang 1.0.6 2022-09-24 [2] CRAN (R 4.2.1)
#> rmarkdown 2.18 2022-11-09 [2] CRAN (R 4.2.2)
#> sessioninfo 1.2.2 2021-12-06 [2] CRAN (R 4.2.1)
#> stringi 1.7.8 2022-07-11 [2] CRAN (R 4.2.1)
#> stringr 1.4.1 2022-08-20 [2] CRAN (R 4.2.1)
#> styler 1.8.0 2022-10-22 [1] CRAN (R 4.2.1)
#> tibble 3.1.8 2022-07-22 [2] CRAN (R 4.2.1)
#> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.2.1)
#> tzdb 0.3.0 2022-03-28 [2] CRAN (R 4.2.1)
#> utf8 1.2.2 2021-07-24 [2] CRAN (R 4.2.1)
#> vctrs 0.5.1 2022-11-16 [2] CRAN (R 4.2.2)
#> vroom 1.6.0 2022-09-30 [2] CRAN (R 4.2.1)
#> withr 2.5.0 2022-03-03 [2] CRAN (R 4.2.1)
#> xfun 0.34 2022-10-18 [2] CRAN (R 4.2.1)
#> yaml 2.3.6 2022-10-18 [2] CRAN (R 4.2.1)
#>
#> [1] /home/nanni/R/x86_64-pc-linux-gnu-library/4.2
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
Somewhat more minimal reprex:
tab <- "
x,x
a,1
b,2
c,3"
readr::read_csv(tab, col_select = 2, name_repair = "minimal", col_types = list())
#> # A tibble: 3 × 1
#> x
#> <chr>
#> 1 a
#> 2 b
#> 3 c
readr::read_csv(tab, col_select = 2, col_types = list())
#> New names:
#> • `x` -> `x...1`
#> • `x` -> `x...2`
#> # A tibble: 3 × 1
#> x...2
#> <dbl>
#> 1 1
#> 2 2
#> 3 3
Created on 2023-07-31 with reprex v2.0.2
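A possible workaround until this is fixed, as a minimal sketch: read all columns with `name_repair = "minimal"` and pick the column by position afterwards, instead of passing `col_select`. This assumes that a full read is unaffected by the bug, which the column specification in the first reprex (`chr (1): x`, `dbl (1): x`) suggests but which has not been verified across readr versions. The names `all_cols` and `second` are only illustrative.

```r
library(readr)

# Same two-column table as in the reprex: both columns are named "x".
tab <- I("x,x\na,1\nb,2\nc,3")

# Read every column, keeping the duplicated names untouched.
# Assumption: without col_select, both columns are parsed correctly.
all_cols <- read_csv(tab, name_repair = "minimal", show_col_types = FALSE)

# Select the second column by position after reading, not via col_select.
second <- all_cols[[2]]
second
```

Positional `[[` is used here because selecting by name is ambiguous when both columns are called `x`.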