readxl
readxl copied to clipboard
If no col name, fallback to Excel's letter-based column names
When an excel file doesn't have Column Names, the assigned colnames of the df being the same as that of the excel file. That would also mean that if we're starting from E5 cell, then the returned dataframe's first column name should also be "e". This way, for instance, when multiple people work on the same excel from a different angle (1 directly and 1 on R), they can easily refer to columns.
Yo, Gianni... remove column F
What happens at the moment
readxl::read_excel(
"SpreadSheet.xlsx",
range = cell_limits(c(2, 5), c(NA, NA))
) %>%
select(-X__2)
x__1 | x__3 |
---|---|
2 | 2 |
3 | 3 |
What I suggest:
readxl::read_excel(
"SpreadSheet.xlsx",
col_names = FALSE,
use_excel_col_names = TRUE,
range = cell_limits(c(2, 5), c(NA, NA))
) %>%
select(-f)
or
readxl::read_excel(
"SpreadSheet.xlsx",
col_names = "excel",
range = cell_limits(c(2, 5), c(NA, NA))
) %>%
select(-f)
e | g |
---|---|
2 | 2 |
3 | 3 |
Much easier to understand which columns you're looking at and compare with the excel file.
A lightweight solution for this in "user space" is to pass a custom function to the new .name_repair
argument in the dev version of readxl.
devtools::load_all(".")
#> Loading readxl
## readxl has a test sheet with missing column names
## this is an unexported helper that makes test sheets available
read_excel(test_sheet("unnamed-duplicated-columns.xlsx"))
#> New names:
#> * `` -> `..1`
#> * var2 -> var2..2
#> * `` -> `..3`
#> * var2 -> var2..4
#> # A tibble: 2 x 4
#> ..1 var2..2 ..3 var2..4
#> <dbl> <chr> <dbl> <chr>
#> 1 1 a 1.1 aa
#> 2 2 b 2.1 bb
LETTER_names <- function(nms) LETTERS[seq_along(nms)]
read_excel(
test_sheet("unnamed-duplicated-columns.xlsx"),
.name_repair = LETTER_names
)
#> # A tibble: 2 x 4
#> A B C D
#> <dbl> <chr> <dbl> <chr>
#> 1 1 a 1.1 aa
#> 2 2 b 2.1 bb
## same, but via anonymous function
read_excel(
test_sheet("unnamed-duplicated-columns.xlsx"),
.name_repair = ~ LETTERS[seq_along(.x)]
)
#> # A tibble: 2 x 4
#> A B C D
#> <dbl> <chr> <dbl> <chr>
#> 1 1 a 1.1 aa
#> 2 2 b 2.1 bb
Created on 2018-12-13 by the reprex package (v0.2.1.9000)
The bummer is that, in user space, you don't know where the data was found in the sheet, i.e. if empty columns were skipped. I'll have to ponder what would be necessary to do this properly and whether it feels worth it.