readxl icon indicating copy to clipboard operation
readxl copied to clipboard

A read_excel variant or argument to read each excel column into a data frame column

Open moodymudskipper opened this issue 4 years ago • 2 comments

A curse of messy excel files is information stored in formatting.

An excel cell doesn't just have a value, it also has (or can have) :

  • a formula
  • a fill color
  • a font
  • a font color
  • a font size

And more, like border formatting, conditional formatting specifications, validation list specifications... but I believe the variables above are the one that we'd most want structured.

I believe all this info could be stored in data frame columns. We could have by default each column be a 6 column data frame containing value + variables above (if several fonts, colors, or sizes in a single cell, that'd be NA). Then we would unpack as desired using tidyr or filter such as in the following :

# read file, specify to include cell metadata
xl_df <- read_excel(path, include_metadata = TRUE)
#filter on red cells of variable `some_var`
dplyr::filter(xl_df, some_var$color == "#FF0000")

Alternately these data frame columns wouldn't be provided if they are default, which would allow us to just unpack everything, and only have color columns for variables that do have some colored cells for instance.

Alternately again, include_metadata could be a vector of colnames to import as data frame columns. include_metadata = c("var3", "var22")

Alternately again again, include_metadata could be a list of vectors enumerating the metadata to include. include_metadata = list(var3 = "fill", var22 = c("color", "font"))

moodymudskipper avatar Oct 04 '19 11:10 moodymudskipper

Yeah, this is on my radar and I'm building it into googlesheets4 from the very start as googlesheets4::sheets_cells().

For now, for Excel, I suggest you get similar functionality from tidyxl:

https://cran.r-project.org/package=tidyxl

jennybc avatar Oct 04 '19 15:10 jennybc

Great, and I didn"t know this package, thanks!

moodymudskipper avatar Oct 04 '19 18:10 moodymudskipper