openxlsx icon indicating copy to clipboard operation
openxlsx copied to clipboard

Do not allow `NA` as column name

Open jmbarbone opened this issue 3 years ago • 5 comments

Resolves #292

@JanMarvin do you know how we can tell if the NA is converted to "NA" in the sheet data?

The change is in writeData so it should have an effect in buildWorkbook(). Just couldn't get my head around what writeData() and Workbook$writeData() were doing.

jmbarbone avatar Jan 11 '22 15:01 jmbarbone

Codecov Report

Merging #316 (19f97da) into master (f4bf841) will increase coverage by 0.00%. The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #316   +/-   ##
=======================================
  Coverage   67.70%   67.71%           
=======================================
  Files          34       34           
  Lines        8897     8898    +1     
=======================================
+ Hits         6024     6025    +1     
  Misses       2873     2873           
Impacted Files Coverage Δ
R/writeData.R 88.39% <100.00%> (+0.06%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 5e44c5e...19f97da. Read the comment docs.

codecov-commenter avatar Jan 11 '22 15:01 codecov-commenter

You just want to know, if the string is imported as "NA"? If written, is is converted into a sharedString (<sst><si /></sst> in openxml). Therefore you could check if wb$sharedString contains "<si><t xml:space=\"preserve\">NA</t></si>".

The position in the shared string is a 0 index, matching the value of v + 1 in wb$worksheets[[1]]$sheet_data$v where the type is 1 in wb$worksheets[[1]]$sheet_data$t.

I know that feeling in writeData and friends. The construction of the shared string and the lists are deeply embedded in openxlsx and make it near impossible to really implement things like inlineStr.

JanMarvin avatar Jan 11 '22 18:01 JanMarvin

You could look into make.names() to create unique names, it is used in the package at various places, otherwise you might end up with multiple "NA"s which brings other problems.

JanMarvin avatar Jan 11 '22 18:01 JanMarvin

You could look into make.names() to create unique names, it is used in the package at various places, otherwise you might end up with multiple "NA"s which brings other problems.

Considered that. Not sure if it's that's to best. There's no requirement for data in sheets to be unique, so the "NA" seemed like a less invasive fix. make.names() is used for the reading, which makes it difficult to test this (thought I could make the workbook and then read it to make sure it was okay). Technically you can still have a data.frame() in R without unique names, it's just a bad idea:

x <- data.frame(x = 1)
x <- cbind(x, x)
x
#>   x x
#> 1 1 1
str(x)
#> 'data.frame':    1 obs. of  2 variables:
#>  $ x: num 1
#>  $ x: num 1

Created on 2022-01-11 by the reprex package (v2.0.1)

I've seen plenty of workbooks with repeated column names. Maybe something a little nicer?

x <- c(NA, "a", "b", NA, "b", "a", NA)

make.names(x)
#> [1] "NA." "a"   "b"   "NA." "b"   "a"   "NA."
make.names(x, unique = TRUE)
#> [1] "NA."   "a"     "b"     "NA..1" "b.1"   "a.1"   "NA..2"

clean_na_names <- function(x) {
  w <- which(is.na(x))
  for (i in w) {
    x[w] <- paste0("NA..", w)
  }
  x
}

clean_na_names(x)
#> [1] "NA..1" "a"     "b"     "NA..4" "b"     "a"     "NA..7"

Created on 2022-01-11 by the reprex package (v2.0.1)

What I'm worried about is disharmony between reading and writing though. Maybe warnings/controls for reading a workbook as a data.frame with duplicated column names?

jmbarbone avatar Jan 11 '22 19:01 jmbarbone

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] avatar Jan 14 '23 02:01 github-actions[bot]