openxlsx
openxlsx copied to clipboard
Do not allow `NA` as column name
Resolves #292
@JanMarvin do you know how we can tell if the NA
is converted to "NA"
in the sheet data?
The change is in writeData
so it should have an effect in buildWorkbook()
. Just couldn't get my head around what writeData()
and Workbook$writeData()
were doing.
Codecov Report
Merging #316 (19f97da) into master (f4bf841) will increase coverage by
0.00%
. The diff coverage is100.00%
.
@@ Coverage Diff @@
## master #316 +/- ##
=======================================
Coverage 67.70% 67.71%
=======================================
Files 34 34
Lines 8897 8898 +1
=======================================
+ Hits 6024 6025 +1
Misses 2873 2873
Impacted Files | Coverage Δ | |
---|---|---|
R/writeData.R | 88.39% <100.00%> (+0.06%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 5e44c5e...19f97da. Read the comment docs.
You just want to know, if the string is imported as "NA"? If written, is is converted into a sharedString
(<sst><si /></sst>
in openxml). Therefore you could check if wb$sharedString
contains "<si><t xml:space=\"preserve\">NA</t></si>"
.
The position in the shared string is a 0 index, matching the value of v + 1
in wb$worksheets[[1]]$sheet_data$v
where the type is 1
in wb$worksheets[[1]]$sheet_data$t
.
I know that feeling in writeData
and friends. The construction of the shared string and the lists are deeply embedded in openxlsx
and make it near impossible to really implement things like inlineStr
.
You could look into make.names()
to create unique names, it is used in the package at various places, otherwise you might end up with multiple "NA"s which brings other problems.
You could look into
make.names()
to create unique names, it is used in the package at various places, otherwise you might end up with multiple "NA"s which brings other problems.
Considered that. Not sure if it's that's to best. There's no requirement for data in sheets to be unique, so the "NA"
seemed like a less invasive fix. make.names()
is used for the reading, which makes it difficult to test this (thought I could make the workbook and then read it to make sure it was okay). Technically you can still have a data.frame()
in R without unique names, it's just a bad idea:
x <- data.frame(x = 1)
x <- cbind(x, x)
x
#> x x
#> 1 1 1
str(x)
#> 'data.frame': 1 obs. of 2 variables:
#> $ x: num 1
#> $ x: num 1
Created on 2022-01-11 by the reprex package (v2.0.1)
I've seen plenty of workbooks with repeated column names. Maybe something a little nicer?
x <- c(NA, "a", "b", NA, "b", "a", NA)
make.names(x)
#> [1] "NA." "a" "b" "NA." "b" "a" "NA."
make.names(x, unique = TRUE)
#> [1] "NA." "a" "b" "NA..1" "b.1" "a.1" "NA..2"
clean_na_names <- function(x) {
w <- which(is.na(x))
for (i in w) {
x[w] <- paste0("NA..", w)
}
x
}
clean_na_names(x)
#> [1] "NA..1" "a" "b" "NA..4" "b" "a" "NA..7"
Created on 2022-01-11 by the reprex package (v2.0.1)
What I'm worried about is disharmony between reading and writing though. Maybe warnings/controls for reading a workbook as a data.frame
with duplicated column names?
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.