xlconnect icon indicating copy to clipboard operation
xlconnect copied to clipboard

writeNamedRegion does not overwrite existing formula cells

Open martinstuder opened this issue 2 years ago • 3 comments

writeNamedRegion (and probably similarly writeWorksheet) won't overwrite existing formula cells (cells with cell type FORMULA). This seems to be a change in behavior after XLConnect 0.2-15 (possibly in underlying POI).

Consider the following reproducible example:

require(XLConnect)

# Create test data
wb <- loadWorkbook("repro.xlsx", create = TRUE)
createSheet(wb, "repro")
createName(wb, name = "repro", formula = "repro!$A$1")
data <- data.frame(A = 1, B = 0)
writeNamedRegion(wb, data = data, name = "repro")
setCellFormula(wb, sheet = "repro", row = 2, col = 2, formula = "A2 + 100")
saveWorkbook(wb)

# Quick check
wb <- loadWorkbook("repro.xlsx")
data <- readNamedRegion(wb, name = "repro")
stopifnot(identical(data, data.frame(A = 1, B = 101)))


# Reproducing the problem
newData <- data.frame(A = 888, B = 666)
wb <- loadWorkbook("repro.xlsx")
existingData <- readNamedRegion(wb, name = "repro")
writeNamedRegion(wb, data = newData, name = "repro")
saveWorkbook(wb, file = "new.xlsx")

wb <- loadWorkbook("new.xlsx")
writtenData <- readNamedRegion(wb, name = "repro")
stopifnot(identical(writtenData, data.frame(A = 888, B = 666)))

The above example (final check) passes with XLConnect 0.2-15 but it does not pass with XLConnect >= 1.0.0.

On the POI side, it is documented (at least since POI 3.17) that Cell.setCellValue "will set the precalculated value" for formula cells (see https://poi.apache.org/apidocs/3.17/org/apache/poi/ss/usermodel/Cell.html#setCellValue(double) or https://poi.apache.org/apidocs/dev/org/apache/poi/ss/usermodel/Cell.html#setCellValue-double- for current POI).

"precalculated" value seems to refer to cached values. And this seems indeed to be the case: if you use writtenData <- readNamedRegion(wb, name = "repro", useCachedValues = TRUE) the above passes with XLConnect >= 1.0.0. Alternatively, introducing a clearNamedRegion(wb, name = "repro") before overwriting the data makes the above pass since clearNamedRegion effectively removes the cells.

I see several possibilities how we could address this:

  • always clear cells before using setCellValue (Cell.setBlank should probably do the job) - this would forcibly overwrite formula cells always
  • introduce a new argument overwriteFormulaCells = TRUE to writeNamedRegion and writeWorksheet to control behavior as to whether formula cells should be cleared (setBlank) or not
  • introduce a new argument clearCells = TRUE to writeNamedRegion and writeWorksheet to control behavior as to whether cells should be cleared in general before writing - this is quite similar to the above with the difference that a clear would always be done if set to TRUE (note, however, that this is not equivalent to issuing a clearNamedRegion since clearNamedRegion would always clear the full named region while writeNamedRegion(..., clearCells=TRUE) would only clear the portion of the named region that is effectively being overwritten)

Option 2 seems to more directly express the purpose of the operation while option 3 seems to be closer to the existing clearNamedRegion. I'm also open for other options.

martinstuder avatar Aug 18 '22 20:08 martinstuder

Since this behavior has been in place for a while, I'm hesitant to make the default TRUE. I'm more in favor of option 2 as it is clearer why it would be needed - according to the doc, other types of cells would be set to a new value anyway, correct ?

spoltier avatar Aug 22 '22 06:08 spoltier

@spoltier

other types of cells would be set to a new value anyway, correct ?

Yes that is my understanding. The question is what users expect in terms of default behavior. I think my default expectation is that the data I provide overwrites any cells independent of their type. I think it is useful to be able to keep formula cells though. That's why I would personally opt for option 2 with default TRUE. Interested to hear what others think. @Chargothrond @RolandASc

martinstuder avatar Aug 22 '22 18:08 martinstuder

My default expectation would also be that writeNamedRegion / writeWorkSheet would overwrite formula cells, especially since that used to be the behavior.

Option 2 makes most sense to me as well, with the option to set overwriteFormulaCells to FALSE (i.e. default TRUE).

Chargothrond avatar Aug 23 '22 09:08 Chargothrond