openxlsx icon indicating copy to clipboard operation
openxlsx copied to clipboard

write formula fails when formula.tools package is loaded

Open klin333 opened this issue 3 years ago • 8 comments

Describe the bug When the formula.tools package is loaded in namespace, openxlsx writing of excel sheets that include excel formula fails. Excel complains of "we found a problem with some contents of ..."

To Reproduce

library(openxlsx)

make_wb <- function() {
  df <- data.frame(
    x = 1, y = 2,
    z = "A2 + B2",
    stringsAsFactors = FALSE
  )
  
  class(df$z) <- c(class(df$z), "formula")
  
  wb <- createWorkbook()
  addWorksheet(wb, "Sheet1")
  writeData(wb, sheet = "Sheet1", x = df)
  
  wb
}

wb <- make_wb()
openxlsx::saveWorkbook(wb, file = 'works.xlsx', overwrite = TRUE)
print(sessionInfo())
# R version 3.6.3 (2020-02-29)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 10 x64 (build 19044)
# 
# Matrix products: default
# 
# locale:
#   [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                      
# [5] LC_TIME=English_Australia.1252    
# 
# attached base packages:
#   [1] graphics  grDevices utils     methods   base     
# 
# other attached packages:
#   [1] openxlsx_4.2.5
# 
# loaded via a namespace (and not attached):
#   [1] BiocManager_1.30.10 compiler_3.6.3      tools_3.6.3         Rcpp_1.0.4.6        stats_3.6.3         stringi_1.4.6       zip_2.1.1          
# [8] packrat_0.5.0       renv_0.13.2   

loadNamespace('formula.tools')
wb <- make_wb()
openxlsx::saveWorkbook(wb, file = 'error.xlsx', overwrite = TRUE)
print(sessionInfo())
# R version 3.6.3 (2020-02-29)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 10 x64 (build 19044)
# 
# Matrix products: default
# 
# locale:
#   [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                      
# [5] LC_TIME=English_Australia.1252    
# 
# attached base packages:
#   [1] graphics  grDevices utils     methods   base     
# 
# other attached packages:
#   [1] openxlsx_4.2.5
# 
# loaded via a namespace (and not attached):
#   [1] BiocManager_1.30.10  compiler_3.6.3       tools_3.6.3          Rcpp_1.0.4.6         stats_3.6.3          stringi_1.4.6       
# [7] zip_2.1.1            packrat_0.5.0        operator.tools_1.6.3 renv_0.13.2          formula.tools_1.7.1 

works.xlsx error.xlsx

Likely cause

formula.tools has a as.character.formula function which changes the behaviour of as.character(formula_variable). It's caused me bugs in other places so maybe it's more a problem of just bad formla.tools package. To improve openxlsx to be more robust, we could explicitly use as.character.default in brittle places. However, if authors don't see the value in debugging this, might be good to notify users somehow of the existance of this problem?

klin333 avatar Jan 11 '22 04:01 klin333

I'd say it is not really worth digging into this. Would you want to create a pull request? Maybe print an error if the package namespace is loaded or try to make it bullet proof that whatever we expect to be a formula behaves as such? Guess we someplace use as.character() on a formula.

# with formula.tools
> paste(df$z)
[1] "A2 + B2"
# what we currently write (and libreoffice sees as formula in the sheet)
> as.character(df$z)
[1] "structure(\"A2 + B2\", class = \"formula\")"

JanMarvin avatar Jan 11 '22 11:01 JanMarvin

I think we could try to have a openxlsx::as.character.formula() function which just returns the value. Namespace conflicts would then be obvious when loading but internal functions, I believe, would still work fine.

Might actually be more of a problem with this package. Would probably be better to use more explicit class names like "workbookFormula" than just "formula".

jmbarbone avatar Jan 11 '22 14:01 jmbarbone

Like I've said, I don't really see why we should deal with the shenanigans of some other package modifying established base classes, but if there's some simple way to handle this, I'm not vetoing changes and it looks like you're already on it 😃

JanMarvin avatar Jan 11 '22 14:01 JanMarvin

actually i would really recommend against adding as.character.formula() function to this package. because as JanMarvin said, it's not good to modify established base classes. formula is a very common base class. adding a as.character.formula function would cause many hard-to-find bugs in user land.

The fact that formula.tools package had this as.character.formula function caused me so many bugs, this issue included, that i stripped out dependence on formula.tools altogether. i would really not like having this same problem with openxlsx.

i'm happy for no action to be taken. ideally openxlsx shouldn't use 'formula' class as it clashes with the established formula class in base R. if we can't be bothered changing openxlsx to use let's say workbookFormula as jmbarbone suggested, the next best thing is a warning somewhere, else no action taken. Anything is better than adding in the trouble making as.character.formula

klin333 avatar Jan 11 '22 15:01 klin333

openxlsx::as.character.formula(x) simply returns ~x~ as.character.default(x). There's no other modification done that wouldn't also be done without the method, and as.character.formula(), as far as I can tell, doesn't exist anywhere else. By exporting it will make conflicts explicit and easier to debug.

unloadNamespace("openxlsx")
form <- y ~ x1 * x2

identical(
  as.character(form),
  openxlsx:::as.character.formula(form)
)
#> [1] TRUE

Created on 2022-01-11 by the reprex package (v2.0.1)

I agree that this isn't the best solution. Better would be to replace every check/use of "formula" with a better class name.

jmbarbone avatar Jan 11 '22 15:01 jmbarbone

Can we make it simply internal? Otherwise I fear that it's a lot of noise just for one of the many thousands of packages and even more, one I haven't heard of before today? @jmbarbone Doesn't this depend on the order of package loading? Appreciate your efforts as always! Edit: Maybe someone with more experience with the package and its implications (looking at you @klin333) should talk to the upstream package, maybe they are unaware of the implications it brings for other packages. Having written an estimator myself, I don't think I've guarded myself against such implications.

JanMarvin avatar Jan 11 '22 17:01 JanMarvin

@JanMarvin if it's internal it won't have the warn conflicts, I believe.

formula class are language and dealt with in as.character() in internal methods. I'm not sure #315 is correct with the comparisons.

The more I look I this the more I think the PR #315 should be abandoned and we should see if using workbookFormula is feasible (#317).

jmbarbone avatar Jan 11 '22 20:01 jmbarbone

@JanMarvin, I don't think the blame can be placed on formula.tools. It intentionally tries to alter the behaviour of the generic as.character() when applied to the usual base R 'formula' class (things like y ~ x1 + x2). This is probably a deficiency of the clunky object oriented model in R. Still, i didn't like it, so i simply stopped using formula.tools as there's just no way the package will change it, it's a crucial but bad 'feature' of that package.

So the problem with openxlsx is it uses the 'formula' class, which clashes with an extremely popular base R class. If formula.tools wasn't causing a generic clash, some other package out there could. It's the same as why you should never create a package that unneccesarily override base classes such as Date or Datetime.

Support quote from https://stat.ethz.ch/R-manual/R-devel/library/methods/html/className.html "Overriding a class definition in another package with the same name deliberately is usually a bad idea. Although R attempts to keep and use the two definitions (as of version 2.14.0), ambiguities are always possible. It is more sensible to define a new class that extends an existing class but has a different name."

This case is worse because openxlsx's 'formula' has nothing to do with the base R formula class.

klin333 avatar Jan 11 '22 21:01 klin333

This issue is stale because it has been open 365 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Jan 14 '23 02:01 github-actions[bot]

This issue was closed because it has been stalled for 7 days with no activity.

github-actions[bot] avatar Jan 21 '23 02:01 github-actions[bot]