openxlsx icon indicating copy to clipboard operation
openxlsx copied to clipboard

Reading in file with formula shows values as 0 (unless the original file is opened in Excel and saved)

Open deschen1 opened this issue 4 years ago • 8 comments

I have an Excel file thta contains formulas. This file comes from a website. If I download the file and try to read in into R, it leads to a file showing the numeric values as "0". When I open the file with Excel and save it (without making any changes), and then run my code in R again, the file is read in correctly.

Note, the issue also appears when using readxl::read_xlsx so it's not necessarily an openxlsx bug. However, might be that both packages use similar reading functionalities and thus suffer from the same bug?

Excel file attached: New folder.zip

Code that doesn't work at first:

test <- openxlsx::readWorkbook(file.choose())

Reading in the unzipped file gives:

                                               Example    X2
1                                      Profit and Loss  <NA>
2                                    January 1-6, 2021  <NA>
3                                                 <NA> Total
4                                               Income  <NA>
5                                       4000 - Revenue  <NA>
6                        4100 - Revenue - Tree Service   0.0
7                                 4200 - Revenue - PHC   0.0
8                                 Total 4000 - Revenue   0.0
9                                         Total Income   0.0
10                                        Gross Profit   0.0
11                                            Expenses  <NA>
12                                      Total Expenses  <NA>
13                                Net Operating Income   0.0
14                                          Net Income   0.0
15 Friday, Sep 24, 2021 11:40:09 PM GMT-7 - Cash Basis  <NA>

Opening the file in Excel, saving it and running above code again gives:

                                               Example    X2
1                                      Profit and Loss  <NA>
2                                    January 1-6, 2021  <NA>
3                                                 <NA> Total
4                                               Income  <NA>
5                                       4000 - Revenue  <NA>
6                        4100 - Revenue - Tree Service  6000
7                                 4200 - Revenue - PHC   480
8                                 Total 4000 - Revenue  6480
9                                         Total Income  6480
10                                        Gross Profit  6480
11                                            Expenses  <NA>
12                                      Total Expenses  <NA>
13                                Net Operating Income  6480
14                                          Net Income  6480
15 Friday, Sep 24, 2021 11:40:09 PM GMT-7 - Cash Basis  <NA>

deschen1 avatar Sep 25 '21 11:09 deschen1

Hi @deschen1 , please see my answer here https://github.com/ycphs/openxlsx/issues/188#issuecomment-832591241. The issue is that people normally expect Excel formulas in sheets to be evaluated. This is not true though, most of the readers only read the XML sheets. Otherwise they'd have to build a library that mimics the entire Excel functions. From a programming point not something impossible, but presumably nothing anyone will tackle ever.

openxlsx reads something like the following (typing on smartphone 🙈 )

<c ...>
   <f>SUM(A1:BE)</f>
   <v>0</v>
</c>

We see a formula and value tag. Though nothing guarantees that the <v> entry is still valid. This was written by some other library, but there is no way for us to know if this is still the correct value. The only way would be to mimic the Excel functions and re-evaluate it. If you like Excel formulas just as much as every other guy, you know that's close to being impossible.

JanMarvin avatar Sep 25 '21 11:09 JanMarvin

I see, so unless someone enjoys rebuilding Excel in R, this will remain a "bug" inherent to all reading packages. Not being a developer, would it theoretically be possible to implement the workaround you suggested in this other thread? I.e. adding an option to the read-in functions of openxlsx that either asks the user if they want the file opened/saved and then to be read in. Or that it does this step automatically as soon as a formula is detected in the sheet (so if any cell starts with an =, then the underlying Excel file is opened/saved (temporarily) and then read in?

This issues post is based on a discussion on Stackoverflow, and there was a workaround where someone used the reticualte package and some python code to open/save the Excel files, but maybe this is also possible with plain R?

Such a workaround would make sure the formula values are correct whenever the file is read-in to R.

deschen1 avatar Sep 27 '21 12:09 deschen1

I assume it will, though I have not tired it yet. In https://github.com/ycphs/openxlsx/issues/227#issuecomment-880735673 @vchouraki proposed a solution based on LibreOffice to pre-load and save the xlsx file. I haven't personally tested this approach, but I assume it works. This is based on a headless LibreOffice, therefore it depends whether or not LibreOffice can evaluate the formula (I assume they are feature complete for the basic stuff, but I'm not aware how or if they handle newer formulas such as MS 365).

Since Excel most likely doesn't have a headless mode, you'd have to work around this with powershell (on Windows) or swift (on Mac) if you want to achieve the same automation without python (if that's what bothers you).

However, I suspect there are at least three reasons why this was not enabled by default.

  • It depends on third party software Excel or LibreOffice
  • It slows things down (a bad reason dealing in a matter of correctness, but people would most likely be mad if they have to wait the time it took Excel/LibreOffice to load plus the time it takes the Sheet to be evaluated).
  • It will only guard you evaluating local sheets of the Workbook. Though Excel allows to link other sheets into your Sheets, which is why I'm kinda hesitant to call any sheet with formulas final.

Though I consider it bad practice for R packages (us included) to not warn the user about possible unevaluated formulas in the sheets.

JanMarvin avatar Sep 27 '21 14:09 JanMarvin

Sorry for the late reply. I just encountered that the openxlsx package has this `openXL´ function, which automatically opens an Excel file in the system's default tool (MS Excel here). So wouldn't it be possible from here to also hav a function that automatically saves/closes the file? I mean, it's probably not that trivial, because opening is easy, but how would you be able to access the save/close parts of Excel from within R?

I can't say much about the other things you mentioned and what could be the side-effects, but I agree that a warning would be super helpful/important if there are any formulas in the sheet I'm reading in. And regarding slowing down, it doesn't have to be a default feature, but one that the sure might have to actively turn on and then you can also simply say ("might slow down things") like many other packages do.

deschen1 avatar Oct 05 '21 07:10 deschen1

Unfortunately I'm unaware how one can script Excel with R only. I'm not saying it's impossible, but I simply do not know how. With other script languages it seems possible (e.g. powershell).

Having it as an option can hurt, I agree. Maybe I'll add the LibreOffice approach as an option.

JanMarvin avatar Oct 05 '21 17:10 JanMarvin

This issue is stale because it has been open 365 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Oct 08 '22 03:10 github-actions[bot]

+1 from me on this issue. This has prevented us from automating any testing around any openxlsx functions that write formulas. We’re attempting to use LibreOffice via CircleCI now to open and save, but unsure how well that will work at scale for us.

jacksonsj avatar Oct 08 '22 05:10 jacksonsj

Like I've said: Evaluation of Excel formulas from R is not entirely possible and comes at very high costs. But if it's important to you, fork the repo and pay a developer or better a small team to write a Excel-formula-to-R conversion function. Glue everything into a package and a couple of R packages will be happy to include it.

JanMarvin avatar Oct 08 '22 06:10 JanMarvin

This issue is stale because it has been open 365 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Oct 21 '23 01:10 github-actions[bot]

This issue was closed because it has been stalled for 7 days with no activity.

github-actions[bot] avatar Nov 04 '23 01:11 github-actions[bot]