openxlsx icon indicating copy to clipboard operation
openxlsx copied to clipboard

Bug: `openxlsx::loadWorkbook()` corrupts loaded xlsx file

Open rkrug opened this issue 5 years ago • 16 comments

Expected Behavior

wb <- openxlsx::loadWorkbook("./tmp.xlsx")
openxlsx::openXL(wb)

should open the Excel file

Actual Behavior

wb is corrupted and Excel can only merely fix it with lot's of missing data. LibreOffice and Pages can't fix it.

openxlsx::saveWorkbook(wb, "./tmp_new.xlsx, overwrite = TRUE)

results in attached corrupt excel file tmp_new.xlsx

Steps to Reproduce the Problem

tmp.xlsx

wb <- openxlsx::loadWorkbook("./tmp.xlsx")
openxlsx::openXL(wb)

sessionInfo()

> devtools::session_info()
─ Session info ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.5.3 (2019-03-11)
 os       macOS High Sierra 10.13.6   
 system   x86_64, darwin17.7.0        
 ui       RStudio                     
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       Europe/Zurich               
 date     2019-04-09                  

─ Packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package     * version date       lib source                             
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.5.3)                     
 backports     1.1.3   2018-12-14 [1] CRAN (R 3.5.1)                     
 callr         3.2.0   2019-03-15 [1] CRAN (R 3.5.3)                     
 cli           1.1.0   2019-03-19 [1] CRAN (R 3.5.3)                     
 crayon        1.3.4   2017-09-16 [1] CRAN (R 3.5.1)                     
 desc          1.2.0   2018-05-01 [1] CRAN (R 3.5.1)                     
 devtools      2.0.1   2018-10-26 [1] CRAN (R 3.5.1)                     
 digest        0.6.18  2018-10-10 [1] CRAN (R 3.5.1)                     
 fs            1.2.6   2018-08-23 [1] CRAN (R 3.5.1)                     
 glue          1.3.1   2019-03-12 [1] CRAN (R 3.5.3)                     
 magrittr      1.5     2014-11-22 [1] CRAN (R 3.5.1)                     
 memoise       1.1.0   2017-04-21 [1] CRAN (R 3.5.1)                     
 openxlsx      4.1.1   2019-04-09 [1] Github (awalker89/openxlsx@ead0038)
 packrat       0.5.0   2018-11-14 [1] CRAN (R 3.5.2)                     
 pkgbuild      1.0.2   2018-10-16 [1] CRAN (R 3.5.1)                     
 pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.5.1)                     
 prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.5.1)                     
 processx      3.3.0   2019-03-10 [1] CRAN (R 3.5.2)                     
 ps            1.3.0   2018-12-21 [1] CRAN (R 3.5.2)                     
 R6            2.4.0   2019-02-14 [1] CRAN (R 3.5.2)                     
 Rcpp          1.0.1   2019-03-17 [1] CRAN (R 3.5.3)                     
 remotes       2.0.2   2018-10-30 [1] CRAN (R 3.5.1)                     
 rlang         0.3.2   2019-03-21 [1] CRAN (R 3.5.3)                     
 rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.5.1)                     
 rstudioapi    0.9.0   2019-01-09 [1] CRAN (R 3.5.2)                     
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.5.1)                     
 testthat      2.0.1   2018-10-13 [1] CRAN (R 3.5.1)                     
 usethis       1.4.0   2018-08-14 [1] CRAN (R 3.5.1)                     
 withr         2.1.2   2018-03-15 [1] CRAN (R 3.5.1)                     
 zip           2.0.1   2019-03-11 [1] CRAN (R 3.5.3)                     

[1] /usr/local/lib/R/3.5/site-library
[2] /usr/local/Cellar/r/3.5.3/lib/R/library
> 

rkrug avatar Apr 09 '19 12:04 rkrug

I believe that I am having the same or a similar problem. I am not able to export the workbook without getting Excel errors: "We found a problem with some content in 'filename.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes." When I click Yes: "Excel cannot open the file 'filename.xlsx' because the file format or the file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file."

My work-around is to export the file without a file extension, then add the file extension. Excel is then able to open the file with no problem.

Thank you!

SophiaLC avatar May 08 '19 18:05 SophiaLC

Hi Sophia,

thanks for the tip, but I tried that, and still the same error.

rkrug avatar May 09 '19 07:05 rkrug

Just made comment on https://github.com/awalker89/openxlsx/issues/348#issuecomment-492382870 but probably should have placed my comment here as the issue is more recent. Sorry for duplication.

I can report similar. When opening a workbook with simple text entries, adding data to it, and previewing, I get the "We found a problem..." dialogue. Click yes to repair and get: Removed Part: /xl/styles.xml part with XML error. (Styles) Undeclared prefix. Line 1, column 1541. Repaired Records: Cell information from /xl/worksheets/sheet1.xml part Used Range is A1:U14

dat <- data.frame(FldID = c(190022L, 190002L, 190011L, 190050L))
  rm(wb)
  xlF <- file.path(path$wd_docs,
                   paste0("TTEB_.xlsx"))
  wb <- openxlsx::loadWorkbook(xlF)
  
  openxlsx::writeData(wb, sheet = names(wb)[1], startRow = 10, startCol = 1, x = dat)
  openxlsx::openXL(wb)
- Session info ------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.5.3 (2019-03-11)
 os       Windows 10 x64              
 system   x86_64, mingw32             
 ui       RStudio                     
 language (EN)                        
 collate  English_United States.1252  
 ctype    English_United States.1252  
 tz       America/New_York            
 date     2019-05-14                  

- Packages ----------------------------------------------------------------------------------------
 package     * version    date       lib source                  
 assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.5.3)          
 base64enc     0.1-3      2015-07-28 [1] CRAN (R 3.5.0)          
 bit           1.1-14     2018-05-29 [1] CRAN (R 3.5.0)          
 bit64         0.9-7      2017-05-08 [1] CRAN (R 3.5.0)          
 cli           1.1.0      2019-03-19 [1] CRAN (R 3.5.3)          
 crayon        1.3.4      2017-09-16 [1] CRAN (R 3.5.1)          
 data.table    1.12.2     2019-04-07 [1] CRAN (R 3.5.3)          
 DBI           1.0.0      2018-05-02 [1] CRAN (R 3.5.1)          
 dbplyr        1.3.0      2019-01-09 [1] CRAN (R 3.5.3)          
 digest        0.6.18     2018-10-10 [1] CRAN (R 3.5.3)          
 dplyr         0.8.0.1    2019-02-15 [1] CRAN (R 3.5.2)          
 evaluate      0.13       2019-02-12 [1] CRAN (R 3.5.3)          
 fansi         0.4.0      2018-10-05 [1] CRAN (R 3.5.2)          
 flextable     0.5.2      2019-04-02 [1] CRAN (R 3.5.3)          
 forcats       0.4.0      2019-02-17 [1] CRAN (R 3.5.3)          
 gdtools       0.1.8      2019-04-02 [1] CRAN (R 3.5.3)          
 gfuns       * 0.0.0.9000 2019-03-12 [1] local (HJAllen/gfuns@NA)
 glue          1.3.1      2019-03-12 [1] CRAN (R 3.5.3)          
 hms           0.4.2      2018-03-10 [1] CRAN (R 3.5.1)          
 htmltools     0.3.6      2017-04-28 [1] CRAN (R 3.5.1)          
 knitr         1.22       2019-03-08 [1] CRAN (R 3.5.3)          
 LHfuns      * 0.0.0.9000 2018-12-18 [1] local (@0.0.0.9)        
 lubridate     1.7.4      2018-04-11 [1] CRAN (R 3.5.1)          
 magrittr    * 1.5        2014-11-22 [1] CRAN (R 3.5.1)          
 officer       0.3.3      2019-03-01 [1] CRAN (R 3.5.2)          
 openxlsx    * 4.1.0      2018-05-26 [1] CRAN (R 3.5.3)          
 pander        0.6.3      2018-11-06 [1] CRAN (R 3.5.1)          
 pillar        1.3.1      2018-12-15 [1] CRAN (R 3.5.2)          
 pkgconfig     2.0.2      2018-08-16 [1] CRAN (R 3.5.1)          
 purrr         0.3.2      2019-03-15 [1] CRAN (R 3.5.3)          
 R6            2.4.0      2019-02-14 [1] CRAN (R 3.5.3)          
 Rcpp          1.0.1      2019-03-17 [1] CRAN (R 3.5.3)          
 readr         1.3.1      2018-12-21 [1] CRAN (R 3.5.3)          
 rlang         0.3.4      2019-04-07 [1] CRAN (R 3.5.3)          
 RMariaDB      1.0.6      2018-05-06 [1] CRAN (R 3.5.1)          
 rmarkdown     1.12       2019-03-14 [1] CRAN (R 3.5.3)          
 rstudioapi    0.10       2019-03-19 [1] CRAN (R 3.5.3)          
 sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.5.3)          
 stringi       1.4.3      2019-03-12 [1] CRAN (R 3.5.3)          
 stringr       1.4.0      2019-02-10 [1] CRAN (R 3.5.3)          
 tibble        2.1.1      2019-03-16 [1] CRAN (R 3.5.3)          
 tidyr         0.8.3      2019-03-01 [1] CRAN (R 3.5.3)          
 tidyselect    0.2.5      2018-10-11 [1] CRAN (R 3.5.1)          
 utf8          1.1.4      2018-05-24 [1] CRAN (R 3.5.1)          
 uuid          0.1-2      2015-07-28 [1] CRAN (R 3.5.0)          
 withr         2.1.2      2018-03-15 [1] CRAN (R 3.5.1)          
 xfun          0.6        2019-04-02 [1] CRAN (R 3.5.3)          
 xml2          1.2.0      2018-01-24 [1] CRAN (R 3.5.1)          
 zip           2.0.1      2019-03-11 [1] CRAN (R 3.5.3)  

HJAllen avatar May 14 '19 19:05 HJAllen

OK - I had the error on two Macs, but did not have it in Linux.

Any help appreciated, what is going on - this affects a nearly finished package!

rkrug avatar May 15 '19 07:05 rkrug

Comparing the styles.xml between the original .xlsx and the openxlsx modified version, there are at least some differences in stylesheets. It appears, at least in my instance, stylesheet references are being dropped.

Original File

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" xmlns:x16r2="http://schemas.microsoft.com/office/spreadsheetml/2015/02/main" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" mc:Ignorable="x14ac x16r2 xr">
  <numFmts count="1">
    <numFmt numFmtId="164" formatCode="mm/dd/yy;@"/>
  </numFmts>

Openxlsx modified file

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" mc:Ignorable="x14ac">
  <numFmts count="1">
    <numFmt numFmtId="164" formatCode="mm/dd/yy;@"/>
  </numFmts>

See https://github.com/awalker89/openxlsx/issues/464#issuecomment-492719903 It appears the following tablestyle is added by openxlsx and is the possible offender

    <tableStyle name="MySqlDefault" pivot="0" table="0" count="2" xr9:uid="{E4D6D12A-12B2-45C8-91AA-1B61EA448CBE}">
      <tableStyleElement type="wholeTable" dxfId="1"/>
      <tableStyleElement type="headerRow" dxfId="0"/>
    </tableStyle>
  </tableStyles>

HJAllen avatar May 15 '19 15:05 HJAllen

when I remove

<tableStyles count="1" defaultTableStyle="TableStyleMedium9" defaultPivotStyle="PivotStyleLight16"/>
    <tableStyle name="MySqlDefault" pivot="0" table="0" count="2" xr9:uid="{8049963C-4A48-4192-A808-6AD7E4D6BD5E}">
      <tableStyleElement type="wholeTable" dxfId="1"/>
      <tableStyleElement type="headerRow" dxfId="0"/>
    </tableStyle>
  </tableStyles>

and replace with

<tableStyles count="1" defaultTableStyle="TableStyleMedium9" defaultPivotStyle="PivotStyleLight16"/>

zip it up and rename .xlsx, it opens no problem.

HJAllen avatar May 15 '19 15:05 HJAllen

Thanks for looking into that. The strange thing is, that it worked under Linux.

I can try out a fix as soon as you have one available on github.

rkrug avatar May 15 '19 15:05 rkrug

It appears openxlsx is not adding the offending tablestyle. I have MySQL for excel installed and am suspicious it is adding the style. Using repaired file as above:

  1. open in excel
  2. modify a cell
  3. save The tablestyle reappears with subsequent corruption. Solution for me: Don't save in excel again.

HJAllen avatar May 15 '19 16:05 HJAllen

The problem are validations (at least in the new version of Excel - 16.25). As soon as I enter one validation rule, it does not work anymore.

rkrug avatar May 16 '19 08:05 rkrug

The problem are validations (at least in the new version of Excel - 16.25). As soon as I enter one validation rule, it does not work anymore.

I have the same error with validations in windows Excel 365, this error only pops out after I upgraded to office 365.

Jannickz avatar Aug 05 '19 09:08 Jannickz

I'm having the same problem, also in workbooks with validation rules :(

franciscoyira avatar Jan 28 '20 23:01 franciscoyira

Same here , and because I didn't initiate the excel file, I used the link below to find and delete data validation rules, after that everythings worked fine

https://www.ablebits.com/office-addins-blog/2017/08/16/data-validation-excel/

sambafall avatar Feb 02 '21 07:02 sambafall

Hey all, a little late here, but I was experiencing the same problem until I removed special characters from my tab names.

Coledavis00 avatar Apr 30 '21 21:04 Coledavis00

Hey all, a little late here, but I was experiencing the same problem until I removed special characters from my tab names.

This worked for me too! Just renamed the sheet that had an "&" symbol in it and everything worked perfectly! Thanks

tomtom776 avatar May 06 '21 04:05 tomtom776

Found another thing that triggers this error. Text boxes are fine, but text boxes within a chart object are not.

melville1808 avatar May 20 '21 04:05 melville1808

https://github.com/awalker89/openxlsx/issues/464#issuecomment-492719903 pointed me in the right direction in my case. I had a custom table style save to my template workbook. Deleting the custom table style in the template fixed this for me.

crisrengifo avatar Jun 16 '21 11:06 crisrengifo