openxlsx icon indicating copy to clipboard operation
openxlsx copied to clipboard

Error in writeData for large objects - Microsoft Excel file corrupted

Open AmyMikhail opened this issue 8 years ago • 4 comments

Expected Behavior

  • Pre-formatted Microsoft Excel workbook template copied and loaded with loadWorkbook
  • writeData used to save a data.table to a specific sheet within the workbook
  • saveWorkbook used to save the workbook
  • Small amounts of of data (e.g. ~ 1000 rows of 38 columns) write to the template without problems and data is visible when the workbook is opened

Actual Behavior

  • Large data (e.g. ~20000 rows of 38 columns) write to the template without any error messages but when the Microsoft Excel workbook is opened, it gives the following error:

' Excel found unreadable content in 'My workbook.xlsx'. Do you want to recover the content? Yes / No'

Clicking 'Yes' results in all the data being removed (the workbook is blank) and the following log file is generated:

-<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"><logFileName>error012360_01.xml</logFileName>

Errors were detected in file 'C:\Test_folder\My workbook.xlsx'-<removedParts summary="Following is a list of removed parts:"><removedPart>Removed Part: /xl/sharedStrings.xml part with XML error. (Strings) Illegal xml character. Line 1, column 2022603.</removedPart></removedParts>-<removedRecords summary="Following is a list of removed records:"><removedRecord>Removed Records: Cell information from /xl/worksheets/sheet1.xml part</removedRecord><removedRecord>Removed Records: Cell information from /xl/worksheets/sheet2.xml part</removedRecord><removedRecord>Removed Records: Cell information from /xl/worksheets/sheet3.xml part</removedRecord><removedRecord>Removed Records: Cell information from /xl/worksheets/sheet4.xml part</removedRecord></removedRecords></recoveryLog>

Note: a smaller subset of the data (first 1000 rows for each of four data.tables) copied to the workbook and I was able to open the workbook and view the data without problems. It seems therefore that this issue is due to the large size of the objects being copied that exceeds the Microsoft Excel workbook limit. If the same operation is attempted with a java-based r package like xlsx or excel.link, a memory limit error is generated in R and the write/save operation fails. write.csv works irrespective of file size.

Steps to Reproduce the Problem

(please attach an example xlsx file if possible)

  1. Create a data.table with size approaching the MS Excel file size limit (~ 20,000 rows)

  2. loadWorkbook() # to load pre-formatted template

  3. writeData() # to write large data.table to a sheet within the workbook

  4. saveWorkbook() # to save the workbook with added data.

  5. Open the Microsoft Excel file; if corrupted the error message above will appear.

sessionInfo()

  • Version of openxlsx: openxlsx_4.0.17
  • Version of R: 3.4.0

AmyMikhail avatar Jul 21 '17 15:07 AmyMikhail

Although the above Microsoft Excel error log mentions 'an illegal XML character' the data.table is comprised of alpha numeric data with spaces only (a cleaning step removed all other non-alphanumeric characters).

AmyMikhail avatar Jul 21 '17 15:07 AmyMikhail

I have same issue when trying to export a data frame containing only text (character) columns.

I currently do have some non-alphanumeric characters, though I am able to export the same data frame using package "writexl", function write_xlsx(). However this package only seems to support single-sheet exporting which is not ideal.

jmccloskey108 avatar Oct 24 '17 20:10 jmccloskey108

I get a similar issue when trying to write to 7 pre-formatted (using loadWorkbook)sheets. My data is 15 columns by 1,000 rows. It takes ~20 seconds to saveWorkbook. Then when opening the xlsx file i get the "We found a problem with some content..."

Would love a solution!

dbaellow-rwa avatar Feb 26 '19 22:02 dbaellow-rwa

I just solved this issue for my case, but haven't spent the time to make it reproducible, etc. My intuition is that the package doesn't like writing over pre-formatted sheets. So I just created "source" sheets with no format, hid them, and then linked the pre-formatted sheets to the source sheets in excel.

So in short, I'm loading a workbook in R, writing to unformatted, hidden sheets, and the pre-formatted unhidden sheets are linked (e.g. "=SourceSheet!A1") to the hidden ones. That fixed it for me!

drehow avatar Jun 24 '19 15:06 drehow