openxlsx icon indicating copy to clipboard operation
openxlsx copied to clipboard

Corrupt file

Open ARParis opened this issue 10 years ago • 36 comments

Thanks for the package.

On my system, I can read a file, write it. But it is corrupted for excel. Even a simple file such as: write.xlsx(iris, file = "writeXLSX1.xlsx", colNames = TRUE, borders = "rows")

I am sending by email the corresponding saved file

ARParis avatar Aug 19 '14 13:08 ARParis

Are you using Excel <= 2003? See #8

lbraglia avatar Aug 19 '14 22:08 lbraglia

R version 3.1.1 Excel 2010 openxlsx version 2.0.15

ARParis avatar Aug 20 '14 07:08 ARParis

Some zip applications are skipping the "Content_type.xml" file. Can you either send me a corrupt xlsx file at [email protected] or jsut unzip the corrupt xlsx file and tell me if the [Content_type].xml is in there (see pic)

image

awalker89 avatar Aug 25 '14 07:08 awalker89

I've had similar problems with overwriting data in an existing workbook. I'm at work and can't share code freely at the moment but something like the code below results in a corrupted file. I'm not sure how much is due to an existing bug or how much is due to an R version mismatch/package update (or all of the above). I have a difficult time having our IT crew keeping R up to date on work servers -- this problem was encountered with R 3.1.3 and openxlsx 2.0.15 (by the way, how many updates to openxlsx have there already been? I feel like I just had this installed!)


data_frame_with_stuff = data.frame(x = 1:10, y = state.name[1:10])

## this worked fine ##

# workbook with some BASIC existing formatting, e.g. blue background
out_wb = loadWorkbook(path_to_wb) 
addDataTable(out_wb, sheet = "Sheet1", x = data_frame_with_stuff, tableStyle = "none")
saveWorkbook(out_wb, new_wb_filename)

## then this fails ##
in_wb = loadWorkbook(new_wb_filename)
addDataTable(in_wb , sheet = "Sheet1", x = data_frame_with_stuff, tableStyle = "none")
saveWorkbook(in_wb , final_filename)


tgwhite avatar Apr 27 '15 20:04 tgwhite

Hi,

Thanks for you email. Can you please update to the most recent version of openxlsx from github with

devtools::install_github("awalker89/openxlsx")

If you are still having issues I'll be happy to help.

Regards, Alex

On 28 April 2015 at 06:28, Taylor White [email protected] wrote:

I've had similar problems with overwriting data in an existing workbook. I'm at work and can't share code freely at the moment but something like the code below results in a corrupted file. I'm not sure how much is due to an existing bug or how much is due to an R version mismatch/package update (or all of the above). I have a difficult time having our IT crew keeping R up to date on work servers -- this problem was encountered with R 3.1.3 and openxlsx 2.0.15 (by the way, how many updates to openxlsx have there already been? I feel like I just had this installed!)

data_frame_with_stuff = data.frame(x = 1:10, y = state.name[1:10])

this worked fine

workbook with some BASIC existing formatting, e.g. blue backgroundout_wb = loadWorkbook(path_to_wb)

addDataTable(out_wb, sheet = "Sheet1", x = data_frame_with_stuff, tableStyle = "none") saveWorkbook(out_wb, new_wb_filename)

then this fails ##in_wb = loadWorkbook(new_wb_filename)

addDataTable(in_wb , sheet = "Sheet1", x = data_frame_with_stuff, tableStyle = "none") saveWorkbook(in_wb , final_filename)

— Reply to this email directly or view it on GitHub https://github.com/awalker89/openxlsx/issues/22#issuecomment-96808228.

awalker89 avatar Apr 28 '15 01:04 awalker89

Hi Alex, I'm having the same issue. Installing the most recent version of openxlsx did not do it.

I am doing something very similar to @tgwhite.

# Load, add data and save worksheet
wb <- loadWorkbook( file =  xlsx_path )
addWorksheet( wb, sheetName = "a_name" )
writeData( wb, sheet = "a_name", x = a_data_frame, withFilter = TRUE, headerStyle = someStyle ) 
saveWorkbook( wb, file =  xlsx_path, overwrite = TRUE )

It exhibits this behavior on some excel workbooks but not others. When it does work -- aka the resulting .xlsx file is not corrupt -- it still warns that:

Warning messages:
1: In rm(dXML) : object 'dXML' not found
2: Overwriting existing cell data. 

Any workaround for this?

Best, V.

vathymut avatar Jul 09 '15 18:07 vathymut

Yeah that's a bug. I've updated the dev version to remove that line. Please update with

devtools::install_github("awalker89/openxlsx")

awalker89 avatar Jul 09 '15 23:07 awalker89

hey, i'm still having the same problem with generating corrupt output file. any news on that front? just as @tgwhite noted, it happens when modifying existing worksheet with some formatting. thanks!

merkliopas avatar Aug 25 '15 08:08 merkliopas

If possible can you please email the xlsx file? The bugs associated with loadWorkbook tend to be due to features openxlsx does not yet support (or only has basic support for). If you can email a file which is corrupted on load I can fix this.

awalker89 avatar Aug 25 '15 08:08 awalker89

thanks! i've emailed you those. let us know when you dissect what was causing the troubles

merkliopas avatar Aug 25 '15 09:08 merkliopas

Thanks Sarunas. This particular bug should be fixed now.

awalker89 avatar Aug 25 '15 10:08 awalker89

it works! great job, thanks for your effort.

merkliopas avatar Aug 25 '15 10:08 merkliopas

Wondering if you got any insight into the corrupt file issue flagged last year by ARParis. When using openxlsx to write files, the [Content_Types].xml file is not included thus Excel sees corrupted file.

I am using the dev version openxlsx 3.0.27, R version 3.2.2 on Windows 7 ($system: "x86_64, mingw32") and Excel 2010.

I have unloaded xlsx package which evidently can cause clash with openxlsx, restarted R, etc.

Thanks, example file .xlsx file attached that shows the problem.

My codeveloper can get openxlsx to behave, when I copy the [Content_Types].xml file from her output .xlsx and combine it with contents of my corrupted file, Excel is happy.

Test_of_TCP_Event_Follow-up_Template_November_19_2015_summaryresults_2015-12-12.xlsx

PS I have confirmed that Rtools\bin and Rtools\gcc-4.6.3 are in the PATH variable, using version Rtools33 for version 3.2.x and later.

klittle314 avatar Dec 13 '15 23:12 klittle314

Never solved this one - if you find anything please let me know.

awalker89 avatar Dec 17 '15 08:12 awalker89

OK, working with Office can be painful.

klittle314 avatar Dec 18 '15 19:12 klittle314

Resolution does not appear to be something internal to openxlsx package. Perhaps the best you can do is flag in documentation? As you indicate above, the problem appears to involve the zip.exe file that is supposed to run as part of RTools but doesn't.

To assure that openxlsx knows where to find the zip executable in Rtools we may have to force the association through a system function call: (1) verify that Sys.getenv("R_ZIPCMD","zip") returns "zip" (2) Then set explicitly the path to the zip.exe file in Rtools, e.g. Sys.setenv(R_ZIPCMD= "C:/Program Files/R/Rtools/bin/zip")

reference: http://stackoverflow.com/questions/27952451/error-zipping-up-workbook

klittle314 avatar Dec 21 '15 21:12 klittle314

Hello everyone, I think I might have found a solution to the original problem (missing Content Types XML in written XLSX). I had the same issue and it may be due to an old version of ZIP!

After learning here that a ZIP application is crucial I went to see whether I have one on my system at all. Turns out, my zip.exe was an antique version 2.3 from 1999 from, surprisingly, a quite recent installation of UnxUtils. (The equivalent of "which zip" in Windows is "where zip", by the way)

I replaced that one with the zip.exe from Rtools (3.0 from 2008) and, bang, no problem at all! Because I had not set the PATH to my Rtools, the old zip was run by openxlsx.

For anyone who hasn't wrestled with pathes and Unix utilities in the past, I assume this issue would be completely unsolvable. Maybe contacting the folks who make UnxUtils, Cygwin and the like would be useful and notify them that their old versions are causing problems with other software.

tempfile avatar Jul 28 '16 08:07 tempfile

If this issue is related to the version of zip.exe, is there a way to identify the version before using it? That way, a warning message could be generated if the version is too old.

debarros avatar Jul 28 '16 17:07 debarros

Hey,

I'm just encountering a similar corruption problem. Just opening an excel file with loadWorkbook and saving it again under a different name corrupts the file in a way that I cannot open it with excel. I checked the issues mentioned above, but I do have the [Content Types].xml and such.

However, I realised that some of the files in the unziped excel file are named differently. In the original file, chartsheets are named from 1 to 31, whereas in the newly created file they run from 32 to 62.

This problem only arises with an old excel file that I have, where the individual sheets have been moved and renamed and such. With a new excel file all files within the excel file are named equally in the original file and the file opened and saved with openxlsx. I can't enclose the file unfortunately, and tried to reproduce the problem with a new file, but the damn thing is working fine unless I use my old file.

Any hints? Many thanks!

hannofalkenberg avatar Jan 11 '17 11:01 hannofalkenberg

@hannofalkenberg it will be a bug in openxlsx. If you can supply an xlsx file which is corrupted after loading with openxlsx I will have a look.

awalker89 avatar Jan 12 '17 07:01 awalker89

Hi, I have also got corrupt excel file after saving. Excel try to restore it and said:

Replaced component: xl/worksheets/sheet1.xml with XML error. The name of the end tag of the element must match the element type in the start tag. Line 1, column 1499692.

igor-misechko avatar Nov 30 '17 14:11 igor-misechko

@igor-misechko I also had that same problem. It solved removing the active filters from the sheet.

AlexUsR avatar Jan 08 '18 15:01 AlexUsR

I have a similar issue. Here is the code: fileTemplate <- 'New01.xlsx' wbTemplate <- loadWorkbook(fileTemplate) addWorksheet(wbTemplate, "Sheet1") writeData(wbTemplate, "Sheet1", dataset) len <- NROW(dataset) dataValidation(wbTemplate, 2, col = 2, rows = 2:len, type = "list", value = "'Data Validation'!$A$2:$A$19") dataValidation(wbTemplate, 2, col = 3, rows = 2:len, type = "list", value = "'Data Validation'!$B$2:$B$501") dataValidation(wbTemplate, 2, col = 5, rows = 2:len, type = "list", value = "'Data Validation'!$C$2:$C$6") openXL(wbTemplate) if I just use one dataValidation it opens okey, if more than one it complains about the file being corrupt...

linako111 avatar Feb 02 '18 03:02 linako111

@igor-misechko I also had that same problem. It solved removing the active filters from the sheet.

@AlexUsR Can you tell in detail how to fix this? I have the same problem as @igor-misechko

aviertio avatar Nov 21 '18 11:11 aviertio

@awalker89 I can also now confirm that corruption doesn't happen if I remove filtering from the file which is loaded. With filtering it saves as corrupt.

aviertio avatar Nov 22 '18 12:11 aviertio

I have the same problem. I updated Rtools, installed most recent version of openxlsx and I don't have a filter set. This is my code:

library(openxlsx) Sys.getenv("R_ZIPCMD", "zip")# This returns zip, I guess that is fine

read the summarydata

CountrySummary <- read.csv('CountrySummary.csv')

Countries <- unique(CountrySummary$Country)

write one xlsx file with all the Country data

wb <- createWorkbook("GC_CountrySummary") for (C in Countries){ df_c <- subset(CountrySummary, CountrySummary$Country==C) df_c <- df_c[order(df_c$Date),]

addWorksheet(wb,df_c, sheetName = C,gridLines = TRUE, tabColour = NULL, zoom = 100, header = NULL, footer = NULL, evenHeader = NULL, evenFooter = NULL, firstHeader = NULL, firstFooter = NULL, visible = TRUE) writeData(wb, sheet=C, df_c, colNames = TRUE) } saveWorkbook(wb, "GC_CountrySummary.xlsx", overwrite = TRUE)

NaRuecker avatar Mar 11 '19 16:03 NaRuecker

I'm having the same issue, on Windows 10 with office 2016 and R 3.6.1, openxlsx 4.1.0.1.

I'm writing to a pre-formatted workbook - saveWorbook() works fine for the first sheet, but when I try to add another sheet to the same workbook (starting with loadWorkbook() again) it either fails (says a workbook of that name doesn't exist even though I can see it is there) or the workbook is corrupted. I can open the corrupted workbook and follow the MS Excel instructions to repair it, but then my whole system slows down and ultimately crashes.

Moreover, the corrupted file is 'locked for editing by another user' so I can't even delete it.

I don't know about the xml thing, but I did see a warning message while running my script saying 'zip is depracated' so it is possible that my version of zip is not compatible with openxlsx as for some other posters above.

The other thing that occurs to me is that there is no function in openxlsx to close the workbook (or unload it, or de-activate it, whichever term is most appropriate). Could this be why my computer thinks the file is still open?

I would try to attach an example but can't because my computer is freezing right now due to this issue. I was wondering if there was something about the way the workbook was formatted (not done by me, it is a template for uploading data to a website that has some conditional formatting and data checks built in) which is somehow incompatible with openxlsx - but then I can see from the above posts that others are having the same issue even with very simple data sets (and presumably blank unformatted workbooks).

Could adding a function such as closeWorkbook() help? I don't really understand how openxlsx works so no idea if this is relevant or not, but just a thought. I really hope others can adopt this package and inject new life into its maintenance because apart from the issue above it is a really great package (and much better than the Java dependent ones).

AmyMikhail avatar Sep 06 '19 18:09 AmyMikhail

Hi @awalker89 ,

I have the same setup as @AmyMikhail and am running into the same problem of corrupting the Excel file when saving. This happens when the original file has filters. This is a sample file that gets corrupted in my PC just by loading and saving it and here is the code that I used:

pathExcel1 <- "C:/Users/Alexandre Velozo/Desktop/test file.xlsx"
pathExcel2 <- "C:/Users/Alexandre Velozo/Desktop/test file 2.xlsx"

wb <- loadWorkbook(pathExcel1)
saveWorkbook(wb,pathExcel2)

If I remove the filters, the file saves correctly.

Thank you in advance for any inputs. Best regards, Alexandre Velozo

acvelozo avatar Sep 09 '19 19:09 acvelozo

I am working in a very similar setup as @AmyMikhail . For some reasons, certain .xlsx files I am receiving from others are getting corrupted when I load them with loadWorkbook() and then save that workbook object (slightly modified) with saveWorkbook().

When attempting to open the newly saved workbook, I would see this error: Replaced Part: /xl/worksheets/sheet1.xml part with XML error. Undeclared prefix. Line 1, column 523098.

Interestingly, I discovered that if I opened the original .xlsx file on my machine, saved it as a new file (w/o changing anything), and repeated the above steps (loading in this new wb and saving it w/o any changes with openxlsx), the newly openxlsx-created file was not corrupted.

Using these two files, wb1 (that would result in a corrupted output), and wb2 (the locally saved version that would not result in a corrupted output), I set out to determine what about them was different, by exploring the workbook objects loaded from loadWorkbook().

Through some trial and error, I discovered that replacing the worksheet from wb1 with the equivalent in wb2 resolved the corruption issue (it was a single worksheet workbook). I drilled down further, and found that the issue was with the dataValidations field of the Worksheet within the workbook object.

wb1 <- loadWorkbook("bad_file.xlsx")
wb2 <- loadWorkbook("good_file.xlsx")
saveWorkbook(wb1, "wb1_out.xlsx") #this is corrupt
saveWorkbook(wb2, "wb2_out.xlsx") #this is OK


wb1$worksheets[[1]] <- wb2$worksheets[[1]]

saveWorkbook(wb1, "wb1_out.xslx") #this is OK

wb1 <- loadWorkbook("bad_file.xlsx") #to overwrite the fixed wb1

wb1$worksheets[[1]]$dataValidations <- wb2$worksheets[[1]]$dataValidations
saveWorkbook(wb1, "wb1_out.xslx") #this is OK also

Here is what the dataValidations fields look like (corrupt followed by OK). In the Excel file, it is a dropdown with two options ("USD million", "LCU million").

<dataValidation type=\"list\" allowBlank=\"1\" showErrorMessage=\"1\" sqref=\"D57:D59 D35:D44 D46:D52 D54:D55 D61:D68\" xr:uid=\"{00000000-0002-0000-0000-000000000000}\">
	<formula1>\"USD million,LCU million\"</formula1>
</dataValidation>

<dataValidation type=\"list\" allowBlank=\"1\" showErrorMessage=\"1\" sqref=\"D57:D59 D35:D44 D46:D52 D54:D55 D61:D68\">
	<formula1>\"USD million,LCU million\"</formula1>
</dataValidation>

I did not diagnose what/where the issue with the dataValidation field is/is coming from, but to solve it I am just overwriting the field from every new Excel file with the field from a working template. Hope it can help some here solve their problems.

t-morrison avatar Sep 10 '19 19:09 t-morrison

I encountered a similar issue: reading an Excel template into R by loadworkbook() and simply save workbook again would result in a corrupted file. The repair result says "...Xml parsing error line 1, column 47451". I don't have any filters in the template, only four data validations with lists; but removing the data validation the problem still exists so I'm not sure it's the problem with the data validation.

I'm not sure how to check the Zip version.I'm using a Mac. I did notice the corrupted file has a smaller size than the originial template. > Sys.getenv("R_ZIPCMD", "zip") [1] "/usr/bin/zip"

I attach the template file and the corrupted output, as well as a screenshot of the Repaired Result. Grateful for any suggestion. R_Template_name.xlsx R_generated_corrupt_file.xlsx

Screen Shot 2020-10-28 at 3 15 05 PM

Below are the commands I used. wb1=loadWorkbook(template_name) saveWorkbook(wb1, output_file1, TRUE)

This is my R session info: `R version 4.0.1 (2020-06-06) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS Catalina 10.15.7

Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

Random number generation: RNG: Mersenne-Twister Normal: Inversion Sample: Rounding

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] data.table_1.12.8 zoo_1.8-8 lubridate_1.7.8 reshape2_1.4.4 dplyr_1.0.0 plyr_1.8.6 openxlsx_4.2.3

loaded via a namespace (and not attached): [1] Rcpp_1.0.4.6 rstudioapi_0.11 magrittr_1.5 tidyselect_1.1.0 lattice_0.20-41 R6_2.4.1 rlang_0.4.6 stringr_1.4.0
[9] tools_4.0.1 grid_4.0.1 xfun_0.14 tinytex_0.23 ellipsis_0.3.1 tibble_3.0.1 lifecycle_0.2.0 crayon_1.3.4
[17] zip_2.0.4 purrr_0.3.4 vctrs_0.3.1 glue_1.4.1 stringi_1.4.6 compiler_4.0.1 pillar_1.4.4 generics_0.0.2
[25] pkgconfig_2.0.3 `

jingchenr avatar Oct 28 '20 20:10 jingchenr