Memory Leak: `ExcelRangeBase.Copy` without `ExcelRangeCopyOptionFlags.ExcludeStyles` keeps the whole source `ExcelPackage` in memory indefinitely
EPPlus usage
Noncommercial use
Environment
Windows
Epplus version
8.2.1
Spreadsheet application
Excel
Description
I have a collection of about 100 "source" excel files and I want to create a new "destination" excel file, extracting some data from the source files. This is how I am doing it:
- Create a single destination
ExcelPackage(using the parameterless constructor) - Then for each source file, I do the following (one file at a time):
- Load the source file using
File.ReadAllBytes - Create a
MemoryStreamfrom the byte array - Create an
ExcelPackagefrom the stream - Copy a specific range from a specific worksheet using
ExcelRangeBase.Copy, essentially appending that range to the end of the destination file - Dispose of the
ExcelPackageand theMemoryStream
- Load the source file using
- Note that I don't dispose of the destination
ExcelPackageuntil I have run all source files.
This is simple and straightforward to do using EPPlus. However, when I run it, the program keeps increasing memory usage, until it consumes all available RAM. The memory profiler when debugging within Visual Studio shows that the memory is increasing whenever a file is loaded, but it never decreases, not even when a file should have been unloaded. FYI the source files are 0.3-2 MB each in size and EPPlus seems to use a few 100s MB RAM to load each one of them.
The most interesting part is when I enabled ExcelRangeCopyOptionFlags.ExcludeStyles, the memory leak was fixed, without making any other changes. I have the impression named styles may be involved, but I am not sure about that. Unfortunately, I cannot share the source excel files and I am not an excel expert in order to understand if it is something specific to them.
Another workaround I found is saving the destination file (ie to a byte array) after copying a source file to it and then loading it again. This mitigates the memory leak and allows me to preserve styles, but feels suboptimal.
All in all, it looks like when I am copying styles, the destination ExcelPackage somehow keeps a reference to the source ExcelPackage.
PS: I forgot to mention I also tried (for each source file):
- Copy from the source
ExcelPackageinto a tempExcelPackage - Dispose of source
- Try to copy from temp into destination
- Dispose of temp
In this case, EPPlus crashes when trying to copy from temp into destination. I think that's where (stacktraces) I might have seen something about named styles that I mentioned earlier and this is what gave me the idea to try serializing/deserializing instead.
During my tests I see that memory usage climbs to 4GB for the first 30 seconds and then are stable for the rest of the run at around 3GB. Setting the ExcludeStyles flag seems to have no effect in reducing memory usage in our tests.
My test consisted of about 115 workbooks with sizes from 6KB to 200MB. Workbooks are compressed using zip and when these are loaded into memory they will take significantly more space and EPPlus stores the whole workbook in memory.
Without your workbooks or code it's hard to figure out what is going on exactly. If you could share them it would be a great help in figuring out what is going on.
Thanks for trying to reproduce it @AdrianEPPlus - I assume you didn't run it as a 32-bit application (as this would limit memory to 4GB). But the thing is, I just realized I have explicitly enabled x64 architecture in this project, but I don't remember why (this was before encountering this issue). You could try this option too just to make sure.
Unfortunately I cannot share the original files, but if you still can't reproduce it on your end, I will try to strip the sensitive information from one of them and hopefully the bug will still reproduce. This will take more effort than simply reporting the issue and I have already found the workaround for it, but I will give it a try when I have some more free time. I'll keep you posted.
My initial tests was made inside our EPPlus development project and it should use x64. But to test I created a new project and set it to use x64. Besides a faster running program I get a peak usage of memory at 3,2GB this time.
I think I would need your code and workbooks to continue looking into this.