jszip
Compress and store in the JSZip object when the file is added
Description
Currently, files added to a zip are stored as-is (without compression/deflate) until generateAsync or generateNodeStream is called, even if compression options are passed to zip.file. As a result, for memory-intensive operations, the real gain of compression is not achieved until one of those methods is called.
For example: the requirement is to download around 4 GB of data that, when compressed, would be around 500 MB. The data is downloaded as CSV files, each with a certain maximum number of rows, and then added to the zip file one by one.
If the request results in 20 CSV files of 200 MB each, then each time a file is added, the overall memory would be:
- 200 MB for the current CSV
- Accumulated memory of the uncompressed zip object (size of the previous CSV files + the current CSV)
- By the end, after adding all 20 files, the zip object would be 4 GB (before calling generateAsync) instead of 500 MB, exhausting all the available memory.
Instead, if there were an option to compress/deflate and store in the zip object when zip.file is called, overall memory consumption would be far lower. At any point in time, the process memory consumption would not exceed the current CSV size of 200 MB plus the compressed zip size, which would always be < 500 MB.
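To make the numbers concrete, here is a rough back-of-the-envelope sketch of the two cases. The function name and its parameters are made up for illustration, 4 GB is treated as 4000 MB (20 x 200 MB), and the compression ratio of 0.125 is simply 500 MB / 4000 MB from the example above. It ignores any other process overhead.

```javascript
// Rough peak-memory estimate (in MB) for the scenario in this issue.
// All names here are hypothetical; this is plain arithmetic, not a jszip API.
function estimatePeakMemoryMB({ fileCount, fileSizeMB, compressionRatio }) {
  const totalRawMB = fileCount * fileSizeMB;

  // Files kept uncompressed until generateAsync: after the last file is
  // added, the zip object holds the full raw size.
  const deferredMB = totalRawMB;

  // Files compressed as they are added: upper bound is the current raw CSV
  // plus the fully compressed archive.
  const eagerMB = fileSizeMB + totalRawMB * compressionRatio;

  return { deferredMB, eagerMB };
}

const estimate = estimatePeakMemoryMB({
  fileCount: 20,
  fileSizeMB: 200,
  compressionRatio: 0.125, // assumed: 500 MB compressed / 4000 MB raw
});
console.log(estimate); // { deferredMB: 4000, eagerMB: 700 }
```

So under these assumptions the bound drops from about 4 GB to about 700 MB.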
Please let me know if this is currently possible, if there are any alternatives with JSZip, or if my understanding is incorrect.
If it's not possible currently, it would be great if you could consider this as a feature. As most zip requirements are memory intensive, this would be really beneficial.
Thanks a lot @gildas-lormeau for your inputs. I started exploring fflate.
Just want to mention that I was also able to get it working with jszip, using the approach below:
1. Add file (size 200 MB CSV string) to jszip object
2. Use generateAsync and get the zipped raw content
3. Discard the jszip object created in step 1
4. Load the zip content from step 2 and create a new jszip object
5. Repeat this until all files are processed (around 20 files)
As jszip doesn't expand already compressed entries unless requested, the jszip object after step 4 is quite small, and I got rid of the 200 MB overhead of the file.
Though it's a twisted approach, it works quite nicely. Of course, it depends heavily on the assumption that "jszip doesn't expand already compressed object unless requested".
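For anyone who wants to try the same thing, the steps above can be sketched roughly as follows. This is only a minimal sketch: `buildZipIncrementally`, `csvSources` and its `load()` callback are hypothetical names, the compression level is an arbitrary choice, and the memory saving relies on the behaviour described above (jszip keeping loaded entries compressed), which this code does not verify. The JSZip constructor is passed in as a parameter so the sketch makes no assumption about how the library is loaded.

```javascript
// Sketch of the "generate, discard, reload" workaround.
// JSZip:      the constructor exported by the jszip package.
// csvSources: hypothetical array of { name, load } objects, where load()
//             returns (a promise for) the CSV text of one file.
async function buildZipIncrementally(JSZip, csvSources, level = 6) {
  const options = {
    type: "uint8array",
    compression: "DEFLATE",
    compressionOptions: { level },
  };

  let zip = new JSZip();
  // Start from an empty archive so the function also works with no sources.
  let bytes = await zip.generateAsync(options);

  for (const source of csvSources) {
    // Step 1: add the next CSV; at this point it is held uncompressed.
    const csv = await source.load();
    zip.file(source.name, csv);

    // Step 2: compress everything added so far into raw zip content.
    bytes = await zip.generateAsync(options);

    // Steps 3 and 4: drop the old object and reload from the compressed
    // content, so the raw CSV is no longer referenced by the zip object.
    zip = await JSZip.loadAsync(bytes);
  }

  // Zip content produced after the last file was added.
  return bytes;
}
```

It could be called as `const bytes = await buildZipIncrementally(require("jszip"), sources)`, where each entry in `sources` downloads one CSV on demand. Note that every iteration regenerates the whole archive, so this trades extra CPU time for lower memory.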
I also started exploring fflate after your suggestion. It works as you mentioned and is quite good. Most probably I will settle on fflate.
But this is a good feature that jszip could consider adding to their library. While adding a file, based on the compression options (and maybe another flag that says "compress right away and discard the original"), they could compress sync/async and store.
As almost all zip-based features are memory intensive, this would be quite a good feature in my opinion.
@anandncode I'm glad I was able to help you. Actually I'm the author of zip.js ;) and I created it 10 years ago for the reasons mentioned in this issue, among other things. I doubt that jszip will evolve in this direction.
@gildas-lormeau I realized that you are the author of zip.js after posting my comment :) and I was glad that you pointed me to another library as well, without any bias.
Wow, if they haven't addressed it in 10 years, I don't think they will do it now.
Btw, I will also try zip.js; it has a quite generic interface, with the ability to use different readers and writers, including fflate.