Compress and store in the jsZip Object while adding the file itself

Open anandncode opened this issue 3 years ago • 6 comments

Description

Currently, files added to a zip are stored as-is (without compression/deflate) until generateAsync or generateNodeStream is called, even if compression options are passed to the zip.file method. As a result, for memory-intensive operations, the real gain from compression is not achieved until one of those methods is called.

For example: the requirement is to download around 4 GB of data that would be around 500 MB when compressed. The data is downloaded as CSV files, each with a certain maximum number of rows, and the files are then added to the zip one by one.

If the request results in 20 CSV files of 200 MB each, then each time a file is added, the overall memory in use would be:

  1. 200 MB for the current CSV
  2. The accumulated memory of the uncompressed zip object (previous CSV files + current CSV)
  3. By the end, after adding all 20 files, the zip object would be 4 GB (before calling generateAsync) instead of 500 MB, exhausting all the available memory.
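
The arithmetic above can be sketched in a few lines of JavaScript. This is only a rough model: it assumes every CSV compresses by the same ratio and ignores the compressor's own working buffers. The function names are made up for illustration and are not part of jszip.

```javascript
// Back-of-the-envelope model of the numbers in this issue (all sizes in MB).
const FILE_COUNT = 20;
const CSV_SIZE_MB = 200;
const COMPRESSED_TOTAL_MB = 500; // figure quoted in the issue

// Assumption: every CSV compresses by the same ratio.
const compressedPerFileMb = COMPRESSED_TOTAL_MB / FILE_COUNT;

// Current behaviour: the zip object holds every CSV uncompressed.
function peakStoredAsIs(fileCount, csvSizeMb) {
  return fileCount * csvSizeMb;
}

// Hypothetical compress-on-add: all earlier files are already compressed
// while the current CSV is still held uncompressed.
function peakCompressOnAdd(fileCount, csvSizeMb, compressedMb) {
  return (fileCount - 1) * compressedMb + csvSizeMb;
}

console.log(peakStoredAsIs(FILE_COUNT, CSV_SIZE_MB)); // 4000
console.log(peakCompressOnAdd(FILE_COUNT, CSV_SIZE_MB, compressedPerFileMb)); // 675
```

Under these assumptions the peak drops from about 4 GB to under 700 MB, which matches the "current CSV + compressed zip" bound described in the issue.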

Instead, if there were an option to compress/deflate the data and store it in the zip object while calling zip.file itself, the overall memory consumption would be far lower. At any point in time, the process memory consumption would not exceed the current CSV size of 200 MB plus the compressed zip size, which would always be under 500 MB.

Please let me know if this is currently possible, if there are any alternatives with jszip, or if my understanding is incorrect.

If it's not currently possible, it would be great if you could consider this as a feature. As most zip requirements are memory intensive, this would be really beneficial.

anandncode • Jan 11 '22 07:01

This library was not designed for that. However, zip.js or fflate should fulfill your requirements.

gildas-lormeau • Jan 22 '22 18:01

Thanks a lot @gildas-lormeau for your input. I started exploring fflate.

I just want to mention that I was also able to get it working with jszip, using the approach below:

  1. Add a file (a 200 MB CSV string) to the jszip object
  2. Call generateAsync and get the zipped raw content
  3. Discard the jszip object created in step 1
  4. Load the zip content from step 2 to create a new jszip object
  5. Repeat this until all files are processed (around 20 files)
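
The five steps above can be sketched roughly as follows. This is an illustration under assumptions, not the exact code I used: the JSZip constructor is passed in as a parameter so the snippet does not assume how the library is loaded, and addFileCompacted, buildArchive and loadCsv are hypothetical names. The only library calls used are zip.file, generateAsync and JSZip.loadAsync.

```javascript
// Steps 1 to 4 for a single file. `zip` is the current jszip object.
async function addFileCompacted(JSZip, zip, name, content) {
  // Step 1: add the large file; it is held uncompressed at this point.
  zip.file(name, content);

  // Step 2: generate the zipped raw content with DEFLATE applied.
  const bytes = await zip.generateAsync({
    type: "uint8array",
    compression: "DEFLATE",
  });

  // Steps 3 and 4: the old object is no longer referenced after this call
  // returns; a new object is loaded from the compressed bytes.
  return JSZip.loadAsync(bytes);
}

// Step 5: repeat for every file. `loadCsv` is a hypothetical async callback
// that returns the CSV content for a given file name.
async function buildArchive(JSZip, names, loadCsv) {
  let zip = new JSZip();
  for (const name of names) {
    const content = await loadCsv(name);
    zip = await addFileCompacted(JSZip, zip, name, content);
  }
  // Final archive; entries loaded in compressed form are expected to stay small.
  return zip.generateAsync({ type: "uint8array", compression: "DEFLATE" });
}
```

With the real library this would be called as buildArchive(require("jszip"), names, loadCsv). Note that during each iteration the current CSV and the freshly generated bytes are both in memory, so the peak is somewhat above the compressed size alone.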

As jszip doesn't expand an already compressed entry unless requested, the jszip object after step 4 is quite small, and I got rid of the 200 MB overhead of the uncompressed file.

Though it's a twisted approach, it works quite nicely. Of course, it depends entirely on the assumption that "jszip doesn't expand already compressed objects unless requested".

I also started exploring fflate after your suggestion. It works like you mentioned and is quite good. Most probably I will settle on fflate.

But this is a good feature that jszip could consider adding to the library. When a file is added, based on the compression options (and maybe another flag that says "compress right away and discard the original"), it could compress the data sync/async and store the result.

As almost all zip-based features are memory intensive, this would be quite a good feature in my opinion.

anandncode • Jan 24 '22 05:01

@anandncode I'm glad I was able to help you. Actually, I'm the author of zip.js ;) and I created it 10 years ago for the reasons mentioned in this issue, among other things. I doubt that jszip will evolve in this direction.

gildas-lormeau • Jan 24 '22 12:01

@gildas-lormeau I realized that you are the author of zip.js after posting my comment :) and I was glad that you pointed me to other libraries as well, without any bias.

Wow, if they didn't address it in 10 years, I don't think they will do it now.

Btw, I will also try zip.js; it has quite a generic interface, with support for different readers and writers, including fflate.

anandncode • Jan 25 '22 11:01