libarchivejs
Streaming API
I am considering using this library with big files (reading archives >4GB). Is it possible to stream the output of a file extraction without storing it all in memory? Otherwise I'll probably end up with multiple GB of RAM usage just to hold the data that the library extracted.
Do you have any specific API in mind? Should we just return chunks of typed arrays?
Chunked typed arrays would work perfectly. I don't think that browsers have a standardized streaming interface, so just continuously returning the chunks (in order, of course) in a callback is a decent implementation.
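To make the callback idea concrete, here is a minimal sketch of what such an interface could look like. Both `fakeExtract` and `extractWithCallback` are hypothetical names invented for illustration; neither is part of libarchivejs, and the generator only stands in for a real decompressor.

```javascript
// Stand-in for a decompressor (hypothetical, not libarchivejs code):
// lazily yields `totalBytes` bytes as Uint8Array chunks of at most `chunkSize`.
function* fakeExtract(totalBytes, chunkSize) {
  for (let offset = 0; offset < totalBytes; offset += chunkSize) {
    const size = Math.min(chunkSize, totalBytes - offset);
    const chunk = new Uint8Array(size);
    for (let i = 0; i < size; i++) chunk[i] = (offset + i) & 0xff;
    yield chunk;
  }
}

// Hypothetical callback-style API: hands each chunk to `onChunk` in order,
// then returns the number of chunks delivered.
function extractWithCallback(chunks, onChunk) {
  let index = 0;
  for (const chunk of chunks) {
    onChunk(chunk, index);
    index += 1;
  }
  return index;
}

// Usage: sum the extracted size without keeping any chunk around.
let totalBytes = 0;
const chunkCount = extractWithCallback(fakeExtract(10, 4), (chunk) => {
  totalBytes += chunk.byteLength;
});
console.log(chunkCount, totalBytes);
```

Because the producer is lazy, only one chunk needs to be alive at a time, provided the callback does not retain the chunks it receives.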
I would also be very happy to have this feature enhancement!
I'm just wondering if there are any plans to have it.
Unfortunately, I do not have this planned yet due to lack of time.
Revisiting this, I have a question about the use case: if there's a single large file, wouldn't it end up in RAM anyway even if it's streamed in chunks? Unless it's streamed to the network right away, in which case it would make more sense to decompress on the server.
Hi @nika-begiashvili we are interested in a streaming API.
We sometimes need to process 10GB+ files in the browser. We are only interested in a subset of the files in these archives based on a pattern (this subset is about <1% of the overall size). Our use case would be to scan the archive to get a list of file paths, then selectively unarchive files based on a file pattern.
Is this something that is theoretically possible with the way libarchive is designed? We'd be willing to sponsor an improvement.
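The scan-then-select step described above can be sketched as follows. The `entries` array is made-up sample data standing in for whatever a header scan would return, and `selectEntries` and `totalSize` are hypothetical helpers, not libarchivejs functions.

```javascript
// Hypothetical listing, as a scan of the archive headers might produce it.
const entries = [
  { path: "logs/a.json", size: 120 },
  { path: "video/big.bin", size: 90000 },
  { path: "data/b.json", size: 80 },
  { path: "video/huge.bin", size: 9800 },
];

// Keep only the entries whose path matches the pattern.
function selectEntries(list, pattern) {
  // String.prototype.search ignores the regex's lastIndex, so a pattern
  // with the `g` flag cannot skip matches between calls.
  return list.filter((entry) => entry.path.search(pattern) !== -1);
}

// Sum of the uncompressed sizes of the given entries.
function totalSize(list) {
  return list.reduce((sum, entry) => sum + entry.size, 0);
}

const wanted = selectEntries(entries, /\.json$/);
const fraction = totalSize(wanted) / totalSize(entries);
console.log(wanted.map((entry) => entry.path), fraction);
```

Only the entries in `wanted` would then need to be extracted, which is what keeps memory use proportional to the selected subset rather than to the whole archive.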
Yes, I think that should be possible, since a JavaScript File object can be read in chunks and libarchive does provide custom read callbacks, although it will need to call JavaScript functions from C.
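As a rough illustration of the JavaScript half of that idea, a `File` (which is a `Blob`) can be read piece by piece with the standard `Blob.slice()` and `Blob.arrayBuffer()` methods. The name `readBlobInChunks` is hypothetical, and the C-side read callback that would consume these chunks is not shown.

```javascript
// Hypothetical helper: yields the contents of a Blob/File as ordered
// Uint8Array chunks of at most `chunkSize` bytes, one slice at a time.
async function* readBlobInChunks(blob, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  for (let offset = 0; offset < blob.size; offset += chunkSize) {
    // slice() only creates a reference to the byte range; the bytes are
    // read when arrayBuffer() is awaited.
    const slice = blob.slice(offset, offset + chunkSize);
    yield new Uint8Array(await slice.arrayBuffer());
  }
}

// Usage: count the bytes of a small in-memory Blob, three bytes at a time.
async function countBytes(blob, chunkSize) {
  let total = 0;
  for await (const chunk of readBlobInChunks(blob, chunkSize)) {
    total += chunk.byteLength;
  }
  return total;
}
```

In a browser the `blob` argument would typically be the `File` taken from an `<input type="file">` element, so only the requested range has to be resident at any moment.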