Getting an error when trying to extract large files (e.g. >2 GB)

Open · adyster opened this issue 3 years ago · 4 comments

I'm having trouble extracting large files (around 2 GB or so) from an archive. It fails with: RangeError: Array buffer allocation failed at Error

I'm wondering whether it has something to do with it being a 32-bit build. Is it possible to compile it as 64-bit?

adyster, Oct 12 '22 22:10

Same here. I have not investigated it yet (I ran into it today), but I guess it is one of the following:

  • incorrect use of the filesystem when reading (i.e., the whole file is read into memory), or
  • the unpacking algorithm that wants to keep the whole extracted file contents in memory before flushing (unlikely, though, as 32-bit 7-Zip extracts large files fine), or
  • something in the write path that keeps the whole file in memory before flushing it.
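To make the difference between the first and last hypotheses and a streaming implementation concrete, here is a minimal TypeScript sketch. ByteSource, ByteSink, copyInChunks and the in-memory helpers are hypothetical names, not part of 7z-wasm or Emscripten. The only point is that a chunked copy holds at most chunkSize bytes at a time, whereas a whole-file read needs one buffer as large as the file.

```typescript
// Hypothetical interfaces for illustration only; not a 7z-wasm or Emscripten API.
interface ByteSource {
  readonly size: number;
  // Returns up to `length` bytes starting at `offset`.
  read(offset: number, length: number): Uint8Array;
}

interface ByteSink {
  write(chunk: Uint8Array): void;
}

// Copies `src` to `sink` without ever materialising the whole file:
// extra memory is bounded by `chunkSize`, not by `src.size`.
function copyInChunks(src: ByteSource, sink: ByteSink, chunkSize = 1 << 20): number {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  let offset = 0;
  while (offset < src.size) {
    const length = Math.min(chunkSize, src.size - offset);
    const chunk = src.read(offset, length);
    if (chunk.length === 0) break; // defensive: source ended early
    sink.write(chunk);
    offset += chunk.length;
  }
  return offset; // total bytes copied
}

// In-memory stand-ins, useful for exercising the copy loop.
function memorySource(data: Uint8Array): ByteSource {
  return {
    size: data.length,
    read: (offset: number, length: number) => data.subarray(offset, offset + length),
  };
}

function collectingSink(): ByteSink & { chunks: Uint8Array[] } {
  const chunks: Uint8Array[] = [];
  return {
    chunks,
    write: (chunk: Uint8Array) => {
      chunks.push(chunk.slice()); // copy, since a real source may reuse its buffer
    },
  };
}
```

For example, copying a 10-byte source with chunkSize 4 produces writes of 4, 4 and 2 bytes.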

Both Emscripten and browsers now support 4 GB of WebAssembly memory, so that could alleviate the problem at least partially. But a real fix would still be necessary.
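For scale, the 4 GB figure comes from wasm32 addressing: linear memory is allocated in 64 KiB pages, and a 32-bit module can address at most 65,536 of them (2^32 bytes). The arithmetic below is standard WebAssembly; the helper names are hypothetical. In Emscripten the ceiling is raised at link time (for example with -sALLOW_MEMORY_GROWTH and -sMAXIMUM_MEMORY=4GB). Whether that helps here depends on where the filesystem keeps file contents, since a file buffer held in a separate JS typed array does not live in linear memory.

```typescript
// WebAssembly linear memory is allocated in 64 KiB pages.
const WASM_PAGE_BYTES = 64 * 1024;
// wasm32 uses 32-bit addresses: at most 2^16 pages, i.e. 2^32 bytes (4 GiB).
const WASM32_MAX_PAGES = 65536;
const GiB = 2 ** 30;

function pagesFor(bytes: number): number {
  return Math.ceil(bytes / WASM_PAGE_BYTES);
}

// True if `bytes` fits in a wasm32 linear memory at all. This ignores the
// stack, static data and allocator overhead, which share the same space.
function fitsInWasm32(bytes: number): boolean {
  return pagesFor(bytes) <= WASM32_MAX_PAGES;
}
```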

rhnatiuk, Jan 13 '23 09:01

The error comes from these two lines: https://github.com/use-strict/7z-wasm/blob/f5fdf5e678d1f4dea66b0dda0ef016f436146112/7zz.es6.js#L2484-L2485. According to MDN (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Errors/Invalid_array_length), the maximum Uint8Array length is close to 2 GiB - 1, but in Chrome I can't get new Uint8Array(2**31-1) to work, so the practical limit is smaller.
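A toy model shows why a file kept in one contiguous Uint8Array runs into this limit. The growth policy below (double while small, then grow by 12.5%) is an assumption loosely modeled on MEMFS-style file storage; it is not copied from 7zz.es6.js, and nextCapacity, simulateSingleBufferWrite and fitsUnder are hypothetical names. Two things hold regardless of the exact policy: the final buffer must be at least as large as the file, and while the buffer is reallocated, the old and new copies are alive at the same time.

```typescript
// Assumed growth policy (illustrative): double below 1 MiB, then grow by 12.5%.
function nextCapacity(prev: number, needed: number): number {
  const DOUBLING_MAX = 1024 * 1024;
  const grown = Math.floor(prev * (prev < DOUBLING_MAX ? 2 : 1.125));
  return Math.max(needed, grown, 256);
}

interface GrowthReport {
  reallocations: number;
  largestAllocation: number; // biggest single Uint8Array requested
  peakBytes: number;         // old + new buffer alive during a copy
}

// Simulates appending `fileSize` bytes, `chunkSize` at a time, to one buffer.
function simulateSingleBufferWrite(fileSize: number, chunkSize: number): GrowthReport {
  let capacity = 0;
  let used = 0;
  const report: GrowthReport = { reallocations: 0, largestAllocation: 0, peakBytes: 0 };
  while (used < fileSize) {
    const needed = Math.min(fileSize, used + chunkSize);
    if (needed > capacity) {
      const next = nextCapacity(capacity, needed);
      // Growing means allocating a new array and copying the old one into it,
      // so both are reachable until the copy finishes.
      report.peakBytes = Math.max(report.peakBytes, capacity + next);
      report.largestAllocation = Math.max(report.largestAllocation, next);
      report.reallocations++;
      capacity = next;
    }
    used = needed;
  }
  return report;
}

// Whether every allocation stays under a per-array length limit.
function fitsUnder(report: GrowthReport, maxArrayLength: number): boolean {
  return report.largestAllocation <= maxArrayLength;
}
```

For a 3 GiB file written in 1 MiB chunks, the model's largest single allocation is at least 3 GiB, above a per-array limit such as 2**31 - 1, so fitsUnder returns false whatever the exact growth factor.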

I'm currently trying to modify Emscripten's MEMFS to do basic reads and writes through the File System Access API. I'll post a follow-up if I make progress.

k8188219, Jan 13 '23 14:01

Are you running this in Node.js or in a browser? In browsers, all Emscripten FS adapters end up preloading the entire file system into memory, because WASM can only use synchronous APIs and browser storage APIs such as IndexedDB are asynchronous. The File System Access API only recently gained synchronous access via createSyncAccessHandle, but that works only in web workers, and Emscripten doesn't have an adapter for it yet.
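The synchronous path mentioned here looks roughly like this. Inside a dedicated worker, await fileHandle.createSyncAccessHandle() returns a FileSystemSyncAccessHandle whose read(buffer, { at }) and getSize() methods are synchronous, so they can be called from a function that a wasm module imports. The sketch declares a minimal SyncReadHandle interface with just those two methods (so it compiles and runs without a browser) and reads a byte range straight into a view of the module's linear memory. readIntoHeap, fakeHandle and their parameters are hypothetical; wiring this into an Emscripten filesystem is not shown.

```typescript
// Structural subset of FileSystemSyncAccessHandle (OPFS, dedicated workers only).
interface SyncReadHandle {
  getSize(): number;
  read(buffer: Uint8Array, options: { at: number }): number;
}

// Reads up to `length` bytes at file position `fileOffset` directly into
// linear memory at address `ptr`, returning the number of bytes read.
// Everything is synchronous, which is what a wasm import requires.
function readIntoHeap(
  handle: SyncReadHandle,
  heap: ArrayBuffer, // e.g. the current buffer of the module's WebAssembly.Memory
  ptr: number,
  length: number,
  fileOffset: number,
): number {
  const remaining = Math.max(0, handle.getSize() - fileOffset);
  const toRead = Math.min(length, remaining);
  if (toRead === 0) return 0;
  // A view, not a copy: bytes land in linear memory without an extra buffer.
  const view = new Uint8Array(heap, ptr, toRead);
  return handle.read(view, { at: fileOffset });
}

// In-memory stand-in with the same shape, for testing outside a browser.
function fakeHandle(data: Uint8Array): SyncReadHandle {
  return {
    getSize: () => data.length,
    read(buffer: Uint8Array, options: { at: number }): number {
      const src = data.subarray(options.at, options.at + buffer.length);
      buffer.set(src);
      return src.length;
    },
  };
}
```

If the module's memory can grow, fetch memory.buffer again on every call: growing a WebAssembly.Memory detaches the previous ArrayBuffer.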

tl;dr version: don't expect to be running this in the browser with large archives any time soon :(

use-strict, Jan 13 '23 15:01

In the browser.

But Emscripten has a new filesystem, WasmFS (still under development but already functional), with an OPFS (File System Access API) backend. That should allow reading files in chunks. I assume 7-Zip itself does not keep whole files in memory; otherwise, it could not pack or unpack large files.

Gosh, if only I knew more about the whole thing. :( In our project we have a C++/wasm worker that reads large files in chunks, using JavaScript code to access the browser's file system. We used the File and Directory Entries API (for persistence) but are now migrating to OPFS. Something like that might work for 7-Zip as well.

rhnatiuk, Jan 13 '23 16:01