Super long zip file fails to extract 1 file from directory

Open 15baraniana opened this issue 9 months ago • 1 comments

We have a 12 GB zip file and use the unzipper package to find a specific file in it and extract that file to the file system. unzipper correctly lists all 50 files in the archive, but when I select the one I want by path and try to extract it, the extraction fails.

Here is a simplified version of my code.

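    // Imports needed by this snippet (the rest runs inside an async function).
    import * as fs from 'fs';
    import { Open, CentralDirectory } from 'unzipper';

    // Read the central directory; this lists entries without extracting them.
    const directory: CentralDirectory = await Open.file(zipPath);

    console.log('Files in zip:');
    directory.files.forEach((file) => {
      console.log(file.path);
    });

    const targetFileName = 'test.txt';
    const file = directory.files.find((f) => f.path === targetFileName);

    if (!file) {
      console.error(`File ${targetFileName} not found in zip`);
      return;
    }
    console.log('file', file);
    console.log(`Found ${targetFileName}, extracting...`);

    await new Promise<void>((resolve, reject) => {
      file
        .stream()
        .on('error', reject) // pipe() does not forward errors from the entry stream
        .pipe(
          fs.createWriteStream(`/Users/aribaranian/Desktop/${targetFileName}`)
        )
        .on('error', reject)
        .on('finish', resolve);
    });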

This code works for smaller zip files, but for this large one it throws the error below. I have included the console output for the file entry for reference.

file {
  signature: 33639248,
  versionMadeBy: 788,
  versionsNeededToExtract: 45,
  flags: 8,
  compressionMethod: 8,
  lastModifiedTime: 16682,
  lastModifiedDate: 22735,
  crc32: 3622192553,
  compressedSize: 324262805,
  uncompressedSize: 352829440,
  fileNameLength: 36,
  extraFieldLength: 44,
  fileCommentLength: 0,
  diskNumber: 0,
  internalFileAttributes: 0,
  externalFileAttributes: 2176057344,
  offsetToLocalFileHeader: null,
  lastModifiedDateTime: 2024-06-15T08:09:20.000Z,
  pathBuffer: <Buffer 32 32 36 34 30 32 30 2d 31 30 30 2d 35 30 30 2d 4d 41 49 4e 2d 41 52 43 48 2d 47 48 5f 52 32 31 2e 72 76 74>,
  path: '2264020-100-500-MAIN-ARCH-GH_R21.rvt',
  isUnicode: false,
  extra: {
    signature: 1,
    partsize: 8,
    uncompressedSize: 4382034189,
    compressedSize: null,
    offset: null,
    disknum: null
  },
  comment: '',
  type: 'File',
  stream: [Function (anonymous)],
  buffer: [Function (anonymous)]
}
Found 2264020-100-500-MAIN-ARCH-GH_R21.rvt, extracting...
Error parsing zip: TypeError [ERR_INVALID_ARG_TYPE]: The "start" argument must be of type number. Received null

The only unusual thing I see is offsetToLocalFileHeader: null, but I can't figure out why it is null. I tried increasing the tailSize, but that didn't help. I also examined the file in a hex editor and found the EOCD signature about 22 bytes from the end.
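
One possible explanation, not verified against unzipper's source: the ZIP specification says the ZIP64 extra field (header ID 0x0001) only contains the values whose 32-bit slot in the central directory record is set to 0xFFFFFFFF, in the fixed order uncompressed size, compressed size, local header offset. In the log above the extra field has partsize: 8, so it holds a single 8-byte value, and 4382034189 is larger than 4 GB, which would fit a header offset in a 12 GB archive better than the size of a ~350 MB file. A parser that always reads the first value as the uncompressed size would then find nothing left for the offset. The sketch below shows the conditional parsing the spec describes; applyZip64Extra and CentralDirFields are hypothetical names, not part of unzipper.

```typescript
import { Buffer } from "node:buffer";

// Marker meaning "the real value lives in the ZIP64 extra field".
const ZIP64_SENTINEL = 0xffffffff;

// Hypothetical shape: the raw 32-bit values read from the central directory record.
interface CentralDirFields {
  uncompressedSize: number;
  compressedSize: number;
  offsetToLocalFileHeader: number;
}

// Hypothetical helper. `extraData` is the payload of the 0x0001 block,
// i.e. the bytes after its 2-byte header ID and 2-byte size.
function applyZip64Extra(fields: CentralDirFields, extraData: Buffer): CentralDirFields {
  const out = { ...fields };
  let pos = 0;
  const readU64 = (): number => {
    // Number() is exact only up to 2^53 - 1, which is enough for realistic archives.
    const value = Number(extraData.readBigUInt64LE(pos));
    pos += 8;
    return value;
  };
  // Each 64-bit value is present only if its 32-bit slot holds the sentinel.
  if (out.uncompressedSize === ZIP64_SENTINEL) out.uncompressedSize = readU64();
  if (out.compressedSize === ZIP64_SENTINEL) out.compressedSize = readU64();
  if (out.offsetToLocalFileHeader === ZIP64_SENTINEL) out.offsetToLocalFileHeader = readU64();
  return out;
}

// Example using the values from the log, assuming the raw offset slot was 0xFFFFFFFF.
const payload = Buffer.alloc(8);
payload.writeBigUInt64LE(BigInt(4382034189));
const resolved = applyZip64Extra(
  { uncompressedSize: 352829440, compressedSize: 324262805, offsetToLocalFileHeader: ZIP64_SENTINEL },
  payload
);
console.log(resolved.offsetToLocalFileHeader); // 4382034189
```

If this guess is right, the entry itself is valid and only the interpretation of the extra field is off, which would also explain why smaller archives (no ZIP64 offsets) work.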

Any help would be greatly appreciated.

Feb 26 '25 22:02 15baraniana

I had this happen to me. I ran into a zip file where ~98% of files got extracted, but not 100%. Granted, I'm using Bun and I didn't test with Node.js. Also, unfortunately I can't share the .zip for reproduction efforts.

I've rewritten my code to use unzip from Debian.
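
For anyone wanting the same workaround from Node.js, here is a minimal sketch that shells out to the Info-ZIP unzip command. It assumes unzip is installed and on the PATH; buildUnzipArgs and extractWithUnzip are hypothetical helper names, and this is not necessarily how my own code is written.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical helper: builds the argument list for `unzip`.
// -o overwrites existing files without prompting; -d sets the output directory.
function buildUnzipArgs(zipPath: string, member: string, destDir: string): string[] {
  return ["-o", zipPath, member, "-d", destDir];
}

// Hypothetical helper: extracts a single member from the archive into destDir.
// Rejects if unzip is missing or exits with a non-zero status.
async function extractWithUnzip(zipPath: string, member: string, destDir: string): Promise<void> {
  await execFileAsync("unzip", buildUnzipArgs(zipPath, member, destDir));
}
```

Note that unzip treats the member name as a wildcard pattern, so names containing characters such as * or [ may need escaping. Using execFile rather than exec avoids passing the paths through a shell.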

Mar 07 '25 09:03 davidgomes