adm-zip

Is there any way to override the date (Checksum of zip mismatched)?

Open ClaytonAstrom opened this issue 6 years ago • 9 comments

I'm writing a file to zip on upload:

/**
   * @method compressAndHash
   * @description Returns an object that has the compressed file in a buffer, and the representing md5 checksum
   * @param buffer
   * @param fileName The name of the output file in the zip
   */
  private compressAndHash(buffer: Buffer, fileName: string) {
    const zip = new admZip();
    zip.addFile(fileName, buffer);
    const bufferedZip = zip.toBuffer();
    return {
      compressedFile: bufferedZip,
      hash: this.createHash(bufferedZip)
    };
  }

  /**
   * @method createHash
   * @description Generates md5 for a given buffer
   * @param buffer
   */
  private createHash(buffer: Buffer) {
    const hash = crypt
      .createHash('md5')
      .update(buffer)
      .digest('base64');
    return hash;
  }
}

But it looks like the "date" is part of the checksum. Is there any way to override the date, similar to the issue/solution mentioned in https://github.com/archiverjs/node-archiver/issues/82, so we can get consistent checksums?

ClaytonAstrom avatar Jan 04 '19 22:01 ClaytonAstrom

Hi @ClaytonAstrom,

I recently ran into this exact issue. I ended up mocking the date returned by Date.now(), which is used inside adm-zip. We now get a consistent checksum.

Apparently dates are part of the zip spec, and adm-zip isn't doing anything out of the ordinary here.

Hopefully this is useful! Thanks, Ollie

describe('/zip/', ()=> {
    let clock;

    // Date.now() needs to be mocked, as the date is stored in the output zip and causes the zips to differ ever so slightly
    beforeEach(async function() {
        const date = new Date(2013, 3, 1);
        clock = sinon.useFakeTimers(date.getTime());
    });

    afterEach(function() {
        clock.restore();
    });

    it('zipped data exactly matches expected output', async ()=> {
        const result = zip.zipFile(xml);

        expect(result).to.equal(encodedZip);
    });
});

pxlprfct avatar Feb 08 '19 12:02 pxlprfct

Thanks @pxlprfct ! This definitely helps with testing, but my only issue is if I create an API to compress uploaded files and store them in a database, I'm trying to make sure that no one is uploading duplicates. And since the API is creating the zip, the hash will always be different. If I don't have the ability to override the date, then there isn't a good way to check for file duplication, right?

ClaytonAstrom avatar Feb 08 '19 15:02 ClaytonAstrom

I believe the date is stored in a specific position within the zip. You could check against the first 6(?) characters, skip the two date characters, and then compare against the rest of the string, until you see the same two date characters again. It could be a little messy.
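To make the "specific position" concrete, here is a minimal sketch based on the local file header layout in the ZIP specification: after the 4-byte signature come three 2-byte fields, then the 2-byte modification time at byte offset 10 and the 2-byte modification date at offset 12, both little-endian and in MS-DOS format. The same pair appears again for each entry in the central directory (offsets 12 and 14 of that header), which is why the date shows up a second time. The helper name `readFirstEntryTimestamp` is hypothetical and not part of adm-zip; it only inspects the first entry.

```typescript
// Offsets follow the local file header layout in the ZIP specification.
const LOCAL_HEADER_SIGNATURE = 0x04034b50;
const MOD_TIME_OFFSET = 10; // 2 bytes, MS-DOS time
const MOD_DATE_OFFSET = 12; // 2 bytes, MS-DOS date

interface DosTimestamp {
  year: number;
  month: number;
  day: number;
  hours: number;
  minutes: number;
  seconds: number;
}

// Hypothetical helper: decodes the timestamp stored for the first entry of a zip.
function readFirstEntryTimestamp(zip: Uint8Array): DosTimestamp {
  const view = new DataView(zip.buffer, zip.byteOffset, zip.byteLength);
  if (view.getUint32(0, true) !== LOCAL_HEADER_SIGNATURE) {
    throw new Error("buffer does not start with a local file header");
  }
  const time = view.getUint16(MOD_TIME_OFFSET, true);
  const date = view.getUint16(MOD_DATE_OFFSET, true);
  return {
    year: (date >> 9) + 1980, // years are stored relative to 1980
    month: (date >> 5) & 0x0f,
    day: date & 0x1f,
    hours: time >> 11,
    minutes: (time >> 5) & 0x3f,
    seconds: (time & 0x1f) * 2, // stored with 2-second resolution
  };
}
```

Because the year is stored relative to 1980, any fixed date you choose for reproducible output should be 1980 or later.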

Or, if the date stored in the zip isn't important to you - you could change the library code to use a specific, hardcoded date. https://github.com/cthackers/adm-zip/blob/master/headers/entryHeader.js#L37 might be the place to start.

Neither way is particularly nice, but these are the only solutions I can think of right now.
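For the duplicate-upload case specifically, a different approach (not something adm-zip provides, just a sketch) might be to hash the uploaded bytes before they are zipped, so the timestamp never enters the checksum. The function name `contentHash` is hypothetical, and SHA-256 is an arbitrary choice here; MD5 as in the original snippet would work the same way.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: fingerprints the raw upload, not the zip built from it.
// The result depends only on the file bytes, so identical uploads always match.
function contentHash(content: Uint8Array): string {
  return createHash("sha256").update(content).digest("base64");
}
```

You would store this value next to the compressed file and compare it against the hash of each new upload.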

pxlprfct avatar Feb 08 '19 18:02 pxlprfct

I just discovered this problem. If I use zip on a Mac to compress a directory, I get the same result, which has an identical MD5 checksum each time. If I use AdmZip, I get a different MD5 checksum.

I'm writing a program which creates and uploads Lambda deployment packages to an AWS S3 bucket with versioning. To determine if the deployment package has been modified, S3 provides an ETag, which is just an MD5 hash. You can download the "head" of the file, containing the ETag, compare to a new zip MD5 hash, and if they're the same, no need to re-upload.
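As a rough sketch of that comparison: the ETag comes back wrapped in double quotes, and it equals the hex MD5 of the body only under certain conditions (assumed here: a single-part upload that is not encrypted with SSE-KMS). The function name `matchesS3ETag` is hypothetical, and fetching the ETag itself is left out.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: compares a local buffer against an ETag taken from a HEAD response.
// Assumes the object was uploaded in a single part, so the ETag is the hex MD5 of the body.
function matchesS3ETag(body: Uint8Array, etag: string): boolean {
  const md5Hex = createHash("md5").update(body).digest("hex");
  // Strip the surrounding double quotes before comparing.
  const normalized = etag.replace(/^"|"$/g, "").toLowerCase();
  return md5Hex === normalized;
}
```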

This method does not work with AdmZip, because it includes the date, as noted above, which changes the MD5 hash each time. This is a problem. It may be in the spec to change the date, but at least on a Mac, the standard behavior is to create a byte-for-byte identical zip file when the contents are unchanged. At a minimum, there should be some flag we can set that makes consistent output from identical input possible, or even the default.

I'm not going to override the internals - this will just make me do an exec override instead, or use another zip library if it can support this.

Please fix this behavior! I'd bet I'd get a consistent result on WinZip too, if I go check. I think it's what most people expect.

michael-crawford avatar Mar 22 '19 02:03 michael-crawford

@michael-crawford I'm running into a slightly different but related issue with changing checksums. I don't have meaningful privileges to install zip in my execution environment, so I'm going to take a look at node-zip and see if it performs more consistently.

If anyone else comes across this, I'm trying to get Amazon CodeBuild + CodePipeline to accept an input artifact which was zipped using this library and then placed in an S3 bucket. Right now it's barfing because there's a checksum mismatch between the source zip and the one that eventually gets copied to the build step.

Hope you get a chance to take a look at this one, @cthackers !

john-osullivan avatar Apr 17 '19 17:04 john-osullivan

adm-zip wraps its internal structures so you can't see them directly. One idea is that after you call:

zip.addFile(filename, ...);

you also call

zip.getEntry(filename).header.time = new Date("some known date");

and after that you call toBuffer()

5saviahv avatar May 02 '19 14:05 5saviahv

Is there any progress on this problem?

zq0904 avatar Oct 10 '22 08:10 zq0904

+1

krmao avatar Oct 18 '22 04:10 krmao

In order to fix the time for all files in the zip, I used

for (const entry of zip.getEntries()) {
    entry.header.time = new Date("1912-06-23T00:00:00.000Z")
}

mircohacker avatar Apr 18 '23 15:04 mircohacker