adm-zip
Is there any way to override the date (checksum of zip mismatched)?
I'm writing a file to zip on upload:
  /**
   * @method compressAndHash
   * @description Returns an object that has the compressed file in a buffer, and the representing md5 checksum
   * @param buffer
   * @param fileName The name of the output file in the zip
   */
  private compressAndHash(buffer: Buffer, fileName: string) {
    const zip = new admZip();
    zip.addFile(fileName, buffer);
    const bufferedZip = zip.toBuffer();
    return {
      compressedFile: bufferedZip,
      hash: this.createHash(bufferedZip)
    };
  }

  /**
   * @method createHash
   * @description Generates md5 for a given buffer
   * @param buffer
   */
  private createHash(buffer: Buffer) {
    const hash = crypt
      .createHash('md5')
      .update(buffer)
      .digest('base64');
    return hash;
  }
}
But it looks like the "date" is part of the checksum. Is there any way to override the date, similar to the issue/solution mentioned in https://github.com/archiverjs/node-archiver/issues/82, so we can get consistent checksums?
Hi @ClaytonAstrom,
I recently ran into this exact issue. I ended up mocking the date returned by Date.now(), which is used inside adm-zip. We now get a consistent checksum.
Apparently dates are a part of the zip spec, and adm-zip isn't doing anything out of the ordinary here.
Hopefully this is useful! Thanks, Ollie
describe('/zip/', () => {
  let clock;

  // Date.now() needs to be mocked, as the timestamp is stored in the output zip and causes the zips to differ ever so slightly
  beforeEach(async function() {
    const date = new Date(2013, 3, 1);
    clock = sinon.useFakeTimers(date.getTime());
  });

  afterEach(function() {
    clock.restore();
  });

  it('zipped data exactly matches expected output', async () => {
    const result = zip.zipFile(xml);
    expect(result).to.equal(encodedZip);
  });
});
Thanks @pxlprfct ! This definitely helps with testing, but my only issue is if I create an API to compress uploaded files and store them in a database, I'm trying to make sure that no one is uploading duplicates. And since the API is creating the zip, the hash will always be different. If I don't have the ability to override the date, then there isn't a good way to check for file duplication, right?
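One possible way around this for the duplicate check, sketched under the assumption that duplicates are defined by the uploaded bytes rather than by the archive: hash the original buffer before it is zipped, and store that value alongside the zip. The helper name `uploadFingerprint` is hypothetical and not part of adm-zip; it uses only Node's built-in `crypto` module.

```javascript
const crypto = require('crypto');

// Hypothetical helper (not part of adm-zip): fingerprint the uploaded bytes
// themselves, so the result does not depend on any timestamp in the zip.
function uploadFingerprint(buffer) {
  return crypto
    .createHash('md5') // same algorithm and encoding as createHash above
    .update(buffer)
    .digest('base64');
}

// Identical uploads give identical fingerprints, whenever they are processed.
const first = uploadFingerprint(Buffer.from('same contents'));
const second = uploadFingerprint(Buffer.from('same contents'));
console.log(first === second);
```

This only detects duplicate contents; it says nothing about whether two zip archives are byte-identical.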
I believe the date is stored at a specific position within the zip. You could check against the first 6(?) characters, skip the two date characters, and then compare against the rest of the string, until you see the same two date characters again. It could be a little messy.
Or, if the date stored in the zip isn't important to you - you could change the library code to use a specific, hardcoded date. https://github.com/cthackers/adm-zip/blob/master/headers/entryHeader.js#L37 might be the place to start.
Neither way is particularly nice - but these are only the solutions I can think of right now.
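The first idea (comparing everything except the date bytes) can be sketched roughly as follows. This is an illustration, not adm-zip functionality: the function name `hashZipIgnoringTimestamps` is made up, and the field offsets are taken from the ZIP format's local file header and central directory header layouts. It assumes a plain, single-disk, non-ZIP64 archive with no archive comment and no timestamps in extra fields.

```javascript
const crypto = require('crypto');

const EOCD_SIG = 0x06054b50; // end of central directory record
const CEN_SIG = 0x02014b50;  // central directory file header
const LOC_SIG = 0x04034b50;  // local file header

// Hypothetical helper: zero the DOS time/date fields in a copy of the zip,
// then hash the copy. The input buffer is left untouched.
function hashZipIgnoringTimestamps(zipBuffer) {
  const buf = Buffer.from(zipBuffer); // Buffer.from(buffer) makes a copy

  // Locate the end-of-central-directory record by scanning backwards.
  let eocd = -1;
  for (let i = buf.length - 22; i >= 0; i--) {
    if (buf.readUInt32LE(i) === EOCD_SIG) { eocd = i; break; }
  }
  if (eocd < 0) throw new Error('End of central directory not found');

  const entryCount = buf.readUInt16LE(eocd + 10);
  let pos = buf.readUInt32LE(eocd + 16); // start of the central directory

  for (let n = 0; n < entryCount; n++) {
    if (buf.readUInt32LE(pos) !== CEN_SIG) throw new Error('Bad central header');
    buf.writeUInt32LE(0, pos + 12); // mod time (2 bytes) + mod date (2 bytes)

    const nameLen = buf.readUInt16LE(pos + 28);
    const extraLen = buf.readUInt16LE(pos + 30);
    const commentLen = buf.readUInt16LE(pos + 32);
    const localOffset = buf.readUInt32LE(pos + 42);

    if (buf.readUInt32LE(localOffset) !== LOC_SIG) throw new Error('Bad local header');
    buf.writeUInt32LE(0, localOffset + 10); // same two fields in the local header

    pos += 46 + nameLen + extraLen + commentLen; // next central directory entry
  }

  return crypto.createHash('md5').update(buf).digest('base64');
}
```

Walking the central directory, rather than searching the whole buffer for header signatures, avoids false matches inside compressed data. The resulting hash will not match anything computed over the raw file, such as a storage service's own checksum.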
I just discovered this problem. If I use zip on a Mac to compress a directory, I get the same result, which has an identical MD5 checksum each time. If I use AdmZip, I get a different MD5 checksum.
I'm writing a program which creates and uploads Lambda deployment packages to an AWS S3 bucket with versioning. To determine if the deployment package has been modified, S3 provides an ETag, which for a simple (non-multipart) upload is just an MD5 hash. You can download the "head" of the file, containing the ETag, compare it to the MD5 hash of a new zip, and if they're the same, there's no need to re-upload.
This method does not work with AdmZip, because it includes the date as noted above, which changes the MD5 hash each time. This is a problem. It may be in the spec to change the date, but at least on a Mac, the standard behavior is to create a byte-for-byte identical zip file when the contents are unchanged. At a minimum, there should be some flag we can set which makes consistent output from identical input possible, or it should even be the default.
I'm not going to override the internals - this will just make me do an exec override instead, or use another zip library if it can support this.
Please fix this behavior! I'd bet I'd get a consistent result on WinZip too, if I go check. I think it's what most people expect.
@michael-crawford I'm running into a slightly different but related issue with changing checksums. I don't have meaningful privileges to install zip in my execution environment, so I'm going to take a look at node-zip and see if it performs more consistently.
If anyone else comes across this, I'm trying to get Amazon CodeBuild + CodePipeline to accept an input artifact which was zipped using this library and then placed in an S3 bucket. Right now it's barfing because there's a checksum mismatch between the source zip and the one that eventually gets copied to the build step.
Hope you get a chance to take a look at this one, @cthackers !
adm-zip wraps its internal structures, so you can't see them directly. One idea is that after you call:
zip.addFile(filename, ...);
you also call
zip.getEntry(filename).header.time = new Date("some known date");
and after that you call toBuffer().
Is there any progress on this problem?
+1
In order to fix the time for all files in the zip, I used
for (const entry of zip.getEntries()) {
  entry.header.time = new Date("1912-06-23T00:00:00.000Z");
}
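A note on choosing the fixed date: the ZIP format stores modification times as MS-DOS date and time fields, which start at 1980, have a 2-second resolution, and carry no time zone. The sketch below shows that encoding. The name `encodeDosDateTime` is hypothetical, and the use of local-time getters is an assumption about how a library might encode the value, not a statement about adm-zip's internals.

```javascript
// Hypothetical illustration of the MS-DOS date/time encoding used in ZIP headers.
// Assumption: the Date is read with local-time getters (DOS fields have no time zone).
function encodeDosDateTime(d) {
  const year = d.getFullYear();
  if (year < 1980 || year > 2107) {
    throw new RangeError('DOS dates only cover the years 1980 to 2107');
  }
  // Date field: bits 15-9 = years since 1980, bits 8-5 = month (1-12), bits 4-0 = day.
  const date = ((year - 1980) << 9) | ((d.getMonth() + 1) << 5) | d.getDate();
  // Time field: bits 15-11 = hours, bits 10-5 = minutes, bits 4-0 = seconds / 2.
  const time = (d.getHours() << 11) | (d.getMinutes() << 5) | (d.getSeconds() >> 1);
  return { date, time };
}

// 1 April 2013, 12:30:10 local time
console.log(encodeDosDateTime(new Date(2013, 3, 1, 12, 30, 10)));
// → { date: 17025, time: 25541 }
```

A date before 1980, such as the 1912 one above, cannot be represented exactly in this format, so what ends up in the archive depends on how the library handles it. If the encoding does use local time, a date given as a UTC string could also produce different bytes on machines in different time zones, so a date at or after 1980 constructed from local components may be the safer choice for reproducible output.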