node-archiver

Deterministic archive?

Open janpio opened this issue 6 years ago • 7 comments

Is there a way to make archiver create identical archives each time it is run with the same files and options?

Currently the resulting archive file size is identical, but the file itself is structured slightly differently each time, so calculating the checksum of several archives gives different results :/
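To narrow down where two archives of the same size diverge, a small comparison helper can be useful. The following is only a sketch using Node's built-in `crypto` module; the names `sha256` and `firstDifference` are hypothetical helpers, not part of archiver.

```javascript
const crypto = require('crypto');

// Hex-encoded SHA-256 digest of a buffer.
function sha256(buf) {
    return crypto.createHash('sha256').update(buf).digest('hex');
}

// Offset of the first byte at which two buffers differ,
// or -1 if they are byte-for-byte identical.
function firstDifference(a, b) {
    const len = Math.min(a.length, b.length);
    for (let i = 0; i < len; i++) {
        if (a[i] !== b[i]) return i;
    }
    // Same common prefix: they differ only if one is longer.
    return a.length === b.length ? -1 : len;
}
```

Reading both archives with `fs.readFileSync` and passing the buffers to `firstDifference` gives an offset that can be looked up in the format's layout, for example a timestamp field in a header.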

janpio avatar Jul 15 '19 17:07 janpio

What type of archive do you use?

  • GZIP has a platform-dependent header
  • ZIP has date information encoded for each entry

For GZIP we have this code. Each value is the base64-encoded start of the binary content (the header) of a GZIP archive on that platform:

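const os = require('os');

// Base64 prefix of the GZIP output, keyed by platform. Only the last two
// characters differ; they cover the OS byte of the header.
const gzipHeader = {
    darwin: 'H4sIAAAAAAAAE2',
    win32: 'H4sIAAAAAAAACm',
    linux: 'H4sIAAAAAAAAA2',
}[os.platform()];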
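As an alternative to keeping one expected header per platform, the varying header fields can be overwritten after compression. This is a minimal sketch, assuming the standard 10-byte GZIP header layout from RFC 1952 (MTIME in bytes 4 to 7, OS in byte 9) and that no header CRC is present; `normalizeGzipHeader` is a hypothetical helper, not something archiver or Node provides.

```javascript
const zlib = require('zlib');

// Return a copy of a GZIP buffer with the machine-dependent header
// fields set to fixed values.
function normalizeGzipHeader(gz) {
    const out = Buffer.from(gz);   // copy, leave the input untouched
    out.writeUInt32LE(0, 4);       // MTIME: 0 means "no timestamp"
    out[9] = 0xff;                 // OS: 255 means "unknown"
    return out;
}

const normalized = normalizeGzipHeader(zlib.gzipSync(Buffer.from('hello')));
```

The trailer checksum of a GZIP stream covers only the uncompressed data, so the patched buffer still decompresses with `zlib.gunzipSync`.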

For ZIP we specify a fixed date for the entry:

zip.append(chain, {
    name: `customers.csv`,
    date: new Date('2000-07-18T20:18:24.441Z'),
}).finalize();

Note: the file name given to an appended entry is also encoded in the archive, so it must stay the same in order to get exactly the same file content.
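Putting the two points above together (fixed date, stable names), here is a rough sketch for archives with several entries. It also sorts entries by name, on the assumption that append order affects the bytes written. `appendDeterministically` and `FIXED_DATE` are hypothetical names; the only archiver call relied on is `append(source, { name, date })` as used above.

```javascript
// Fixed timestamp reused for every entry (value taken from the example above).
const FIXED_DATE = new Date('2000-07-18T20:18:24.441Z');

// Append entries in sorted name order, each with the same fixed date.
// `archive` is any object exposing archiver's append(source, data) method;
// `files` maps entry names to their content (string, Buffer, or stream).
function appendDeterministically(archive, files) {
    const names = Object.keys(files).sort();
    for (const name of names) {
        archive.append(files[name], { name, date: FIXED_DATE });
    }
    return names;
}

// Intended usage with archiver (not executed here):
//   const zip = archiver('zip');
//   zip.pipe(output);
//   appendDeterministically(zip, { 'customers.csv': chain });
//   zip.finalize();
```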

avoinkov avatar Jul 29 '19 13:07 avoinkov

For additional information about the GZIP header, see https://www.forensicswiki.org/wiki/Gzip

avoinkov avatar Jul 29 '19 13:07 avoinkov

I was indeed using ZIP, so I will try gzip and see if this already fixes my problem. That would be super awesome. Will report back.

janpio avatar Jul 30 '19 09:07 janpio

~~For zip even with specifying the same date, the hash of the zip file differs (even though the contents are identical).~~ ~~Does anyone have a way of achieving deterministic zip archives?~~

Edit: I retract that. With the date specified, it does seem to work!

andreieftimie avatar Feb 16 '21 14:02 andreieftimie