
`yarn pack` produces slightly different `.tgz` files on different operating systems

Open borekb opened this issue 4 years ago • 7 comments

Creating an issue from this Discord chat.

Describe the bug

Running yarn pack produces a slightly different archive (.tgz file) depending on the OS. For example, this is macOS vs. Windows:

[Screenshot, 2021-04-19: comparison of the macOS and Windows archives]

It comes down to the OS field in the gzip header, as described in RFC 1952.
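For anyone wanting to check this on their own tarballs, here is a minimal sketch of reading that field. It assumes the fixed 10-byte header layout from RFC 1952 (ID1, ID2, CM, FLG, 4 bytes of MTIME, XFL, OS), which puts the OS byte at offset 9. The helper names `gzipOsByte` and `describeGzipOs` are made up for this example, and the name table lists only a handful of the RFC's codes; compressors may also write values outside that table.

```typescript
import { gzipSync } from "node:zlib";

// Offset of the OS field in a gzip member header (RFC 1952):
// ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS.
const GZIP_OS_OFFSET = 9;

// A few OS codes listed in RFC 1952; anything else is reported by number.
const RFC1952_OS_NAMES: Record<number, string> = {
  0: "FAT filesystem (MS-DOS, OS/2, NT/Win32)",
  3: "Unix",
  7: "Macintosh",
  11: "NTFS filesystem (NT)",
  255: "unknown",
};

// Hypothetical helper: returns the raw OS byte of a gzip stream.
function gzipOsByte(archive: Uint8Array): number {
  if (archive.length < 10 || archive[0] !== 0x1f || archive[1] !== 0x8b) {
    throw new Error("not a gzip stream");
  }
  return archive[GZIP_OS_OFFSET];
}

// Hypothetical helper: maps the OS byte to a readable label.
function describeGzipOs(archive: Uint8Array): string {
  const code = gzipOsByte(archive);
  return RFC1952_OS_NAMES[code] ?? `code ${code}`;
}

// The value printed here depends on the machine running the script.
const sample = gzipSync(Buffer.from("hello"));
console.log(gzipOsByte(sample), describeGzipOs(sample));
```

Running the same check against a `.tgz` read with `fs.readFileSync` on each OS should show whether that byte is where the two files diverge.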

To Reproduce

  1. Run yarn pack on e.g. macOS.
  2. Run yarn pack on e.g. Windows.
  3. Compare the files.

Environment if relevant (please complete the following information):

  • OS: a combination of OSes; tested on macOS and Windows.
  • Node version: 12 & 14
  • Yarn version: 2.4.1

Additional context

Discord chat.

Search terms: operating system, OS, Windows, Linux, macOS, Mac, yarn pack, tarball, archive, TGZ, GZIP, ZIP.

borekb, Apr 19 '21 12:04

It turns out Node encodes the operating system in the gzip header when compressing things with the native zlib, so archives produced on different systems will have different checksums. Fortunately it hasn't been a problem with the cache because we now use zip w/ a wasm version of zlib (thus guaranteeing determinism), but yarn pack uses node-tar, which uses the native primitive for compression.
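To illustrate why one header byte is enough to break a checksum, here is a small sketch that takes a single gzip buffer, makes two copies that differ only in the OS byte (offset 9), and hashes both. The helper `withOsByte` is hypothetical, and SHA-512 is used purely for illustration rather than as a claim about which hash Yarn applies at any given step.

```typescript
import { createHash } from "node:crypto";
import { gunzipSync, gzipSync } from "node:zlib";

// Offset of the OS field in the gzip header (RFC 1952).
const OS_FIELD_OFFSET = 9;

function sha512(data: Uint8Array): string {
  return createHash("sha512").update(data).digest("hex");
}

// Hypothetical helper: returns a copy of the archive with a different OS byte.
function withOsByte(archive: Uint8Array, os: number): Buffer {
  const copy = Buffer.from(archive); // Buffer.from copies the bytes
  copy[OS_FIELD_OFFSET] = os;
  return copy;
}

const original = gzipSync(Buffer.from("same payload on every machine"));
const asUnix = withOsByte(original, 3); // 3 = Unix in RFC 1952
const asFat = withOsByte(original, 0); // 0 = FAT filesystem in RFC 1952

// The archives differ by one byte, so their digests differ.
console.log(sha512(asUnix) === sha512(asFat)); // false

// The decompressed content is still identical.
console.log(gunzipSync(asUnix).equals(gunzipSync(asFat))); // true
```

The payload is byte-for-byte the same in both cases; only the container metadata changes, which is all a checksum over the `.tgz` needs in order to mismatch.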

Unfortunately Node doesn't expose options to control this behaviour and, even if it did, it probably wouldn't be enough: parallelism may cause other deltas during compression, and there would be no way to control that either. The right fix would probably be to use the wasm zlib for compression (thankfully performance for yarn pack isn't a huge concern since it's rarely used), but it's a bit of work 🤔

arcanis, Apr 19 '21 13:04

Does npm pack suffer from the same issue or do they avoid it somehow? (I don't have a good way to test it right now, sorry.)

borekb, Apr 19 '21 13:04

Me neither 😄 I suspect they do; I remember someone mentioning other inconsistent checksum issues on pack when adding npm-powered packages as dependencies, a few months ago. Perhaps it was this.

arcanis, Apr 19 '21 13:04

It would seem this is causing checksum mismatch bugs when installing git dependencies, as they get packed instead of used as is. That makes it pretty severe, since it's pushing everyone to ignore or update checksum mismatches.

mpetrunic, May 09 '22 15:05

Yeah, that's the workaround. I have a git dependency ("package": "git+ssh://[email protected]/foo/bar.git") in my package.json, and if I run yarn install on Windows and commit yarn.lock, then I have to run the CI (which runs on Linux) with YARN_CHECKSUM_BEHAVIOR=ignore; otherwise I get "The remote archive doesn't match the expected checksum".

AndreKR, Aug 09 '22 18:08

Any updates? We're being bitten by https://github.com/yarnpkg/berry/issues/5136, which is a result of this issue.

KholdStare, Mar 13 '23 18:03