
[Bug?]: checksum is different from windows and unix for local packages

Open Diggsey opened this issue 1 year ago • 20 comments

Self-service

  • [x] I'd be willing to implement a fix

Describe the bug

Yarn computes a different checksum for local packages (i.e. packages installed via a relative path) on Windows vs Linux, causing yarn install to fail in CI.

To reproduce

  • Run yarn add ../relativePath followed by yarn install on Windows.
  • Commit the changes and attempt to run yarn install in CI on Linux.

Run yarn install
➤ YN0000: · Yarn 4.0.2
➤ YN0000: ┌ Resolution step
➤ YN0000: └ Completed in 0s 566ms
➤ YN0000: ┌ Post-resolution validation
➤ YN0028: -  resolution: "platformed-browser-api@file:../browser-api#../browser-api::hash=516e63&locator=platformed-frontend%40workspace%3A."
➤ YN0028: -  checksum: f4cc44353a17885d87d70800a9a43b4d88bc68aa81c1ae9177775ef4fe5f47ecc2a469e21f3615d3bd08f7b83c3e9e7b7980ecdcc8e76ff069f1c5bf7487deab
➤ YN0028: +  resolution: "platformed-browser-api@file:../browser-api#../browser-api::hash=cab6bb&locator=platformed-frontend%40workspace%3A."
➤ YN0028: The lockfile would have been modified by this install, which is explicitly forbidden.
➤ YN0000: └ Completed
➤ YN0000: · Failed with errors in 0s 695ms
Error: Process completed with exit code 1.

Environment

  System:
    OS: Windows 10 10.0.19045
    CPU: (20) x64 12th Gen Intel(R) Core(TM) i9-12900H
  Binaries:
    Node: 18.12.1 - ~\AppData\Local\Temp\xfs-2d93d2cb\node.CMD
    Yarn: 4.0.2 - ~\AppData\Local\Temp\xfs-2d93d2cb\yarn.CMD
    npm: 9.2.0 - C:\Program Files\nodejs\npm.CMD

Additional context

Related issues: https://github.com/yarnpkg/berry/issues/5136 https://github.com/yarnpkg/berry/issues/2774

Diggsey avatar Jan 25 '24 11:01 Diggsey

Please attach the generated files on Windows and Linux

arcanis avatar Jan 25 '24 11:01 arcanis

I'm not the originator, but I am encountering the same issue when installing on Mac vs Windows. Here's some info I hope helps you resolve it:

I have a local package, added as a file: entry. The .zip file in the cache has a different checksum on the 2 OS's, from the looks of it because of the embedded CR/LF chars on Windows, and directory permissions.

Mac:

project.js # zipinfo /Users/me/.yarn/berry/cache/mySDK-file-9da22a02b2-10c0.zip
Archive:  /Users/me/.yarn/berry/cache/mySDK-file-9da22a02b2-10c0.zip
Zip file size: 330384 bytes, number of entries: 4
drwxr-xr-x  6.3 unx        0 b- stor 84-Jun-22 21:50 node_modules/
drwxr-xr-x  6.3 unx        0 b- stor 84-Jun-22 21:50 node_modules/mySDK/
-rw-r--r--  6.3 unx   329745 b- stor 84-Jun-22 21:50 node_modules/mySDK/myProject.js (a webpack bundle)
-rw-r--r--  6.3 unx       95 b- stor 84-Jun-22 21:50 node_modules/mySDK/package.json
4 files, 329840 bytes uncompressed, 329840 bytes compressed:  0.0%

Windows: (Git Bash)

project.js # zipinfo /c/Users/me/AppData/Local/Yarn/Berry/cache/mySDK-file-a940f94840-10c0.zip
Archive:  /c/Users/me/AppData/Local/Yarn/Berry/cache/mySDK-file-a940f94840-10c0.zip
Zip file size: 330383 bytes, number of entries: 4
drwxr-xr-x  6.3 unx        0 b- stor 84-Jun-22 21:50 node_modules/
drw-r--r--  6.3 unx        0 b- stor 84-Jun-22 21:50 node_modules/mySDK/
-rw-r--r--  6.3 unx   329745 b- stor 84-Jun-22 21:50 node_modules/mySDK/myProject.js
-rw-r--r--  6.3 unx       94 b- stor 84-Jun-22 21:50 node_modules/mySDK/package.json
4 files, 329839 bytes uncompressed, 329839 bytes compressed:  0.0%

Differences:

  • node_modules/mySDK directories have different file permissions (755 vs 644)
  • the size of the package.json differs due to CR/LF in the Windows version:
project.js (mac) # cat -e node_modules/mySDK/package.json
{$
    "name": "mySDK",$
    "main": "./mySDK.js",$
    "version": "1.0.0"$
}$

project.js (windows) # cat -e node_modules/mySDK/package.json
{^M$
    "name": "mySDK",^M$
    "main": "./mySDK.js",^M$
    "version": "1.0.0"^M$
}^M$

Seems it could be resolved by stripping out control characters when zipping?
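For anyone who wants to reproduce the comparison above without zipinfo, here is a minimal sketch in Python using only the standard zipfile module. The function names entry_summary and diff_archives are made up for illustration; this is not how Yarn itself inspects its cache. It reports every entry whose Unix permission bits, uncompressed size, or CRC differ between two archives.

```python
import zipfile

def entry_summary(zip_path):
    """Map each entry name to (permission bits, uncompressed size, CRC-32)."""
    summary = {}
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            # The upper 16 bits of external_attr hold the Unix mode
            # when the archive was written with Unix-style attributes.
            mode = (info.external_attr >> 16) & 0o7777
            summary[info.filename] = (oct(mode), info.file_size, info.CRC)
    return summary

def diff_archives(path_a, path_b):
    """Return (name, summary_a, summary_b) for every entry that differs."""
    a, b = entry_summary(path_a), entry_summary(path_b)
    diffs = []
    for name in sorted(set(a) | set(b)):
        if a.get(name) != b.get(name):
            diffs.append((name, a.get(name), b.get(name)))
    return diffs
```

Calling diff_archives on the Mac and Windows cache files should list the directory entry (mode difference) and package.json (size and CRC difference) if the observations above hold.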

brenthompson avatar Jan 30 '24 16:01 brenthompson

Did you per chance configure git on Windows to automatically convert line returns?

arcanis avatar Jan 30 '24 16:01 arcanis

Did you per chance configure git on Windows to automatically convert line returns?

Not that I'm aware of, I have Git Bash installed on many Windows machines, all of them using default installation settings. I can check.

brenthompson avatar Jan 30 '24 16:01 brenthompson

It was set on Windows, and changing it made a difference, but I still get the checksum mismatch.

Mac:

project.js # git config --get core.autocrlf 
project.js # // nothing

project.js # git ls-files packages/mySDK/vendor/* --eol
i/crlf  w/crlf  attr/                 	packages/mySDK/vendor/mySDKproject.js
i/lf    w/lf    attr/                 	packages/mySDK/vendor/package.json

Windows:

project.js # git config --get core.autocrlf
true     // must be a git bash default

project.js # git ls-files packages/mySDK/vendor/* --eol
i/crlf  w/crlf  attr/                   packages/mySDK/vendor/mySDKproject.js
i/lf    w/crlf  attr/                   packages/mySDK/vendor/package.json     <--- w/ value is different

project.js #

I added a .gitattributes file:

project.js # cat .gitattributes
packages/mySDK/vendor/package.json eol=lf

Then ran

git rm --cached -r .
git reset --hard

per https://www.aleksandrhovhannisyan.com/blog/crlf-vs-lf-normalizing-line-endings-in-git/

Windows again:

project.js # git ls-files packages/mySDK/vendor/* --eol
i/crlf  w/crlf  attr/                   packages/mySDK/vendor/mySDKproject.js
i/lf    w/lf    attr/text eol=lf        packages/mySDK/vendor/package.json    <---- w/ value now matches

project.js # cat -e node_modules/mySDK/package.json
{$
    "name": "mySDK",$
    "main": "./mySDKproject.js",$
    "version": "1.0.0"$
}$

I committed the .gitattributes file, and ran yarn --check-cache on both Mac and Windows. yarn.lock on Mac didn't change, the one on Windows did, i.e. the checksums are still different.

Here are verbose zipinfo outputs for comparison. Again the differences are the directory permissions and number of bytes in the package.json file. zipinfo_windows.txt zipinfo_mac.txt

brenthompson avatar Jan 30 '24 18:01 brenthompson

Did you per chance configure git on Windows to automatically convert line returns?

I do not have this option set and have the issue.

I believe the permissions are the cause.

Diggsey avatar Jan 31 '24 01:01 Diggsey

We're having this problem also. It's causing quite a headache when Windows Devs are generating different lockfiles to those on UNIX.

TomppaPackage avatar Feb 11 '24 13:02 TomppaPackage

Folks, isn't it about checksums of packages being calculated on the compressed versions of packages vs the "raw" packages from npm? See my issue from some time ago where I learned about this: https://github.com/yarnpkg/berry/issues/5957

Try setting compression level to 0 in the project -- maybe the differences are due to how the compression algorithm works on various OS?

akwodkiewicz avatar Feb 15 '24 11:02 akwodkiewicz

Folks, isn't it about checksums of packages being calculated on the compressed versions of packages vs the "raw" packages from npm? See my issue from some time ago where I learned about this: #5957

Try setting compression level to 0 in the project -- maybe the differences are due to how the compression algorithm works on various OS?

Thanks for the suggestion, I hadn't seen that bug. But 1) I'm not using a package from npm - it's a simple file: entry consisting of a webpack bundle + package.json, 2) my issue didn't occur after upgrading from yarn 3 to 4, we've been on v4 all along, 3) our compressionLevel was already set to 0 everywhere

And P.S. I agree this is highly annoying, seems to be pretty widespread, and so I'm baffled as to why the maintainers are ignoring it.

brenthompson avatar Feb 21 '24 13:02 brenthompson

Please attach the generated files on Windows and Linux

@arcanis kindly remove the 'waiting for feedback' tag, data has been provided

brenthompson avatar Feb 21 '24 14:02 brenthompson

Hi! 👋

It seems like this issue has been marked as probably resolved, or missing important information blocking its progression. As a result, it'll be closed in a few days unless a maintainer explicitly vouches for it.

yarnbot avatar Mar 22 '24 15:03 yarnbot

Bad @yarnbot

Diggsey avatar Mar 22 '24 15:03 Diggsey

Yeah, this is definitely not resolved. We had to move one of our repos to npm because of it.

ezweave avatar Mar 22 '24 15:03 ezweave

How do we get a "maintainer to explicitly vouch for it"? Do we have to wave arms in the comments until it draws attention? :')

ClementValot avatar Mar 22 '24 15:03 ClementValot

Hi! 👋

It seems like this issue has been marked as probably resolved, or missing important information blocking its progression. As a result, it'll be closed in a few days unless a maintainer explicitly vouches for it.

yarnbot avatar Apr 21 '24 17:04 yarnbot

Very bad @yarnbot

Diggsey avatar Apr 21 '24 18:04 Diggsey

Sadly this is breaking for any team that uses both Windows and Unix/macOS. The only workaround I've found is setting checksumBehavior: ignore in .yarnrc.yml, and that's too big a trade-off in security :(

@arcanis Maybe we can have a bit of reassurance that it's in someone's scope? The waiting for feedback tag is still on even though that's been addressed

ClementValot avatar Apr 22 '24 10:04 ClementValot

Sorry, this thread fell off the radar. Yarn will pack file: and git: packages, and their content needs to be the same for the checksum to pass. If the content isn't the same, then we don't know for sure whether it's inconsequential or a problem that puts your application in jeopardy.

Unfortunately, the way the packages are built may depend on your system, and that makes this process flaky. We're always looking for ways to improve that, but it's unclear right now what the solution should be.

For example, in the case of the OP the problem was about CRLF strings. Should Yarn normalize them during packing? Should it do that on all files? Probably that should exclude binary files? What if a project expects a CRLF for X or Y reason? If we can't do it safely, should we do it at all?
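To make the trade-off concrete, here is a rough sketch in Python of what "normalize CRLF but skip binary files" could look like. It is an illustration only, not Yarn's packing code: the function names are hypothetical, and the "NUL byte in the first few thousand bytes means binary" rule is an assumed heuristic that can misclassify files (UTF-16 text, for instance).

```python
def looks_binary(data: bytes, sample_size: int = 8000) -> bool:
    """Assumed heuristic: a NUL byte near the start marks the file as binary."""
    return b"\x00" in data[:sample_size]

def normalize_line_endings(data: bytes) -> bytes:
    """Convert CRLF to LF in text files; return binary files untouched."""
    if looks_binary(data):
        return data
    return data.replace(b"\r\n", b"\n")
```

Note that this rewrites every CRLF in a text file, which is exactly the case a project that relies on CRLF would object to.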

That said, perhaps we could at least do a better job at highlighting the issues:

  • Detect when the Git configuration would lead to such issues
  • Detect what's the actual difference and suggest potential remediations (rather than just a failed checksum)

arcanis avatar May 06 '24 09:05 arcanis

@arcanis Since this only affects file: and git: packages, Yarn could use a different mechanism for computing the hash: for git packages, the commit hash already identifies the package content, and for local files, you could use git to compute a hash in the same way.
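As a small illustration of what "use git to compute a hash" means for a single file, here is a sketch in Python of the SHA-1 blob id that git assigns to file content. The function name git_blob_sha1 is made up for this example; nothing here is Yarn code.

```python
import hashlib

def git_blob_sha1(content: bytes) -> str:
    """Compute the SHA-1 object id git uses for a blob with this content."""
    # Git hashes a header "blob <size>\0" followed by the raw bytes.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()
```

The id depends only on the bytes, not on file permissions. It still differs between LF and CRLF content, so this would only help if the hash is taken from the bytes git stores rather than from the checked-out working tree.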

Alternatively, Yarn could hash the ZIP file in a way that excludes the permissions metadata from the computed hash, avoiding the problem with permissions, and then have better diagnostics for CRLF/LF differences (which should in principle be fixable by the user, unlike the permissions issue).

A final option would be to ignore or store multiple hashes for packages where a deterministic hash cannot be easily computed.
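The ZIP-based alternative can be sketched roughly as follows, in Python with the standard zipfile and hashlib modules. The name content_only_digest is hypothetical and the choice of SHA-512 is an assumption made for the example; this is not Yarn's actual checksum routine. The digest covers only entry names and file bytes, so permission bits and timestamps cannot change it.

```python
import hashlib
import zipfile

def content_only_digest(zip_path: str) -> str:
    """Hash entry names and file contents, ignoring all other ZIP metadata."""
    h = hashlib.sha512()
    with zipfile.ZipFile(zip_path) as zf:
        for name in sorted(zf.namelist()):
            if name.endswith("/"):
                continue  # directory entries carry only metadata
            data = zf.read(name)
            name_bytes = name.encode("utf-8")
            # Length prefixes keep (name, data) boundaries unambiguous.
            h.update(len(name_bytes).to_bytes(8, "big"))
            h.update(name_bytes)
            h.update(len(data).to_bytes(8, "big"))
            h.update(data)
    return h.hexdigest()
```

Two archives that differ only in directory permissions produce the same digest here, while a CRLF/LF difference in package.json still changes it. Skipping directory entries means empty directories are not covered, which is a design choice of this sketch.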

Diggsey avatar May 06 '24 12:05 Diggsey

for git packages, the commit hash already identifies the package content

+1, but note: validation by commit hash requires git clone because the commit object (which has the tree hash) is not part of github archives

the way the packages are built may depend on your system, and that makes this process flaky. We're always looking for ways to improve that, but it's unclear right now what the solution should be.

sounds like yarn is trying to re-invent nix

also with nix, reproducible builds are hard because "bad" packages can introduce non-determinism in the build process

so by default, nix packages are "input addressed": all source files and all build scripts are reduced to one hash and that input hash identifies pre-compiled packages in a binary cache

Trustix - Consensus and voting

So we lean into this, and allow each user to define what consensus means to them, fully scriptable in Lua. This is especially well suited to Trustix for two reasons. First, unlike other systems we’ve discussed, it is not essential to reach a consensus in Trustix. If every builder reports a different output hash, the user can simply build that package from source.

Content-addressed Nix − call for testers

what happens if __contentAddressed = true is used when the derivation is not reproducible (results in a different output contents across builds)?

It all depends of the exact scenario, but in the simplest case where there’s only one source of truth (either you’re only building locally, or there’s only one binary cache that feeds everything else), it’ll work mostly as input-addressed derivations, in that the first build will be accepted as the “truthful” build, and Nix won’t even try to rebuild it (why should it after all?)

What factors affect the reproducibility of Nix builds?

milahu avatar May 14 '24 09:05 milahu