cache trying to use git for windows' tar.exe on self-hosted runner, failing to find correct gzip
On a self-hosted Windows runner, I just noticed that caches are failing now with errors like:
"C:\Program Files\Git\usr\bin\tar.exe" --posix -cf cache.tgz --exclude cache.tgz -P -C C:/runner/_work/msys2-autobuild/msys2-autobuild --files-from manifest.txt --force-local -z
/bin/sh: line 1: gzip: command not found
/usr/bin/tar: cache.tgz: Cannot write: Broken pipe
/usr/bin/tar: Child returned status 127
/usr/bin/tar: Error is not recoverable: exiting now
Warning: Failed to save: "C:\Program failed with error: The process 'C:\Program Files\Git\usr\bin\tar.exe' failed with exit code 2
Caches used to work fine, using the version of tar.exe in system32. My understanding is that the programs in git for windows usr\bin are supposed to be regarded as internal to Git for Windows and should not be relied upon by external things like this. @dscho?
Current theory is that Git for Windows' tar is looking on the path for gzip, either not finding it in my case, or in other cases finding one incompatible with that tar (due to being based on a different Cygwin).
Maybe it is failing to find gzip because git for windows' usr\bin is not on my path? But it really shouldn't be on the path.
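The theory can be made concrete with a minimal sketch of a PATH search. It is a simplification: it uses Windows-style paths and ignores PATHEXT and the POSIX path conversion that MSYS2's /bin/sh really performs. The directory contents, including the Cygwin gzip.exe, are hypothetical. Under these assumptions, neither outcome yields the gzip.exe that ships next to tar.exe.

```typescript
// Hypothetical snapshot of which executables exist in which directory.
type DirListing = Record<string, string[]>;

// Simplified PATH search: try each entry in order and return the first
// directory that contains the command. Real shells also apply PATHEXT,
// command hashing and (inside MSYS2) POSIX path conversion.
function resolveOnPath(command: string, pathValue: string, listing: DirListing): string | null {
  const entries = pathValue.split(";");
  for (let i = 0; i < entries.length; i++) {
    const dir = entries[i];
    const files = listing[dir];
    if (files !== undefined && files.indexOf(command) !== -1) {
      return dir + "\\" + command;
    }
  }
  return null;
}

const listing: DirListing = {
  "C:\\Program Files\\Git\\usr\\bin": ["tar.exe", "gzip.exe", "sh.exe"],
  "C:\\cygwin\\bin": ["gzip.exe", "zstd.exe"],
  "C:\\Windows\\System32": ["tar.exe"],
};

// Git's usr\bin is not on PATH, so the gzip.exe next to tar.exe is never seen.
console.log(resolveOnPath("gzip.exe", "C:\\Windows\\System32;C:\\Program Files\\Git\\cmd", listing));
// null

// A Cygwin bin directory on PATH supplies a gzip.exe built for another runtime.
console.log(resolveOnPath("gzip.exe", "C:\\cygwin\\bin;C:\\Windows\\System32", listing));
// C:\cygwin\bin\gzip.exe
```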
I just came to report what is, I suspect, the same bug. Do you have Cygwin installed on these runners?
The problem I'm seeing is that Git for Windows is based on MSYS2, which is downstream from Cygwin, and in particular is based on Cygwin DLLs that are a few years old. If you have a more recent regular Cygwin installation on the same system, and it's in the PATH, the caching code will still run the GfW tar, but that will try to link against the Cygwin DLL from your Cygwin installation, and promptly fall over with the error you're seeing.
Assuming it is the same problem, I've just created a simple testcase using GitHub hosted runners, at https://github.com/me-and/repro-cygwin-cache-woe.
I don't have cygwin on the path, but I could see where that could cause problems too.
What's worse, we just discovered this over at MSYS2: https://github.com/actions/toolkit/blob/6c1f9eaae833355a0b212b66c5f2e3ac366de185/packages/cache/src/internal/tar.ts#L17
This exports the variable for all subsequent steps too, switching the default behavior of any MSYS2 used later in the workflow.
So there are at least two pretty major issues with this change:
- tar is looking on the path for the compression program, which may not be on the path, or may be an incompatible variant (such as one from a Cygwin or different MSYS2 install).
- the code here is setting (and/or potentially overriding an existing setting) an environment variable globally, affecting all future actions unexpectedly (based on whether something used a cache prior to those steps)
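One way to address the second point is to scope such a setting to the tar child process only. The following is a minimal sketch that assumes nothing about the toolkit's internals. `envForChild` is a hypothetical helper, and `EXAMPLE_VAR` is a placeholder, not the variable set at tar.ts#L17.

```typescript
type Env = { [name: string]: string | undefined };

// Build the environment for a single child process. `base` is copied, not
// mutated, and any value the user already set wins over our default, so
// nothing leaks into (or is overridden for) later workflow steps.
function envForChild(base: Env, defaults: Env): Env {
  const result: Env = { ...base };
  for (const name of Object.keys(defaults)) {
    if (result[name] === undefined) {
      result[name] = defaults[name];
    }
  }
  return result;
}

// EXAMPLE_VAR is a placeholder, not the variable tar.ts actually sets.
const parentEnv: Env = { PATH: "C:\\Windows\\System32" };
const tarEnv = envForChild(parentEnv, { EXAMPLE_VAR: "default" });
console.log(tarEnv.EXAMPLE_VAR);    // default
console.log(parentEnv.EXAMPLE_VAR); // undefined

const userEnv: Env = { EXAMPLE_VAR: "user-choice" };
console.log(envForChild(userEnv, { EXAMPLE_VAR: "default" }).EXAMPLE_VAR); // user-choice
```

An environment built this way could be handed to tar through the `env` option of `@actions/exec` when spawning it. That keeps the variable out of the rest of the job, unlike `core.exportVariable` from `@actions/core`, which also applies it to later steps.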
@Phantsure it would be nice if an alternative to this could be found. There is no good reason to set global env vars and change the default behavior for all MSYS2 users.
I've created a separate issue for the env var issue, to not sidetrack this issue: https://github.com/actions/toolkit/issues/1312
The problem I'm seeing is that Git for Windows is based on MSYS2, which is downstream from Cygwin, and in particular is based on Cygwin DLLs that are a few years old.
No, MSYS2 is based on the latest Cygwin.
"A few years" was an overstatement (I think I got confused by the calendar turning over, and overcompensated), but I believe Git for Windows is, as of v2.38.0.windows.1, based on Cygwin 3.3.6, while the latest upstream Cygwin release is v3.4.3.
Downgrading to use Cygwin v3.3.6 gets the tar call working, so I'm pretty sure that the Cygwin DLL compatibility is the problem (although I don't think just switching library versions is the correct solution).
Git for Windows is based on MSYS2,
correct
which is downstream from Cygwin,
correct
and in particular is based on Cygwin DLLs that are a few years old.
Incorrect. Your information is more than 7 years old; Git for Windows v1.x was based on MSys which indeed was forked off of an ancient Cygwin version. But MSYS2 frequently updates to the latest Cygwin; Git for Windows lags behind a little to take off the edge of some of the more fragile developments, at the moment Git for Windows uses a derivative of Cygwin v3.3.6 (which was released on September 6th 2022, hardly "several years old").
the caching code will still run the GfW tar, but that will try to link against the Cygwin DLL from your Cygwin installation
That, too, is incorrect. Git for Windows' tar.exe links to msys-2.0.dll and will never "magically" link to cygwin1.dll. In other words, it will never pick up the DLL from the Cygwin installation.
So how about the MSYS2 installation that's in C:\msys64 on hosted runners? Git for Windows' tar.exe won't use C:\msys64\usr\bin\msys-2.0.dll either because there is a msys-2.0.dll right next to tar.exe, in C:\Program Files\Git\usr\bin, and that will be used.
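This behavior follows the standard Windows DLL search order, which checks the directory the executable was loaded from first. The sketch below is heavily simplified: it ignores KnownDLLs, already-loaded modules, and manifests, and the directory contents are hypothetical.

```typescript
type DirListing = Record<string, string[]>;

// Heavily simplified model of the standard Windows DLL search order with
// SafeDllSearchMode enabled: the executable's own directory first, then
// the system directories, then the current directory, then PATH.
// KnownDLLs, already-loaded modules and manifests are ignored here.
function resolveDll(
  dllName: string,
  exePath: string,
  systemDirs: string[],
  cwd: string,
  pathEntries: string[],
  listing: DirListing
): string | null {
  const exeDir = exePath.slice(0, exePath.lastIndexOf("\\"));
  const candidates = [exeDir].concat(systemDirs, [cwd], pathEntries);
  for (let i = 0; i < candidates.length; i++) {
    const files = listing[candidates[i]];
    if (files !== undefined && files.indexOf(dllName) !== -1) {
      return candidates[i] + "\\" + dllName;
    }
  }
  return null;
}

const gitUsrBin = "C:\\Program Files\\Git\\usr\\bin";
const dlls: DirListing = {
  [gitUsrBin]: ["tar.exe", "msys-2.0.dll"],
  "C:\\msys64\\usr\\bin": ["msys-2.0.dll"],
  "C:\\cygwin\\bin": ["cygwin1.dll"],
};
const systemDirs = ["C:\\Windows\\System32", "C:\\Windows"];
const pathEntries = ["C:\\msys64\\usr\\bin", "C:\\cygwin\\bin"];

// Git for Windows' tar.exe imports msys-2.0.dll and finds its own copy first.
console.log(resolveDll("msys-2.0.dll", gitUsrBin + "\\tar.exe", systemDirs, "C:\\runner\\_work", pathEntries, dlls));
// C:\Program Files\Git\usr\bin\msys-2.0.dll
```

Only an executable without a msys-2.0.dll beside it would fall through to the copy in C:\msys64\usr\bin. Since tar.exe imports msys-2.0.dll by name, cygwin1.dll is never a candidate.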
My understanding is that the programs in git for windows usr\bin are supposed to be regarded as internal to Git for Windows and should not be relied upon by external things like this. @dscho?
I tried to make the case, and I almost got a change accepted that would have prevented Git for Windows' internal tools from being picked up. However, too many users already relied on those tools being in the PATH, and we could not merge that change. I've made my peace with it.
OK, in that case this bug is that it is trying to find a gzip.exe on the path (rather than right next to tar.exe), and either not finding one or finding one that doesn't work with that tar.exe. My theory for the cygwin situation is that the code for starting a cygwin process from a cygwin process was getting confused, rather than actually trying to load the wrong cygwin/msys dll.
@dscho you're obviously correct about versions; I'd realised I'd vastly overstated the age of Git for Windows' builds, and corrected myself.
The thing that pointed me at Git for Windows / Cygwin compatibility is having run the same tests on GitHub runners using old Cygwin versions: when I built using an old Cygwin archive, the cache action worked. When I built using an archive that used v3.4.0, the cache action failed.
@jeremyd2019 gave me an idea that does work: if I remove Cygwin's zstd from the PATH before running the caching action, things seem to work. This seems baffling to me: why would Git for Windows tar be able to call Cygwin zstd when the cygwin1.dll is from 3.3.6, but not when it's from 3.4.0? The version of zstd doesn't change between the two, only the cygwin1.dll version.
I've demonstrated this behaviour with a test run at https://github.com/me-and/repro-cygwin-cache-woe/actions/runs/3990249408. This is a matrix test, with the following variables:
- Should Cygwin be added to the system PATH?
- Should Cygwin's zstd be renamed before running the cache action?
- Which Cygwin mirror should be used: 3 Dec 2022 (Cygwin v3.3.6), 4 Dec 2022 (Cygwin v3.4.0), or the latest from mirrors.kernel.org (currently v3.4.5)?
When either (a) Cygwin isn't in the PATH, or (b) Cygwin is in the PATH but Cygwin's zstd isn't, everything works reliably. Otherwise, with the v3.4.x builds I see the "Broken pipe" error. With older Cygwin builds, I'm fairly sure I've seen it Just Work™ previously, but I'm now seeing a different error: "0 [main] zstd (2092) C:\cygwin\bin\zstd.exe: *** fatal error - cygheap base mismatch detected - 0x210351408/0x18034C408. // This problem is probably due to using incompatible versions of the cygwin DLL."
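For reference, the sketch below writes out the 2x2x3 matrix and the outcomes reported above. `reportedOutcome` is a hypothetical helper that only encodes what the comment says about that run. It does not model why the failures happen.

```typescript
type Snapshot =
  | "2022-12-03 (Cygwin v3.3.6)"
  | "2022-12-04 (Cygwin v3.4.0)"
  | "latest (Cygwin v3.4.5)";

interface Cell {
  cygwinOnPath: boolean;
  zstdRenamed: boolean;
  snapshot: Snapshot;
}

// Encodes the outcomes as reported for that run; not an independent result.
function reportedOutcome(cell: Cell): string {
  // Cygwin not on PATH, or on PATH without its zstd: everything works.
  if (!cell.cygwinOnPath || cell.zstdRenamed) {
    return "works";
  }
  // Cygwin's zstd is reachable: the failure mode depends on the Cygwin version.
  return cell.snapshot === "2022-12-03 (Cygwin v3.3.6)" ? "cygheap base mismatch" : "Broken pipe";
}

const snapshots: Snapshot[] = [
  "2022-12-03 (Cygwin v3.3.6)",
  "2022-12-04 (Cygwin v3.4.0)",
  "latest (Cygwin v3.4.5)",
];

// Enumerate every combination of the three matrix variables.
const cells: Cell[] = [];
for (const cygwinOnPath of [false, true]) {
  for (const zstdRenamed of [false, true]) {
    for (const snapshot of snapshots) {
      cells.push({ cygwinOnPath, zstdRenamed, snapshot });
    }
  }
}

for (const cell of cells) {
  const onPath = cell.cygwinOnPath ? "Cygwin on PATH " : "Cygwin off PATH";
  const zstd = cell.zstdRenamed ? "zstd renamed" : "zstd kept   ";
  console.log(`${onPath} | ${zstd} | ${cell.snapshot}: ${reportedOutcome(cell)}`);
}
```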
this bug is that it is trying to find a gzip.exe on the path
@jeremyd2019 I guess it depends on your point of view whether you consider this a bug. The cache library clearly expects tar to be in the PATH, why not also gzip? The easiest fix for you might be to set your runner up with a gzip in the PATH.
I've demonstrated this behaviour with a test run at https://github.com/me-and/repro-cygwin-cache-woe/actions/runs/3990249408.
Curious. Thank you for the record, that's very helpful. In https://github.com/me-and/repro-cygwin-cache-woe/actions/runs/3990249408/jobs/6843755788#step:5:8 you clearly see the problem: C:\Program Files\Git\usr\bin\tar.exe tries to call Cygwin's zstd.exe and then runs into the woes where it detects a Cygwin heap mismatch.
What throws me is that I seem to remember that we go out of our way in the MSYS2 runtime to differentiate enough from the Cygwin runtime so that the MSYS2 runtime's heap is not mistaken for a Cygwin runtime heap.
However, the Cygwin runtime startup code that runs as part of C:\cygwin\bin\zstd.exe's startup clearly mistakes the MSYS2 runtime heap for a Cygwin heap. Maybe I misremember? Well, let's see.
clicketyclick
Half an hour later, I am even more puzzled than before. We do call hook_or_detect_cygwin() to detect whether the spawned executable is a Cygwin (or, in MSYS2's case, an MSYS) program. And there, we clearly look for the correct DLL (and should not pick up executables linking to "the other DLL"): https://github.com/msys2/msys2-runtime/blob/108a4aca5610d4e4d74caaa65fce3342e36fd10e/winsup/cygwin/hookapi.cc#L378-L387
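Conceptually, the linked check has to distinguish programs built against the two runtimes. The sketch below illustrates that reading in TypeScript. `detectRuntime` and `shouldShareRuntimeState` are hypothetical names, and the real code in hookapi.cc inspects the executable image itself rather than a ready-made list of import names.

```typescript
type Runtime = "msys" | "cygwin" | "native";

// Illustrative only: classify a program by the runtime DLL it imports.
// The import list is simply passed in as strings here.
function detectRuntime(importedDlls: string[]): Runtime {
  for (let i = 0; i < importedDlls.length; i++) {
    const name = importedDlls[i].toLowerCase();
    if (name === "msys-2.0.dll") return "msys";
    if (name === "cygwin1.dll") return "cygwin";
  }
  return "native";
}

// A parent should only hand its runtime-internal state (such as the
// cygheap named in the error above) to a child built on the same runtime.
function shouldShareRuntimeState(parent: Runtime, childImports: string[]): boolean {
  return parent !== "native" && detectRuntime(childImports) === parent;
}

console.log(shouldShareRuntimeState("msys", ["msys-2.0.dll", "KERNEL32.dll"])); // true
console.log(shouldShareRuntimeState("msys", ["cygwin1.dll", "KERNEL32.dll"]));  // false
```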
this bug is that it is trying to find a gzip.exe on the path
@jeremyd2019 I guess it depends on your point of view whether you consider this a bug. The cache library clearly expects tar to be in the PATH, why not also gzip? The easiest fix for you might be to set your runner up with a gzip in the PATH.
So this bit isn't true, at least as of the latest versions of the library, which are hard-coded to use Git for Windows' tar, despite then not specifying a path for zstd.
Possibly at least part of the fix here is for the action to use an explicit, hard-coded path to zstd or gzip, at least when it's also using an explicit hard-coded path to tar.
I think their intent is to look for zstd on the PATH, I don't think GfW includes a /usr/bin/zstd - it doesn't really have a reason to need it, and that /usr/bin is really just for things necessary for git. I think they intend to find the zstd from https://github.com/actions/runner-images/blob/main/images/win/scripts/Installers/Install-Zstd.ps1 on the PATH and use that.
I also think it's kind of silly to be sitting here guessing what their intentions are in an issue on their repository, that no developer has commented on. I don't know if they're stumped and hoping we puzzle out a solution, or if they just don't care about these cases.
I think their intent is to look for zstd on the PATH, I don't think GfW includes a /usr/bin/zstd - it doesn't really have a reason to need it, and that /usr/bin is really just for things necessary for git. I think they intend to find the zstd from https://github.com/actions/runner-images/blob/main/images/win/scripts/Installers/Install-Zstd.ps1 on the PATH and use that.
GfW doesn't have zstd, no, and I'm pretty sure you're right that the code is going to be picking up zstd from that script on GitHub hosted runners at least. I suspect there isn't much deliberate intent here, though, and an appropriate and effective solution would be to hard-code the path to that zstd executable, just as the path to the tar executable is hardcoded.
If nobody else gets to it, I'll look at writing up a patch to do that when I get a chance. Although "when I get a chance" is probably not going to be until next month at the earliest, and I'd be very happy for someone else to do the work…
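The proposed fix could look roughly like the following sketch, which picks a compressor by absolute path instead of by PATH lookup. `pickCompressor` is a hypothetical helper, and `C:\tools\zstd\zstd.exe` is a made-up location, since the thread does not give the install path used by the runner-images script. The file-existence check is injected to keep the example self-contained. In the toolkit, it would presumably be something like `fs.existsSync`.

```typescript
type Exists = (path: string) => boolean;

interface Compressor {
  kind: "zstd" | "gzip";
  program: string;
}

// Sketch: prefer zstd at a fixed location, otherwise use the gzip.exe that
// sits next to the (already hard-coded) tar.exe. No PATH lookup involved.
function pickCompressor(tarExe: string, zstdExe: string, exists: Exists): Compressor | null {
  if (exists(zstdExe)) {
    return { kind: "zstd", program: zstdExe };
  }
  const gzipExe = tarExe.slice(0, tarExe.lastIndexOf("\\") + 1) + "gzip.exe";
  if (exists(gzipExe)) {
    return { kind: "gzip", program: gzipExe };
  }
  return null; // neither found: caching cannot work, report it clearly
}

const TAR_EXE = "C:\\Program Files\\Git\\usr\\bin\\tar.exe";
const ZSTD_EXE = "C:\\tools\\zstd\\zstd.exe"; // hypothetical location

// A self-hosted runner with Git for Windows but no zstd installed.
const present = [TAR_EXE, "C:\\Program Files\\Git\\usr\\bin\\gzip.exe"];
const choice = pickCompressor(TAR_EXE, ZSTD_EXE, p => present.indexOf(p) !== -1);
console.log(choice ? choice.kind + " " + choice.program : "none");
// gzip C:\Program Files\Git\usr\bin\gzip.exe
```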
I believe that I have identified the issue and implemented a viable work-around in https://github.com/git-for-windows/msys2-runtime/pull/48. If you want to verify this claim, please install a Git for Windows snapshot instead of the official Git for Windows release on the runner (snapshots are very similar to official Git for Windows releases, they are even code-signed by me, the only thing setting them apart is that the snapshots have "funny" version numbers reported by git version).
@dscho I'll check this as soon as I can, thank you! There's probably going to be some delay – I've had some urgent personal issues come up that have taken priority over a lot of my life – but it's on my to-do list.
For the record, the original issue I reported still occurs with GfW 2.40.0.windows.1 installed. I did not expect that to help, because in this case the error is not the result of finding another "cygwin" gzip.exe, but in not finding a gzip.exe on the PATH at all. In this case it really should look next to tar.exe for gzip.exe.
@jeremyd2019 since the tar.exe we're talking about comes from MSYS2, and since you're a frequent contributor to that project, how about teaching that tar package the trick to append the directory containing tar.exe to the PATH when searching for gzip.exe (or for that matter, any (de-)compressor)?
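The sketch below shows what that fallback could mean. It is written in TypeScript only for illustration, since the real change would be in tar's C sources. The caller's PATH is searched first, and tar's own directory is appended. Paths are POSIX-style, as tar sees them inside the MSYS2 runtime, and treating `/usr/bin` as tar's directory is an assumption.

```typescript
// Hypothetical helper: the search path tar would use for (de)compressors.
function compressorSearchPath(pathValue: string, tarDir: string): string {
  const entries = pathValue.split(":").filter(e => e !== "");
  if (entries.indexOf(tarDir) === -1) {
    entries.push(tarDir); // appended, so the user's PATH keeps priority
  }
  return entries.join(":");
}

console.log(compressorSearchPath("/c/Windows/System32:/c/tools", "/usr/bin"));
// /c/Windows/System32:/c/tools:/usr/bin
```

Appending rather than prepending keeps a user's deliberate choice of compressor in charge. The trade-off is that an incompatible Cygwin zstd earlier on the PATH would still be picked up, as in the test matrix earlier in this thread.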
Has anybody fixed this gzip warning on their self-hosted Windows servers?
the reason this doesn't happen on the github-hosted runners I think is because they have a zstd.exe on the PATH. Doing that would probably take care of it. What's buggy is the supposed fallback case when zstd is not present and it should use gzip, it doesn't take into account that GNU tar looks for gzip on the PATH too, rather than being linked in like in the bsdtar that it used to use from Windows\System32. There is a gzip.exe, right next to the tar.exe this code took the effort to track down inside Git for Windows, but it is not on the PATH, they should pass the full path to gzip.exe to tar most likely.
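Passing the full path could look like the sketch below. It rebuilds the command line from the original log and uses GNU tar's `--use-compress-program` option in place of `-z`. `tarCreateArgs` is hypothetical. Using the POSIX path `/usr/bin/gzip` assumes that `/usr/bin`, inside Git for Windows' MSYS2 environment, is the directory holding tar.exe and gzip.exe.

```typescript
// Rebuilds the `tar` invocation from the log above, but replaces `-z`
// (which makes tar run a bare `gzip` found via PATH) with an explicit
// compressor given by absolute path.
function tarCreateArgs(workDir: string, compressProgram: string): string[] {
  return [
    "--posix",
    "-cf", "cache.tgz",
    "--exclude", "cache.tgz",
    "-P",
    "-C", workDir,
    "--files-from", "manifest.txt",
    "--force-local",
    "--use-compress-program=" + compressProgram,
  ];
}

const args = tarCreateArgs("C:/runner/_work/msys2-autobuild/msys2-autobuild", "/usr/bin/gzip");
console.log(args.join(" "));
```

The POSIX form is deliberate. The log shows tar starting the compressor through /bin/sh, so a Windows path containing `C:\Program Files` would need careful quoting. When extracting, GNU tar runs the same program with `-d`.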
Any reliable workarounds?
if I cared about the cache, I'd put zstd.exe on the PATH on my runner
it worked for me. Thanks!
Update: cache, created on the self-hosted windows runner with mentioned zstd hack, cannot be used somehow. When I try to retrieve it, I get Error: Failed to restore cache entry. Exiting as fail-on-cache-miss is set
Switching to the cloud image. e.g. windows-latest fixes the issue.
What is the correct path for zstd.exe? @xv-aleksandr-b
I've just downloaded the latest version from the FB repo, put it in a custom folder and added that folder to the PATH env variable.
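For this workaround, it helps to put the zstd directory at the front of the PATH so that no other zstd (for example, Cygwin's) shadows it. The sketch below is a small example; `prependToPath` and `C:\tools\zstd` are hypothetical.

```typescript
// Put a directory at the front of a Windows-style PATH value unless it is
// already listed (compared case-insensitively, ignoring trailing '\').
function prependToPath(pathValue: string, dir: string): string {
  const norm = (p: string): string => p.replace(/\\+$/, "").toLowerCase();
  const entries = pathValue.split(";").filter(e => e !== "");
  const present = entries.some(e => norm(e) === norm(dir));
  return present ? entries.join(";") : [dir].concat(entries).join(";");
}

const ZSTD_DIR = "C:\\tools\\zstd"; // hypothetical folder holding zstd.exe
const before = "C:\\Windows\\System32;C:\\Program Files\\Git\\cmd";
const after = prependToPath(before, ZSTD_DIR);
console.log(after); // C:\tools\zstd;C:\Windows\System32;C:\Program Files\Git\cmd
console.log(prependToPath(after, "C:\\Tools\\zstd\\") === after); // true
```

Within a single workflow, `core.addPath` from `@actions/core` (or appending to the file named by `GITHUB_PATH`) prepends a directory for later steps. A system-wide PATH change is only seen by processes started afterwards, so the runner service may need a restart to pick it up.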
Update: cache, created on the self-hosted windows runner with mentioned zstd hack, cannot be used somehow. When I try to retrieve it, I get Error: Failed to restore cache entry. Exiting as fail-on-cache-miss is set
Switching to the cloud image. e.g. windows-latest fixes the issue.
Currently facing the same issue, has anyone been able to get cache working on a self hosted runner?
🤦 restarting the runner fixed the issue
It looks like there's been no progress here, plain windows runner with git, no cygwin or msys installation other than the one that comes with git. The action does not work with the symptom reported above. Is this action supported on windows self-hosted runners? If not I can make a documentation PR to make that clear?
The workaround seems to be to install zstd from https://github.com/facebook/zstd/releases/tag/v1.5.6 and put it somewhere on the PATH. Perhaps it's worth documenting that if there's no appetite to support the platform?