`gix-command` on Windows runs shell commands in non-POSIX mode
Current behavior 😯
Background
Git commands that run in a shell are meant to run in a POSIX-compatible sh. This can be, and usually is, a shell that is more specifically known by some other name and that extends and even breaks with the requirements of POSIX for sh. But when run as sh, such a shell behaves in a POSIX-compatible manner. This is true of bash, which provides sh in Git for Windows environments. (Some shells, including bash, enter POSIX mode only after running commands from startup scripts. Some shells also do not behave in a completely POSIX-compatible way even when in POSIX mode. Neither of those caveats relates to this issue.)
Therefore, even when sh and bash are the same due to being equivalent symlinks, hard links, or duplicate files, running sh runs a shell in POSIX mode. This is, broadly speaking, the case even on Windows: for example, when bash.exe and sh.exe are shell executables with the same contents, running bash.exe defaults to non-POSIX mode and running sh.exe defaults to POSIX mode.
But this is not quite true when the executable being run is not really the shell itself but instead a shim that runs the shell. Then what matters is the name that the shim runs the shell under. This is not a special rule, but just a consequence of the above: the shim, after all, is a separate program running the shell. Ordinarily, this would not be a problem. Non-shim bash.exe and sh.exe shells--which could be copies, symlinks, or hard links--could be run by separate similar but nonidentical bash.exe and sh.exe shim executables. In this approach, the bash.exe shim would delegate to the non-shim bash.exe, and the sh.exe shim would delegate to the non-shim sh.exe.
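To make the role of the invoked name concrete, here is a minimal Rust sketch for a Unix-like system, where `std::os::unix::process::CommandExt::arg0` is available (the standard library has no direct Windows equivalent). It runs one shell executable under a chosen argv[0] and reports the resulting SHELLOPTS. The function name `shellopts_when_invoked_as` is made up for illustration, and the sketch assumes the executable it is given really is bash.

```rust
/// Hypothetical helper for illustration only (not part of gitoxide).
///
/// Runs `shell_path` with `argv0` as its argv[0] and returns the value of
/// `SHELLOPTS` as seen inside that shell. Assumes `shell_path` is bash.
#[cfg(unix)]
fn shellopts_when_invoked_as(shell_path: &str, argv0: &str) -> std::io::Result<String> {
    use std::os::unix::process::CommandExt; // provides `arg0`
    use std::process::Command;

    let output = Command::new(shell_path)
        // The name the shell sees itself as being run under.
        .arg0(argv0)
        // Keep inherited settings from forcing POSIX mode on their own.
        .env_remove("POSIXLY_CORRECT")
        .env_remove("SHELLOPTS")
        // Print the colon-separated list of enabled shell options.
        .args(["-c", "printf %s \"$SHELLOPTS\""])
        .output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```

Under those assumptions, calling this with `("bash", "sh")` would be expected to report a value that includes the posix field, and calling it with `("bash", "bash")` one that does not. A shim that always launches the shell as bash corresponds to the second call, whatever the shim itself is named.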
The problem
The trouble is that the (git root)\bin\bash.exe and (git root)\bin\sh.exe shims found in full non-SDK installations of Git for Windows (including portable installations) do not work this way. They are equivalent: both delegate to (git root)\usr\bin\bash.exe, neither to (git root)\usr\bin\sh.exe. At least in the Portable Git installations I tested--and scoop installations, but that is a repackaging of Portable Git--they are separate files, and they are not hard links to the same thing, but they have identical contents:
C:\Users\ek> cd C:\Users\ek\scoop\apps\git\2.48.1\bin
C:\Users\ek\scoop\apps\git\2.48.1\bin> ls
Directory: C:\Users\ek\scoop\apps\git\2.48.1\bin
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 2/13/2025 6:13 AM 46992 bash.exe
-a--- 2/13/2025 6:13 AM 46472 git.exe
-a--- 2/13/2025 6:13 AM 46992 sh.exe
C:\Users\ek\scoop\apps\git\2.48.1\bin> fsutil hardlink list bash.exe
\Users\ek\scoop\apps\git\2.48.1\bin\bash.exe
C:\Users\ek\scoop\apps\git\2.48.1\bin> fsutil hardlink list sh.exe
\Users\ek\scoop\apps\git\2.48.1\bin\sh.exe
C:\Users\ek\scoop\apps\git\2.48.1\bin> (Get-FileHash bash.exe).Hash
2F8D7CB8CA7DF3F11985409B73C273C424272B0E6D648E58C178B3D462E942F9
C:\Users\ek\scoop\apps\git\2.48.1\bin> (Get-FileHash sh.exe).Hash
2F8D7CB8CA7DF3F11985409B73C273C424272B0E6D648E58C178B3D462E942F9
Why that makes gix-command behave subtly wrong
When git runs shell commands, it does not use those shims, so it uses its sh as such. That is, it invokes its non-shim sh.exe, causing it to run in POSIX mode. But when gix-command runs a command with a shell due to use_shell being set to true, it runs sh:
https://github.com/GitoxideLabs/gitoxide/blob/d0ef276302aad10fda222236c69e78e580215f31/gix-command/src/lib.rs#L283-L286
On systems that have a usable sh that can be found in such a PATH search, that will often be the (git root)\bin\sh.exe shim associated with Git for Windows, since many users of gitoxide on Windows--and more broadly of tools on Windows that operate on Git repositories, some of which may use gitoxide library crates--will have Git for Windows installed with that directory in their PATH.
As described above, this shim is called sh but it is really a shim for bash. It runs a bash shell called bash with argv[0] set to bash. The resulting shell instance does not enter POSIX mode, even though, from the perspective of gix-command, it ran sh.
But we may need to use the shim
This issue was not a motivation for #1862. But as originally envisioned, that PR would have fixed this. One of its changes is to replace the above code with:
https://github.com/GitoxideLabs/gitoxide/blob/65f706b256cdee068d0be64b1bcc3957332e307c/gix-command/src/lib.rs#L283-L284
The implementation of gix_path::env::shell() is also changed there, but in the original vision of #1862 it was intended to continue using the non-shim sh.exe in Git for Windows instead of the shim.
Using the non-shim sh.exe would fix this issue. But that does not seem to be a reasonable thing to do without further environment customization to account for the absence of the shim's functionality. Such customization may be possible, but I think it is beyond the scope of #1862. When not using a shim, some environment variables--including PATH directories with expected tools--may be absent or set to unusable values.
The shim helps avoid running wrong tool executables
Such a shell may even pick up executables that link to msys-2.0.dll from a different MSYS2 installation from the one the shell itself uses. Unlike most Windows programs, MSYS2 programs that use one msys-2.0.dll can have problems running other MSYS2 programs that use another msys-2.0.dll, or a different version or build of it, even when all executables and DLLs are in safe locations and all executables load the correct DLLs. This is documented for Cygwin.
I am unsure if it is generally as much of a problem in MSYS2, which does not seem to document it as something to be concerned about. The strange error currently blocking #1862 turns out to be such a case, though it is subtler and weirder than the examples given in that FAQ entry, and it may be unknown and I think may even be considered a bug in MSYS2. I'll give full details at #1862 soon (edit: https://github.com/GitoxideLabs/gitoxide/pull/1862#issuecomment-2692158831); this fragment is so that abandoning shims as a way to fix this issue is not rushed into in the future without awareness of the risks.
Expected effect of #1862
Both for the general reason about PATH and other environment variables, and in view of the specific problem encountered already, I think the way forward in #1862 will be to prefer the shim.
Thus it will not solve the problem described in this issue, and will even somewhat exacerbate it by making gix-command use the Git for Windows sh.exe shim (which is a shim for bash.exe) if present, even when another sh.exe would be found in a PATH search.
Because the actual non-shim to shim change will be in gix_path::env::shell(), this issue will also be exacerbated in the sense that it will apply to any other uses of shell() that do not take steps to mitigate it (such as those suggested below).
It seems to me that this issue is much less severe than the problems of having an insufficient or malfunctioning environment, and that it is justified to exacerbate this issue in that way. But #1862 is one of my motivations for opening this, so that it is known.
Expected behavior 🤔
When gix-command uses sh from Git for Windows, it should behave as sh does in Git for Windows when git runs it, running it in POSIX mode as sh does. See "Git behavior" and "Steps to reproduce" below for a verification of the difference and a demonstration of how they currently behave differently.
Possible solutions I don't think will work well
It would be nice to have an executable that, when run, defaults to running the shell in POSIX mode.
Setting an environment variable like POSIXLY_CORRECT should be avoided here, since it would be inherited by non-subshell child processes of the shell and potentially affect their behavior.
We can probably pass -o posix on Windows. I am not sure if there are any major problems with this, but I think there are some notable problems:
- We must only do it when the shell is not customized with shell_program(). But then a value of sh for shell_program() causes what would already have behaved as sh not to behave like sh, which is extremely unintuitive. If it is special-cased to include values like sh passed to shell_program() then we suffer the opposite but comparably bad effect of not using sh like it works when it is run straightforwardly.
- If this is done, it should probably only be done when we are running the sh.exe shim associated with Git for Windows, not any other sh.exe. This will complicate the implementation, and potentially result in greater coupling of implementation details between gix-path and gix-command, since whether -o posix is to be passed in gix-command would be determined by information obtained in gix-path.
- Limiting it in that way would also leave other sh.exe shims that are shims for that shim with the undesirable behavior. For example, when git is installed through scoop, a sh.exe is placed in a bin directory in the PATH that is a scoop shim for the Git for Windows sh.exe shim that is actually a shim for its bash.exe.
So I would like to avoid that approach if possible. This leaves two other clear alternatives, and maybe others I haven't thought of.
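For concreteness, the -o posix idea weighed above might look roughly like the following Rust sketch. `shell_args` and its parameters are hypothetical and not gix-command API: `custom_shell` stands in for a value set through shell_program(), and `is_gfw_sh_shim` stands in for a fact that would have to come from gix-path. It is meant to show the awkward special-casing, not to recommend it.

```rust
use std::ffi::OsString;

/// Hypothetical helper, not part of gix-command: builds the arguments that
/// would follow the shell program when running `script` via `-c`.
///
/// - `custom_shell`: assumed stand-in for a shell set with `shell_program()`.
/// - `is_gfw_sh_shim`: assumed stand-in for information obtained in gix-path.
fn shell_args(custom_shell: Option<&str>, is_gfw_sh_shim: bool, script: &str) -> Vec<OsString> {
    let mut args: Vec<OsString> = Vec::new();
    // Force POSIX mode only for the default shell, and only when it is known
    // to be the Git for Windows `sh.exe` shim, which really runs bash as bash.
    if custom_shell.is_none() && is_gfw_sh_shim {
        args.push("-o".into());
        args.push("posix".into());
    }
    args.push("-c".into());
    args.push(script.into());
    args
}
```

A caller that explicitly sets the shell to sh gets no -o posix here, which is the unintuitive case from the first bullet, and the `is_gfw_sh_shim` input is the gix-path to gix-command coupling from the second. A scoop shim in front of the Git for Windows shim would pass `false` and keep the non-POSIX behavior, as in the third.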
Possible solutions I think may work well
First, maybe this is just a bug in Git for Windows. If not, then it is presumably due to an unfortunate circumstance such that having the shim one would intuitively expect would not do the right thing, which would be useful to know about because Git for Windows should probably document that somewhere (such as in its wiki) and since the underlying cause might potentially apply to gitoxide in some way.
If it is a bug in Git for Windows, then fixing it there would also fix it here. I believe that, unlike some other installations of git, it is rare (and inadvisable) to continue using very old versions of Git for Windows, since as far as I know there are no further-downstream builds analogous to those in operating system distributions like Debian that (roughly speaking) fix security bugs while leaving non-security bugs alone.
If it is not a bug in Git for Windows, then we can try to run the non-shim executable and do our own environment modifications. Due to how process creation on Windows is slower than on Unix-like operating systems, running all commands through a shim should perhaps be avoided anyway. But whether done to fix this issue or for performance (or greater versatility), I think that is something that would be easy to get wrong and should be done very carefully. In particular, every version of Git for Windows has a chance to ship shims that work differently to account for changes in other parts of Git for Windows. In contrast, gitoxide has no such versioned coupling to the Git for Windows shims.
Git behavior
As noted above, git runs sh in such a way that, from the shell's perspective, it is really run as sh and behaves as such.
This can be demonstrated on Windows, in PowerShell. First, I ran these commands to create a git repository that will display information that distinguishes both the status and some of the effects of POSIX mode for the shell that runs it, when a fetch operation is performed in the repository:
git init what-git-shell
cd what-git-shell
git remote add origin ssh://localhost/repo.git
git config core.sshCommand 'exec >&2; ps | grep -E "^\s*(PID|$$)\b"; echo "BASH=$BASH"; echo "SHELLOPTS=$SHELLOPTS"; export -p | head -n1; :'
The command is somewhat complicated by the ps in Git for Windows not supporting PID arguments for filtering, and also by the inability to embed newlines in the command without changing its interpretation (even though that would work in various related cases).
Then I ran git fetch, which printed this, where the error message is itself no problem (my custom SSH command does not attempt to actually be usable for fetching):
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
PID PPID PGID WINPID TTY UID STIME COMMAND
307 1 307 36120 cons1 197609 21:04:33 /usr/bin/sh
BASH=/usr/bin/sh
SHELLOPTS=braceexpand:hashall:interactive-comments:posix
export ALLUSERSPROFILE="C:\\ProgramData"
The relevant part is:
PID PPID PGID WINPID TTY UID STIME COMMAND
307 1 307 36120 cons1 197609 21:04:33 /usr/bin/sh
BASH=/usr/bin/sh
SHELLOPTS=braceexpand:hashall:interactive-comments:posix
export ALLUSERSPROFILE="C:\\ProgramData"
This shows that the running shell process is observed by other MSYS2 processes as /usr/bin/sh, that it is a bash shell that sees its own argv[0] as /usr/bin/sh, that it is in POSIX mode (the trailing posix field in the value of SHELLOPTS), and that it exhibits POSIX-style export -p output (which variable is shown, ALLUSERSPROFILE, and its value are not important here).
Steps to reproduce 🕹
The difference is demonstrated by running gix fetch in the same repository created above to demonstrate the Git behavior. The output was:
PID PPID PGID WINPID TTY UID STIME COMMAND
317 1 317 16100 ? 197609 21:25:11 /usr/bin/bash
BASH=/usr/bin/bash
SHELLOPTS=braceexpand:hashall:interactive-comments
declare -x ALLUSERSPROFILE="C:\\ProgramData"
Error: An IO error occurred when talking to the server
Caused by:
failed to fill whole buffer
The relevant part is:
PID PPID PGID WINPID TTY UID STIME COMMAND
317 1 317 16100 ? 197609 21:25:11 /usr/bin/bash
BASH=/usr/bin/bash
SHELLOPTS=braceexpand:hashall:interactive-comments
declare -x ALLUSERSPROFILE="C:\\ProgramData"
This shows that the running shell process is observed by other MSYS2 processes as /usr/bin/bash, that it is a bash shell that sees its own argv[0] as /usr/bin/bash, that it is not in POSIX mode (no posix field in the value of SHELLOPTS), and that it accordingly does not exhibit POSIX-style export -p output (as above, the variable and value aren't affected by whether it's in POSIX mode, only which format it uses).
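The two checks used above to tell the modes apart can be written down as a small Rust sketch. The helper names are made up for illustration; they only inspect text like the output shown in this issue and do not run a shell.

```rust
/// Returns true if a bash `SHELLOPTS` value lists the `posix` option.
///
/// `SHELLOPTS` is a colon-separated list of enabled options, so this compares
/// whole fields rather than searching for a substring.
fn shellopts_has_posix(shellopts: &str) -> bool {
    shellopts.trim().split(':').any(|opt| opt == "posix")
}

/// Returns true if a line of `export -p` output uses the POSIX-style
/// `export NAME=value` form rather than bash's `declare -x NAME=value` form.
fn export_line_is_posix_style(line: &str) -> bool {
    line.trim_start().starts_with("export ")
}
```

Applied to the outputs above, the SHELLOPTS value and first export -p line from git fetch both test as POSIX mode, while those from gix fetch do not.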
Thanks so much for this incredible research!
I have lost all faith in ever getting this right by myself 😅, and can only leave decisions on how to best tackle this to you. Personally I'd prefer correctness and execute through the shim by default if it makes anything better, and deal with performance problems later.
> Personally I'd prefer correctness and execute through the shim by default if it makes anything better, and deal with performance problems later.
Yes, correctness is more important than performance here. But the shim is the cause of the incorrect behavior described in this issue. But not using the shim, unless other steps are taken, will cause a more severe form of incorrect behavior (https://github.com/GitoxideLabs/gitoxide/pull/1862#issuecomment-2692158831).
That the shim causes the issue described here where the shell is wrongly not in POSIX mode might be a bug in Git for Windows. That is, this entire issue may simply be a bug in Git for Windows, as it manifests in gitoxide's interaction with Git for Windows. If so, then it could be entirely fixed by changes to Git for Windows. I would consider that outcome to be ideal. I will look into that.
In contrast, some beneficial effects that we currently only get when we use the shim are needed to avoid problems that are more serious, and some of those problems are not due to bugs in other software.
The specific problem I encountered when not using the shim, summarized above and detailed in https://github.com/GitoxideLabs/gitoxide/pull/1862#issuecomment-2692158831, is hopefully a bug that can be fixed in MSYS2, though I am not confident that it is considered a bug. But even if so, configured shell commands, hook scripts, fixture scripts, and any other shell scripts generally need to have access to common Unix tools--the "standard library" of shell scripting--such as cat and rm. On Unix-like systems, this can generally be assumed, but not on Windows. Part of what the shim does is to customize the environment to make that happen.
So we should call the shim unless we can customize the environment sufficiently ourselves. Customizing the environment ourselves would likely improve performance, but that is not the main reason I am interested in eventually doing it. Rather, it would actually be more similar to what git does. In Git for Windows, git does not use any shim when it runs a custom command or hook. Instead, it customizes its environment.
My understanding of how and when git does that is incomplete. I believe this was originally implemented in Git for Windows through the git shim, which should not be confused with other shims Git for Windows provides such as the sh shim (that is really a shim for bash, per this issue) and the bash shim. But in https://github.com/git-for-windows/git/pull/2506 the non-shim git executable was enhanced to customize its own environment when run in a way that it detects is not through its shim. This is done by setup_windows_environment, which conditionally customizes PATH by calling the append_system_bin_dirs function.
Thanks for the clarification, I do usually have trouble correctly digesting everything in long write-ups, and using the 'summary' feature feels dangerous, too.
> In Git for Windows, git does not use any shim when it runs a custom command or hook. Instead, it customizes its environment.
That's perfect - if I understand correctly one could safely adopt this code, avoid the shim, and become independent of any bug-fix in Git for Windows, all while avoiding potential performance issues. I am probably missing something though, as it didn't sound quite so obvious when you mentioned it.
Yes, though I suspect something like this is what we should eventually do, there are several caveats and ways in which we would have to depart from what Git for Windows does or from how it does it:
- The code of the Git for Windows git executable implements various POSIX operations for Windows in mingw.c. In contrast, gitoxide is not organized that way. I also do not advocate that gitoxide be organized that way.
  (git started out expecting a Unix-like environment and continues broadly to assume one, so Git for Windows adapts to that. But gitoxide has no corresponding initial design restriction. Furthermore, I think gitoxide's Windows support has always been a strength. For example, gitoxide was broadly usable at native speed on ARM64 Windows systems even before Git for Windows started making releases for them.)
  Because of this, I suspect there are some changes we would have to make, compared to what Git for Windows has in setup_windows_environment and append_system_bin_dirs, even if we know Git for Windows is present and usable and even if we want to create as similar an environment as possible.
- We do not know Git for Windows is present and usable.
- We may not want to create as similar an environment as possible. External commands may examine environment variables that Git for Windows sets, and assume based on them that Git for Windows is present and usable. So we should account for that if possible.
  This overlaps with some of the considerations discussed under #1585 (comment) (which I have not forgotten about), such as the question of whether we should prepend the git-core directory as Git for Windows does when we can find that directory.
- The code of those functions in Git for Windows can change along with other changes to Git for Windows, whereupon an approach we do that is closely based on it might not work properly. That's the same concern as this above point about implementing environment changes based on what Git for Windows sh/bash shims do, but applied to the (presumably very similar) effect of what git does in mimicking its own git shim:
  > In particular, every version of Git for Windows has a chance to ship shims that work differently to account for changes in other parts of Git for Windows. In contrast, gitoxide has no such versioned coupling to the Git for Windows shims.
- We actually shouldn't change our environment at all. That is, git can change its environment because it is a program, not a library. Library crates in gitoxide should avoid changing the environment because this is not safe to do.
  While writing environment variables is actually synchronized on Windows (unlike most other operating systems), it is still "unsafe" in the broader sense that the program that uses gix-* crates should decide what goes in its environment. So any changes that have to be made to the actual process environment should be opt-in and as much as possible should work properly without them, much as we do with signal handlers.
  I think this is not a deep problem because, for the most part, gix-command can make the changes for the environments of subprocesses that are being created. But this is an important reason we may not be able to customize environments in the same way that git does.
So I don't want to attempt anything like this for #1868, which I expect to be ready to review for merging (possibly with its current code) once I have completed some more testing.
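As a rough illustration of the last caveat in the list above, here is a sketch of adjusting PATH only for a child process, using the Rust standard library. The helper names, and the choice to append caller-supplied directories, are assumptions made for the example; which directories to add, and whether to append or prepend them, is exactly what remains undecided.

```rust
use std::env;
use std::ffi::{OsStr, OsString};
use std::path::PathBuf;
use std::process::Command;

/// Hypothetical helper: returns a `PATH` value with `extra_dirs` appended to
/// `current`, without touching the environment of the calling process.
fn path_with_appended_dirs(
    current: Option<&OsStr>,
    extra_dirs: &[PathBuf],
) -> Result<OsString, env::JoinPathsError> {
    // Split the existing value using the platform's separator (`;` or `:`).
    let existing: Vec<PathBuf> = current
        .map(|p| env::split_paths(p).collect())
        .unwrap_or_default();
    env::join_paths(existing.into_iter().chain(extra_dirs.iter().cloned()))
}

/// Hypothetical helper: builds a command whose child sees the extended `PATH`.
/// `Command::env` affects only the spawned child, not the current process.
fn command_with_extra_path(
    program: &OsStr,
    extra_dirs: &[PathBuf],
) -> Result<Command, env::JoinPathsError> {
    let current = env::var_os("PATH");
    let path = path_with_appended_dirs(current.as_deref(), extra_dirs)?;
    let mut cmd = Command::new(program);
    cmd.env("PATH", path);
    Ok(cmd)
}
```

Because nothing here calls `std::env::set_var`, the program using the library keeps control of its own environment, and only the about-to-be-spawned command is affected.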
Thanks for elaborating.
> I think this is not a deep problem because, for the most part, gix-command can make the changes for the environments of subprocesses that are being created. But this is an important reason we may not be able to customize environments in the same way that git does.
After reading this, I also think that maybe it would be easiest if the Git for Windows behaviour was a bug, and if that could be fixed instead.
Besides that, I agree that gix-command would be where such changes could be done to about-to-be spawned commands.
> So I don't want to attempt anything like this for #1868, which I expect to be ready to review for merging (possibly with its current code) once I have completed some more testing.
I think so, too! Looking forward to merging it.
> After reading this, I also think that maybe it would be easiest if the Git for Windows behaviour was a bug, and if that could be fixed instead.
Yes. I intend to pursue that before attempting any manual environment customization. Assuming the situation of the sh shim running bash in non-POSIX mode can be fixed as a bug in Git for Windows, then manual environment modification, if eventually done, would be more of a performance optimization.
In either case, this need not block #1862 (which is what I meant above when I said #1868).
Ahh..., I was just looking for a git reimplementation on Windows without any POSIX layer (Cygwin, MSYS2, etc.).
But now it seems impossible (for me).