libs-team
Update Windows exe search order (phase 2)
Proposal
Problem statement
On Windows, the Command program search uses slightly different rules depending on whether the environment is inherited or not. To cut a long story short, this is for historical reasons. Way back in pre-1.0 times the intent was to follow our Unix code in searching the child's PATH. However, after several refactorings this became super buggy and only really worked if Command::env was used. In other cases it (unintentionally) fell back to the standard CreateProcess rules.
A while back it was decided to remove the current directory from the Windows Command search path, which I did. At the time I was a bit worried it would affect people, but as it turned out it didn't appear to have much of an impact. Or at least I've not heard of anyone having serious issues with it.
I did, however, preserve some of the buggy env behaviour because I was worried about making too many changes at once. Still, I do think it needs fixing somehow.
Motivating examples or use cases
Assuming that there is an app called hello.exe alongside the current application and also a different hello.exe in PATH:
// Spawns `hello.exe` in the application's directory
Command::new("hello").spawn()
// Spawns `hello.exe` from PATH
Command::new("hello").env("a", "b").spawn()
Background
Windows CreateProcess search order
When using CreateProcess and not setting lpApplicationName, it will attempt to discover the application in the following places (and in this order):
- the parent process' directory
- the current directory for the parent process.
- the system directories
- the parent process' PATH
This is the order (or similar) used by most Windows applications and runtimes.
Rust's Unix search order
- the child process' PATH
Note: Rather than using execvpe, Rust sets the environment after forking and then uses execvp. See https://github.com/rust-lang/rust/blob/ed49386d3aa3a445a9889707fd405df01723eced/library/std/src/sys/pal/unix/process/process_unix.rs#L395
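To illustrate what "search only the child's PATH" means, here is a minimal sketch of such a lookup. The helper name `find_in_path` is hypothetical and this is not how std or `execvp` is implemented; in particular it only checks that a file exists and ignores execute permissions and names that already contain a path separator.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Returns the first `dir/name` that exists as a file, trying each entry of
/// `path_var` in order. `path_var` is meant to be the child's PATH value.
/// Hypothetical helper for illustration only.
fn find_in_path(name: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.is_file())
}
```

Because the PATH value is passed in explicitly, the same function can be pointed at either the parent's or the child's PATH, which is exactly the choice this proposal is about.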
Rust's Windows search order
- the child process' PATH, but only if the child environment is not inherited.
- the parent process' directory
- the system directories
- the parent process' PATH
Obviously this leads to some inconsistencies depending on whether Command::env is used or not.
It was originally intended that we just do 1. (the child process' PATH), so this search order was somewhat accidental.
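The inconsistency can be written down as a small model. This is only a sketch of the order described above: the `SearchStep` enum and `current_windows_order` function are made-up names, not std internals.

```rust
/// One place that `Command` may look for the program (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SearchStep {
    ChildPath,
    ApplicationDir,
    SystemDirs,
    ParentPath,
}

/// Current Windows search order as described above. `env_modified` is true
/// when the child's environment is not inherited (e.g. `Command::env` was used).
fn current_windows_order(env_modified: bool) -> Vec<SearchStep> {
    let mut order = Vec::new();
    if env_modified {
        // The child's PATH is only consulted in this case, and it comes first.
        order.push(SearchStep::ChildPath);
    }
    order.extend([
        SearchStep::ApplicationDir,
        SearchStep::SystemDirs,
        SearchStep::ParentPath,
    ]);
    order
}
```

This matches the motivating example: with an inherited environment the application's directory wins, while calling `Command::env` puts PATH ahead of it.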
Solution sketch
There is a tension here between being consistent cross-platform and being consistent with non-Rust applications on Windows. We'd also prefer not to break existing applications.
Trying to keep everyone happy is difficult, if not impossible, but I think we can do better than we currently are. With that in mind, I would like the search order to be consistently:
- the parent process' directory
- the system directories
- the child process' PATH, but only if the child environment is not inherited.
- the parent process' PATH
This is more or less the same as now except that the parent process' directory and system directories are consistently searched first.
I'd love to only check either the parent's or child's PATH, not both, but I worry that would be too breaking.
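As a rough sketch of the proposed order, the list of directories to search could be assembled like this. The function and parameter names are hypothetical, and the caller is assumed to supply the application directory and system directories; `child_path` is `Some` only when the child environment is not inherited.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// Builds the directories to search, in the proposed order.
/// Illustrative only; not the std implementation.
fn proposed_search_dirs(
    app_dir: &Path,
    system_dirs: &[PathBuf],
    child_path: Option<&OsStr>,
    parent_path: &OsStr,
) -> Vec<PathBuf> {
    // 1. the parent process' directory
    let mut dirs = vec![app_dir.to_path_buf()];
    // 2. the system directories
    dirs.extend(system_dirs.iter().cloned());
    // 3. the child process' PATH, only if the environment is not inherited
    if let Some(path) = child_path {
        dirs.extend(env::split_paths(path));
    }
    // 4. the parent process' PATH
    dirs.extend(env::split_paths(parent_path));
    dirs
}
```

The only branch left is whether the child's PATH is present at all; the first two steps no longer depend on whether `Command::env` was called.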
Alternatives
- Keep the status quo
- Be more consistent with other Windows applications and don't search the child PATH at all.
- Be more consistent with other Rust platforms and don't search the parent PATH.
- Be consistent with neither and only search the parent's PATH (i.e. not the application or system directories).
- Have a new API for resolving applications, that allows at least some degree of control on how the search is performed. Though that would still need the default behaviour figured out.
Links and related work
- https://github.com/rust-lang/rust/issues/15149 (which also added this test in std: https://github.com/rust-lang/rust/blob/3b022d8ceea570db9730be34d964f0cc663a567f/library/std/tests/process_spawning.rs)
- Reduced Windows Command search path
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
- We think this problem seems worth solving, and the standard library might be the right place to solve it.
- We think that this probably doesn't belong in the standard library.
Second, if there's a concrete solution:
- We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
- We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.
For the status quo you wrote
the current application's parent directory
but then for the proposal this becomes
the application's directory
That sounds like it's not the same thing? However, you're not discussing changing that part of the behavior either, so maybe this is meant to refer to the same thing in both cases? I am confused.
GitHub actions has (WSL) bash.exe in C:\Windows\System32 (the system directory). However, some people want to use (git) bash.exe in C:\Program Files\Git\bin. Currently there is no way to override it (unless you know the env hack, which isn't great).
Oh, that may explain some extremely painful and confusing behavior I experienced on GHA recently. Having the system dirs overwrite PATH is certainly surprising and seems to be inconsistent both with Unix behavior and with the normal Windows behavior, so this is definitely something that should be changed.
There are competing demands in fixing this to be more predictable. Some people want this to be as close to the "normal" Windows behaviour as possible, others strongly favour cross-platform consistency.
It would be good to discuss where on the spectrum of "work like Unix" vs "work like normal Windows programs" this proposal lies.
That sounds like it's not the same thing? However, you're not discussing changing that part of the behavior either, so maybe this is meant to refer to the same thing in both cases? I am confused.
I've reworded to hopefully avoid confusion.
It would be good to discuss where on the spectrum of "work like Unix" vs "work like normal Windows programs" this proposal lies.
I've done a small edit to clarify that comment a bit and I'll attempt to expand upon the point later when I have time to write something more succinct. But here's the long version:
- Unix essentially creates a new process, sets the new environment variables and then searches PATH (and only PATH) for the executable to load.
- Windows shells (i.e. cmd and powershell) use the currently set PATH to find executables. This is effectively the same as Linux, except that cmd by default will also run applications in the current directory (though this is configurable and powershell does not do this).
- The Windows API CreateProcess has its own way of finding an executable if it's not given explicitly (scroll down that page to "1. The directory from..."). Most languages use this because they're relatively thin wrappers around this API.
I'll repost the CreateProcess behaviour here. To be clear the current directory and environment variables are that of the parent process:
- The directory from which the application loaded.
- The current directory for the parent process.
- The 32-bit Windows system directory.
- The 16-bit Windows system directory.
- The Windows directory.
- The directories that are listed in the PATH environment variable.
The Unix behaviour is:
- The directories that are listed in the child process' PATH environment variable
If Command::env is not used then Rust currently uses a modified version of the CreateProcess behaviour. It removes 2. (the current directory) for security reasons and it also removes the 16-bit system directory. So it's:
- The directory from which the application loaded.
- The 32-bit Windows system directory.
- The Windows directory.
- The directories that are listed in the parent process' PATH environment variable.
If Command::env is used then it tries to mimic the Unix behaviour, then falls back to the above if the application is not found:
- The directories that are listed in the child process' PATH environment variable
- The directory from which the application loaded.
- The 32-bit Windows system directory.
- The Windows directory.
- The directories that are listed in the parent process' PATH environment variable.
The suggested change simplifies this to always doing this:
- The directory from which the application loaded.
- The directories that are listed in the child process' PATH environment variable.
- The directories that are listed in the parent process' PATH environment variable.
Due to unifying the two behaviours, this leads to being slightly less Unix-like in the case where Command::env is used but slightly more Unix-like otherwise. To be honest, I would also prefer to pick only one out of 2. and 3. (the child's PATH or the parent's PATH) but I suspect that'd break someone.
Thanks for the clarifications!
The directory from which the application loaded.
Is that what you called "the parent process' directory" above? I thought "the parent process' directory" referred to the current working directory, but "from which the application loaded" sounds more like the directory containing current_exe()?
The directories that are listed in the PATH environment variable.
Oh wow, so this is last normally on Windows? Seems like Windows users could be quite surprised then about Rust giving PATH a lot higher priority. Or maybe it's less "higher priority" and more "skipping the system directories".
directory from which the application loaded
This is the directory in which the parent process executable resides, correct? Not the current working directory.
I think the simplification you recommend sounds good. I presume we're allowed to deduplicate (2) and (3) when applicable?
Thanks for the clarifications!
The directory from which the application loaded.
Is that what you called "the parent process' directory" above? I thought "the parent process' directory" referred to the current working directory, but "from which the application loaded" sounds more like the directory containing current_exe()?
Yes, that's right. By "current directory" I meant the current working directory and by "application's directory" I meant the directory containing the application as given by current_exe().
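To make the distinction concrete, here is a minimal sketch of the two directories using `std::env`. The wrapper names `application_dir` and `working_dir` are just for illustration.

```rust
use std::env;
use std::io;
use std::path::PathBuf;

/// The "application's directory": the directory containing the running
/// executable, as given by `current_exe()`.
fn application_dir() -> io::Result<PathBuf> {
    let exe = env::current_exe()?;
    exe.parent()
        .map(|dir| dir.to_path_buf())
        .ok_or_else(|| {
            io::Error::new(io::ErrorKind::NotFound, "executable has no parent directory")
        })
}

/// The "current directory": the process' current working directory.
fn working_dir() -> io::Result<PathBuf> {
    env::current_dir()
}
```

The two can differ whenever a program is launched from somewhere other than the directory it is installed in, which is why removing the current directory from the search did not remove the application's directory.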
The directories that are listed in the PATH environment variable.
Oh wow, so this is last normally on Windows? Seems like Windows users could be quite surprised then about Rust giving PATH a lot higher priority. Or maybe it's less "higher priority" and more "skipping the system directories".
The system directories are in the PATH and usually before the user's paths. So from a practical perspective it makes little difference unless the user has modified their PATH to remove them.
And as noted, shells don't do that so there's already inconsistency.
directory from which the application loaded
This is the directory in which the parent process executable resides, correct? Not the current working directory.
I think the simplification you recommend sounds good. I presume we're allowed to deduplicate (2) and (3) when applicable?
Sure, but that's a libs optimization, rather than a libs-api question.
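For what it's worth, such a deduplication could be as simple as the sketch below, which keeps the first occurrence of each directory so the priority order is unchanged. The function name is hypothetical, and it compares paths exactly as written; it makes no attempt to handle Windows case-insensitivity or different spellings of the same directory.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

/// Removes repeated directories while keeping the first occurrence of each,
/// so the search priority stays the same. Illustrative only.
fn dedup_search_dirs(dirs: Vec<PathBuf>) -> Vec<PathBuf> {
    let mut seen = HashSet::new();
    dirs.into_iter()
        // `insert` returns false if the directory was already seen.
        .filter(|dir| seen.insert(dir.clone()))
        .collect()
}
```

This would mostly matter when the child's PATH is the parent's PATH with a few entries added, since then nearly every directory appears twice.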
And as noted, shells don't do that so there's already inconsistency.
Yeah seems to be quite the mess.
From a Unix perspective, the most surprising part to me is "The directory from which the application loaded". I guess that's just a sufficiently common pattern on Windows that we have to support it?
Yes, I believe so. Windows applications are generally packaged in their own directory rather than being in a soup of other applications. So, at least traditionally, giving high priority and trust to the application's load directory is expected.