volta fix: Prevent fork bomb on Windows

Fix #1741

Before this patch, volta-shim(and its aliases) spawns itself in some cases causing recursive call (fork bomb). This is a critical issue but only occurs on Windows.

May 23 '24 05:05 chawyehsu

Thanks for tackling this! Can you comment on the specific trigger condition and the mechanics of how it recurses? Without a good understanding of the overall root cause of the recursion, it's difficult to be confident that we're completely resolving the issue.

We're already working around some of the oddities of Command on Windows a bit by calling cmd /C <tool> instead of <tool> directly (see the behavior of the create_command function). I have a hypothesis that the underlying issue is when the Volta directory itself is the CWD for the command (which I think would be the case when you launch it from the file explorer). So even though we remove the Volta directory from the Path, it's still the "local" executable and is first place Windows looks to resolve the name.

If that's the case, we may be able to resolve this in a less intrusive way by detecting the CWD is one of the Volta dirs. I worry a little about the extra copying / allocating needed to hold onto the command, args, and env vars through the process. It's a relatively small impact, but this block is on the hot path for every call to any of the shims, so it's one of the most performance-sensitive areas of the app.

May 23 '24 06:05 charlespierce

One thought, if the above is accurate for how this recursion starts, would be to check on Windows right before executing if the CWD is one of the Volta directories. If it is, we can do the absolute path resolution and then copy the values (since we’ll need to create a new Command at that point regardless). Helpfully, Command itself has methods to get the root command, the arguments, and the environment variables, so we don’t need to copy ahead of time, we can read them from the already-built command to build a new one (see e.g. get_args)

May 23 '24 06:05 charlespierce

We're already working around some of the oddities of Command on Windows a bit by calling cmd /C <tool> instead of <tool> directly (see the behavior of the create_command function). I have a hypothesis that the underlying issue is when the Volta directory itself is the CWD for the command (which I think would be the case when you launch it from the file explorer). So even though we remove the Volta directory from the Path, it's still the "local" executable and is first place Windows looks to resolve the name.

I see that it had tried to remove the Volta directory from the Path via System::path(), but that's not enough. The udnerlying CreateProcessW searches hard-coded locations before the PATH env:

If the file name does not contain a directory path, the system searches for the executable file in the following sequence:

The directory from which the application loaded.

The current directory for the parent process.

The 32-bit Windows system directory. Use the GetSystemDirectory function to get the path of this directory.

The 16-bit Windows system directory. There is no function that obtains the path of this directory, but it is searched. The name of this directory is System.

The Windows directory. Use the GetWindowsDirectory function to get the path of this directory.

The directories that are listed in the PATH environment variable. Note that this function does not search the per-application path specified by the App Paths registry key. To include this per-application path in the search sequence, use the ShellExecute function.

That's the root cause.

we may be able to resolve this in a less intrusive way by detecting the CWD is one of the Volta dirs.

I get your thought here but it does not really help imo by just detecting CWD. It should only find the right executable by only checking from the PATH environment variable we give. (and this is why which crate was introduced)

May 23 '24 06:05 chawyehsu

Btw, I don't quite understand why the infinite recursive call is allowed. I believe there should be a limit of max recursive depth, say 10, and once it exceeds it should halt. This should help to resolve from another perspective.

May 24 '24 02:05 chawyehsu

Doing some spelunking in our issues (while investigating some other, ostensibly unrelated, Windows issues), and I've come around on this approach. I actually think we might be able to do even better and use the behavior of which to completely skip needing to add cmd.exe /C entirely. It's been ~5 years, but looking back at some of our issues (notably #577), I think the reason we use cmd.exe /C currently is twofold:

To work around issues with Rust's Command implementation around calling .cmd and .bat files without extensions.
To prevent fork bombs like this when calling a package that happens to not exist

Bypassing cmd.exe entirely, and using which directly to resolve the executable would solve both of those problems - which already handles detecting the correct file extension and returning it if it's not found.

Additionally, this would solve the linked issue around error messages, because if we can't locate the executable with which, then we can know (even without needing to try to spawn the process) that it won't work and can show a helpful error message instead if a generic cmd.exe node is not a recognized command... error.

It also (I think) would solve argument parsing issues like those mentioned in #1791, so this may be an altogether cleaner path forward.

Thanks for suggesting and pushing on this @chawyehsu!

Oct 03 '24 18:10 charlespierce

Thanks for the informative response, would love to see improvement to this PR to make it a general solution to linked issues. @charlespierce

Oct 10 '24 03:10 chawyehsu