reproc icon indicating copy to clipboard operation
reproc copied to clipboard

Calling GenerateConsoleCtrlEvent in process_terminate creates zombie

Open ghost opened this issue 6 years ago • 15 comments

This is likely related to an old terminal issue, reported on GH here:

  • https://github.com/microsoft/terminal/issues/346
  • https://github.com/microsoft/terminal/issues/335

My setup is Windows 10, Visual Studio 2019 (16.1.5) and problem can be observed when running background.cpp reproc++ sample under VS. The observable results is that console window cannot be closed (Visual Studio Debugger Console cannot be closed if there are any child processes left, and the zombie process can't be stopped, can only be forcefully killed).

For me this doesn't repro in cmd.exe but does reproduce in Visual Studio Debugger Console as well as cmd replacements like @malxau's yori. I suspect that this is caused by a race between process->running set to false (i.e. child process terminating by itself) and platform specific process_terminate() being called (which in turn calls GenerateConsoleCtrlEvent()). In regular terminal I get lucky, in debugger console and yori I don't.

I tried a simple workaround by calling FreeConsole()/AttachConsole(child)/SetConsoleCtrlHandler(NULL, TRUE) before the GenerateConsoleCtrlEvent() call but that makes it impossible for the parent process to reattach to its console later on since the first FreeConsole() drops console's refcount to 0 an causes its release.

A dummy process would have to be spawned to hold to the parent console while it generates CTRL_BREAK_EVENT on the child but I haven't tried that yet and I'm not 100% convinced this is a correct solution to this problem.

ghost avatar Jul 14 '19 10:07 ghost

I don't immediately see how the race condition you describe would work. running is only changed in reproc_wait which is distinct from reproc_terminate.

If https://github.com/microsoft/terminal/issues/335 is the problem (which looks very plausible to me), ideally GenerateConsoleCtrlEvent would be fixed instead of working around the problem in reproc. I'd prefer to not start spawning dummy processes to keep reproc as simple as possible. I'd love to look into fixing GenerateConsoleCtrlEvent myself but unfortunately I'm swamped with other work for now.

I'm also open to other simple ways of implementing reproc_terminate on Windows but as far as I remember I didn't find any other satisfactory way of implementing this when I last looked into it.

DaanDeMeyer avatar Jul 14 '19 11:07 DaanDeMeyer

You're obviously right, process->running is only cleared in wait. I assumed there's a timing issue causing it to be called when running in cmd but not in VS debugger console but debugged some more and it's not the case. So I have no hypothesis as to why there's a difference in behavior. Consequently GenerateConsoleCtrlEvent bug may or may not be the root cause.

The behavior remains though - in certain situations background.cpp sample creates zombie process that causes console host to hang and become unresponsive to close button. :S It seems that the simplest workaround (at least the simplest I've found so far) is to call process::stop with cleanup::wait as the first cleanup strategy. Whether this is accidental or a proper workaround, I don't know yet. As it stands cleanup::terminate is unfortunately unreliable on Windows.

I'll keep looking into it. I want to use reproc in my project but this prevents me from doing so.

ghost avatar Jul 17 '19 18:07 ghost

Have you looked what happens when using cleanup::terminate?

DaanDeMeyer avatar Jul 17 '19 18:07 DaanDeMeyer

Dominik, note one difference between Yori and CMD is that Yori will call GenerateConsoleCtrlEvent, wait for 50ms to see if processes die, then call TerminateProcess. The background sample appears to just launch a child and wait, so if it's not processing console events in a timely fashion, I'd expect Yori to kill it and leave behind its child process. Is this what you're seeing? reproc doesn't look to use SetConsoleCtrlHandler anywhere, so it's depending on default Ctrl+C handling being enabled when it was launched; I tried to do this, but it's another way to end up in this state.

I don't know what the VSDebugger is doing, but leaving Ctrl+C handling off or forcibly terminating both seem plausible.

malxau avatar Jul 17 '19 21:07 malxau

I'm wondering why the execution environment makes a difference in the behaviour. Theoretically, the same reproc code should be executed regardless of the execution environment. Does the execution environment (cmd, yori, vs debugger) affect the behaviour of Windows API calls in any way (specifically those related to child processes)?

DaanDeMeyer avatar Jul 17 '19 22:07 DaanDeMeyer

Yes, it can. The case I was referring to above is that SetConsoleCtrlHandler defines the handling of Ctrl+C, and that behavior is propagated to child processes. They can call the API to change it, but doing nothing ends up getting behavior defined by their environment. The handling of Ctrl+C by the console also changes depending on the current console input mode, which is configured by SetConsoleMode, which is also propagated to child processes. Worse, the console is a shared resource, so a child can modify the environment of its parent.

In theory everything I just said is specifically for Ctrl+C, not Ctrl+Break. And one goofy thing Yori does as a result is observe Ctrl+C, and generate Ctrl+Break.

I think I'll need to compile some things and try this out myself. The only thing that's bothering me is although I can have screwed it up, I can't have screwed up the VS debugger too.

malxau avatar Jul 18 '19 00:07 malxau

reproc only generates Ctrl+Break so as far as I understand it, only the child process should be able to ignore it by calling SetConsoleCtrlHandler. @ddalek, Does cleanup::terminate actually stop the child process? If that's the case, we're sure the default handler is called and the problem might be somewhere else.

Could it maybe have something to do with handle inheritance? Although I specifically took care to not leak handles to child processes in reproc.

DaanDeMeyer avatar Jul 18 '19 06:07 DaanDeMeyer

@ddalek were you launching a GUI process by any chance? (link /dump can show the subsystem from the PE header.)

malxau avatar Jul 20 '19 19:07 malxau

All processes are console ones (3 subsystem windows CUI as reported by dumpbin).

I looked at the state of various processes when running the sample app and it looks very much like what's described in terminal bugs.

Console application with background.cpp sample code spawns testchild process (both are console subsystem). Child process has PID 10556

https://i.imgur.com/gx6EpV0.png

Child gets closed via GenerateConsoleCtrlEvent() but conhost still holds a handle to that process (overlapping console window at the top shows the end of the console output; VS console adds pause at the end of debug run and Press any key prompt is clearly not there).

https://i.imgur.com/Zryb6Db.png

If I forcibly close this handle in conhost process, VS debug console runs to completion.

https://i.imgur.com/7yFcSHN.png

So unfortunately it is a problem with conhost. I'm still not sure why this bug does not reproduce in vanilla console but does under VSDC and yori.

ghost avatar Jul 21 '19 10:07 ghost

I'm sure you're right that it's a conhost bug, and you're probably right that it's this conhost bug, but I'm just not seeing how the conditions are satisfied right now.

If:

  • reproc didn't use CREATE_NEW_PROCESS_GROUP; or
  • the process wasn't a console process; or
  • the process was launched on a different console (eg. CREATE_NEW_CONSOLE)

then this bug would mean GenerateConsoleCtrlEvent would try to associate the child with the console reproc is on. In the latter two cases, this is incorrect and fouls everything up. But if those three conditions are false, it should already be on this console, part of a process group, and the "normal" case should exist.

It's likely that I'm just missing something here, and that there's another condition that gives rise to this bug or for some other reason one of the above conditions becomes true, but if that's the case, I'm not currently understanding why.

malxau avatar Jul 23 '19 16:07 malxau

Ah, I see what you're saying. I'll tackle that overcomplicated build setup and try debugging conhost when I have the time then.

ghost avatar Jul 23 '19 17:07 ghost

Building it is straightforward; AFAIK it's just:

msbuild /p:Configuration={Debug,Release} /p:Platform=x64 <path to sln file>

This will spit out conhost, even though it will error out trying to generate Cascadia.

Unfortunately when I did this my windbg couldn't make sense of the symbol file; I think I need to upgrade windbg to understand the pdb format it's currently using.

malxau avatar Jul 23 '19 18:07 malxau

According to this Stackoverflow answer, Python uses the same mechanisms that reproc does for stopping processes on Windows. I looked around and unfortunately CTRL+Break seems to be the only way of gracefully stopping console processes on Windows.

I don't think there's anything I can do in reproc to work around this bug except recommending to use reproc_kill instead of reproc_terminate, but that has obvious issues as well in that cleanup cannot be performed by the child process.

DaanDeMeyer avatar Aug 12 '19 17:08 DaanDeMeyer

I experienced the same problem with Visual Studio Debugger Console. A simple workaround is to force Visual Studio to use the vanilla cmd.exe as a debugging console, by changing the following setting:

Tools > Options > Debugging > General > Automatically close the console when debugging stops

Tarquiscani avatar Jun 03 '21 11:06 Tarquiscani