playwright [BUG] Killing webserver process should use SIGTERM on Linux, with SIGKILL as fallback

Context:

Playwright Version: 1.20.0-alpha-feb-23-2022
Operating System: Linux
Node.js version: v16.10.0
Browser: Chromium
Extra: Using docker-compose to launch webserver

Config Snippets

Playwright config:

const config: PlaywrightTestConfig = {
  // ...
  webServer: {
    command: 'docker-compose -f docker/docker-compose.yml up app-for-e2e',
    port: 3238,
    timeout: 10 * 1000,
    reuseExistingServer: !process.env.CI,
  }
};

docker-compose.yml:

app-for-e2e:
  # ...
  container_name: app-for-e2e
  platform: linux/amd64
  ports:
    - 3238:80

Describe the bug

We use a Docker container to launch our dev webserver and run Playwright tests against it. The docker-compose up command responds to SIGINT and SIGTERM by gracefully terminating the container(s) that it launched, but if it receives SIGKILL then it cannot terminate the containers, and they stay running. At the moment, the WebServer class does not provide any way to gracefully close the launched process, instead using the killAndWait function from processLauncher.ts, which sends SIGKILL to the process without ever trying SIGTERM.

The result is that after I run npx playwright test, the app-for-e2e container is still running, because Playwirght never attempted to send SIGTERM to the process and just sent SIGKILL instead. I then have to stop the app-for-e2e container manually, since I don't want it running and taking up system resources when I'm not using it.

Proposed solution

To solve this bug, I would suggest changing killProcess in processLauncher.ts to take a boolean argument gracefully, defaulting to false for backwards compatibility. If gracefully is true, send SIGTERM rather than SIGKILL on Unix-like platforms (Linux and MacOS), or on Windows, run taskkill without the /F flag. Add the same parameter to killProcessAndCleanup and killAndWait. Then change the this._killProcess() method in the WebServer class to look something like this:

this._killProcess = async () => {
  const _hasExited = async () => launchedProcess.exitCode != null;
  const killTimeout = this.config.timeout || 60 * 1000;
  kill(true);
  const cancellationToken = { canceled: false };
  const { timedOut } = (await Promise.race([
    raceAgainstTimeout(() => waitFor(_hasExited, 100, cancellationToken), killTimeout),
    this._processExitedPromise,
  ]));
  cancellationToken.canceled = true;
  if (timedOut)
    kill(false);
};

Additional Info

This is #12299 reopened, since #12299 was closed without a good reason for closing it. (The reasons given were "It looks like this issue only has a handful of upvotes, has not been touched recently and/or we lack sufficient feedback to act on it", none of which are good reasons to close a bug without discussion in the issue first. If a bug is going to be closed for any of those reasons, a comment to that effect should be made in the bug itself first. Closing bugs with no discussion like that is bad issue-management practice.)

Oct 20 '22 14:10 rmunn

Possibly related: https://github.com/microsoft/playwright/issues/18041

My current workaround is to manually detect if a process has been started by playwright, then use tree-kill to kill it in GlobalTeardown.

Oct 20 '22 14:10 thenbe

I'm reading docker docs and it seems like docker compose itself is sending SIGTERM to the containers to gracefully kill them: https://docs.docker.com/compose/faq/#:~:text=Compose%20stop%20attempts%20to%20stop,they%20receive%20the%20SIGTERM%20signal. It probably does the same thing when you send SIGTERM to docker compose, or it happens automatically if they are a part of the same process tree.

This is a known anti-pattern when reliably managing dependent process trees - it is prone to dangling and zombie processes. Any process can die at any moment, you can't rely on SIGTERM solely to process the cleanup code. I.e. we can support graceful termination, but they must ensure hard kill still works. It looks like a docker bug to me. Even with your proposed fix, clicking Ctrl+C twice would allow container processes to survive, there is no guarantee they will die.

Oct 20 '22 15:10 pavelfeldman

A quick note on Linux signal handling in case someone reading this is unfamiliar with it: processes are allowed to handle most signals however they want, and shutting down when you receive SIGTERM is only a (universally-followed) convention: you could handle SIGTERM and do nothing with it, though that would be considered a bug in nearly every case. But processes cannot handle SIGKILL. When a process is sent SIGKILL, the Linux kernel immediately kills the process, giving it no time to do any cleanup. Which is why the pattern on Linux is usually to send SIGTERM first (which by convention means "do any necessary cleanup, then shut yourself down"), then wait a configurable number of seconds before sending SIGKILL if the process hasn't terminated in a timely fashion.

So yes, docker-compose needs to have something in place to handle cases where the docker-compose process dies unexpectedly (or is sent a SIGKILL). But bugs in other software are no excuse for Playwright not doing the right thing, and sending SIGTERM first in order to allow webserver processes to do whatever cleanup they might find necessary. I experienced this issue with docker-compose, but there are many other scenarios where a webserver process might have some cleanup (deleting temporary files, etc.) to do, and sending SIGKILL first without ever attempting SIGTERM doesn't allow that cleanup to ever happen.

Oct 20 '22 16:10 rmunn

This should be reopened, since the initial fix was reverted.

Feb 06 '23 23:02 domvo

I'd reopen it, but I don't have permissions to do so on this repo. Any maintainers want to reopen this, or would you prefer I open a new issue linking to this one?

Feb 06 '23 23:02 rmunn

The fact that the webserver is killed instead of terminating gracefully seems to break any tools that rely on Node V8 coverage of the webserver process, since coverage data is written when the process terminates (but not when it's killed). This breaks code coverage using c8, which is important for us. I don't know why the fix for this issue was correctly implemented and then reverted. SIGKILL should never be the default.

Nov 10 '23 20:11 alexis-aquanty

any update?

Mar 22 '24 09:03 dangoodman

This is a quick comment to keep this issue open, since apparently the Playwright team closes issues without warning when they haven't had much recent activity.

Psst, Playwright team: it would be nice if you would drop a comment on stale issues first, say 14 days before closing them, saying "This issue has not had any recent activity and will be closed in 14 days". That way people wouldn't feel the need to add comments like this one, which is essentially spam but I don't know any other way to be sure this issue will be kept open. After all, it isn't fixed yet. If you're closing issues for not having recent activity, then "bump" comments like this are the only way, apparently, to make sure an issue that hasn't been fixed stays open.

Apr 25 '24 04:04 rmunn

Following this issue!

Aug 16 '24 10:08 swersk