Slow router shutdown time

Open prasek opened this issue 3 years ago • 11 comments
Describe the bug

Router shutdown time is 10.2 seconds vs. Gateway 0.4 seconds in supergraph-demo-fed2

To Reproduce

Steps to reproduce the behavior:

  1. git clone git@github.com:apollographql/supergraph-demo-fed2.git
  2. make demo-local-router
  3. see router shutdown time is 10.2 seconds, unlike the subgraphs, which shut down in < 0.4 seconds

image

Compare to Gateway

  1. make demo-local
  2. see gateway shutdown time is 0.3 seconds, like the subgraphs

image

Expected behavior

Router shuts down as fast as Gateway, in <= 0.3 seconds.

Desktop (please complete the following information):

OS: Mac OS 12.1.0 [64-bit]

prasek avatar Jan 12 '22 20:01 prasek

It seems like 10s is the default timeout for docker-compose stop.

Moreover the router is currently waiting for a Ctrl+C shutdown signal, which is probably not what docker sends. I'll have a look at what SIG docker sends and I'll get back to you!

o0Ignition0o avatar Jan 26 '22 10:01 o0Ignition0o

Quoting docker documentation:

Compose stop attempts to stop a container by sending a SIGTERM. It then waits for a default timeout of 10 seconds. After the timeout, a SIGKILL is sent to the container to forcefully kill it. If you are waiting for this timeout, it means that your containers aren’t shutting down when they receive the SIGTERM signal.

o0Ignition0o avatar Jan 26 '22 10:01 o0Ignition0o
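To illustrate the behavior the Docker documentation describes, here is a minimal Python sketch of a process that shuts down on either Ctrl+C (SIGINT) or SIGTERM. It is not the router's code (the router is not written in Python), and the names `request_shutdown` and `run_until_signalled` are hypothetical. It only shows the general pattern: if a handler is registered for SIGTERM, the process can exit before the 10-second timeout instead of waiting for SIGKILL.

```python
import signal
import threading

# Set when a shutdown signal arrives; the main thread waits on it.
shutdown_requested = threading.Event()
# Names of the signals received, kept only for illustration.
received = []


def request_shutdown(signum, frame):
    """Record the signal and ask the main loop to stop."""
    received.append(signal.Signals(signum).name)
    shutdown_requested.set()


# Handle both Ctrl+C (SIGINT) and the SIGTERM sent by `docker-compose stop`.
# Must be called from the main thread.
for sig in (signal.SIGINT, signal.SIGTERM):
    signal.signal(sig, request_shutdown)


def run_until_signalled(timeout=None):
    """Block until a shutdown signal arrives; return True if one did."""
    return shutdown_requested.wait(timeout)
```

A real server would call something like `run_until_signalled()` from its main thread, then drain in-flight work and exit once it returns.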

@o0Ignition0o this is still taking 10s with the PR ☝️ so re-opening this for now.

Note the version is not present in the log output, so it's using the new image.

image

prasek avatar Jan 28 '22 20:01 prasek

Thanks for reopening it, I’ll check next week

o0Ignition0o avatar Jan 28 '22 20:01 o0Ignition0o

@o0Ignition0o looks like this did the trick

  • https://github.com/apollographql/supergraph-demo-fed2/pull/33

Same as what postgres is doing in their official Docker image.

prasek avatar Jan 28 '22 20:01 prasek

The Postgres README seems to indicate that this is set in order to abort in-flight requests. However, I wonder what kind of requests are still in flight at the time we’re trying to shut down the router. I’ll have a look regardless.

o0Ignition0o avatar Jan 29 '22 08:01 o0Ignition0o

Since the default for STOPSIGNAL is SIGTERM, does this mean that the router is not terminating on merely SIGTERM?

abernix avatar Feb 04 '22 10:02 abernix

@abernix yes, exactly. I've been adding STOPSIGNAL SIGINT to my custom router images, but the pre-packaged router image takes 10 seconds to shut down.

Adding STOPSIGNAL SIGINT to https://github.com/apollographql/router/blob/main/dockerfiles/Dockerfile.router would solve the issue when using the stock router Docker image in supergraph-demo-fed2.

Re-opening this for now as make demo-local-router is using the stock router image.

prasek avatar Aug 09 '22 17:08 prasek

Ok, that sounds like a bug then and something we should fix. Thanks for re-opening.

abernix avatar Aug 09 '22 17:08 abernix

So the default Docker stop signal is SIGTERM (15), and when tested locally, the router stops immediately, so there must be an issue specific to running in a container.

Geal avatar Aug 11 '22 13:08 Geal

The router's container closes with exit code 137, which indicates an OOM kill, but I don't really see it with docker inspect.

Geal avatar Aug 11 '22 14:08 Geal
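As a side note on the exit code: by the common 128 + signal-number convention, 137 corresponds to signal 9 (SIGKILL). An OOM kill produces that code, but so does the SIGKILL that Docker sends when the stop timeout expires, which would match the 10-second delay reported above. The Python sketch below decodes such codes; the helper name `describe_exit_code` is hypothetical, and the signal names it prints are those of the platform it runs on (Linux is assumed here).

```python
import signal


def describe_exit_code(code):
    """Interpret an exit code using the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


print(describe_exit_code(137))  # terminated by SIGKILL
print(describe_exit_code(143))  # terminated by SIGTERM
```

One possible explanation for the container-only behavior, not confirmed in this thread: on Linux, a process running as PID 1 (as the main process of a container usually does) ignores signals for which it has not installed a handler, so a SIGTERM that stops the router locally could be ignored inside the container.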