Hot reload downtime under load

Open utay opened this issue 1 year ago • 2 comments

Describe the bug

In production, a router process serves hundreds of requests per second. When it hot reloads after the supergraph has been refreshed, a few requests get 502s from the load balancer, because the load balancer's connection to the router is refused.

To Reproduce

I've been able to reproduce this pretty consistently:

  • Start the router with hot reload enabled
  • Run any load testing tool such as ab -n 1000000 -c 20 http://127.0.0.1:4000/graphql
  • Modify the supergraph to make the router reload
  • :boom: Right around "reload complete", at least one request will fail with connection refused
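
To narrow down the failure window, the steps above can be complemented with a small probe that only opens TCP connections and counts how many are refused. This is a minimal sketch, not part of the router or of ab: the helper names `probe` and `run_probes` are hypothetical, and the host and port are assumed to match the ab command above (127.0.0.1:4000).

```python
import socket
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def probe(host: str, port: int, timeout: float = 1.0) -> str:
    """Attempt one TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # Nothing is accepting connections on the port right now.
        return "refused"
    except OSError:
        # Timeouts, resets, and other socket-level failures.
        return "other_error"


def run_probes(host: str, port: int, total: int = 1000, concurrency: int = 20) -> Counter:
    """Run `total` connection attempts with `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return Counter(pool.map(lambda _: probe(host, port), range(total)))


if __name__ == "__main__":
    # Assumed address, taken from the ab command in the reproduction steps.
    print(run_probes("127.0.0.1", 4000))
```

Running this while modifying the supergraph should show a non-zero "refused" count if the listener is briefly unavailable during reload. It only exercises the TCP accept path, so it does not tell you anything about in-flight GraphQL requests.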

Expected behavior

I'd expect 0 downtime as per the docs.

Additional context

  • Router 1.45.0

utay, May 16 '24 13:05

Hi, thank you for the report. This looks like an issue we have seen elsewhere. We'll investigate and get back to you.

Geal, May 27 '24 08:05

This might be related to https://github.com/apollographql/router/pull/5235, which should land reasonably soon. If you want to test with that PR included, it might be worth a try. 😄

abernix, Jun 10 '24 09:06