poem icon indicating copy to clipboard operation
poem copied to clipboard

graceful shutdown hang

Open sundy-li opened this issue 2 years ago • 3 comments

Expected Behavior

Actual Behavior

2023-09-19T07:59:08.341593Z  INFO poem::server: poem-1.3.57/src/server.rs:140 initiate graceful shutdown name="http handler Query" timeout_in_seconds=1.0
2023-09-19T07:59:08.342161Z  INFO poem::server: poem-1.3.57/src/server.rs:190 wait for all connections to close. name="http h
andler Query"

Steps to Reproduce the Problem

Let's look at the codes:

https://github.com/poem-web/poem/blob/51d6d8d934aecf093e8044d9ba78e05349473b4c/poem/src/server.rs#L135-L192

If a signal is received, the loop will break.

But if the alive_connections is not zero, we will wait for notify but notify.notify_waiters(); will not be called anymore.

Specifications

  • Version: latest
  • Platform:
  • Subsystem:

sundy-li avatar Sep 19 '23 00:09 sundy-li

Seconding this, this is extremely frustrating as it also ignores any timeout functionality.

zkldi avatar Dec 21 '23 20:12 zkldi

The code, you're referring to, doesn't look like a reason. Yes, loop is stopped once signal is received.

But any accepted connection is spawned in separate task (tokio::spawn), and there is no any async calls before spawn, thus select won't prevent sync code to be executed. So this part will be called anyway (just in background by spawned task):

if alive_connections.fetch_sub(1, Ordering::Acquire) == 1 {
    // We have to notify only if there is a registered waiter on shutdown
    notify.notify_waiters();
}

And new connections won't be accepted, since loop is stopped. It also uses timeout_token - so once signal is received, all spawned connections will be dropped within the passed timeout to run_with_graceful_shutdown.

Do you use the timeout? Because otherwise connection may just be alive.

DDtKey avatar Mar 25 '24 17:03 DDtKey

However I see possible race condition 🤔

  • notify_waiters() will notify only already registered waiters

Thus if it's called after alive_connections.load(Ordering::Acquire) > 0, but before notify.notified() - it might be never notified.

But we also can't use notify_one (as is) which stores state that it's notified. Because alive_connections may reach 0 before shutdown and go up again (i.e there is no connection, but server is alive). It was described here

I filled a new PR to address this scenario: #780 and another one also related to graceful shutdown #779

DDtKey avatar Mar 25 '24 18:03 DDtKey