graceful shutdown hang
Expected Behavior
The server completes graceful shutdown within the configured timeout.
Actual Behavior
The server hangs after initiating graceful shutdown; these are the last log lines emitted:
2023-09-19T07:59:08.341593Z INFO poem::server: poem-1.3.57/src/server.rs:140 initiate graceful shutdown name="http handler Query" timeout_in_seconds=1.0
2023-09-19T07:59:08.342161Z INFO poem::server: poem-1.3.57/src/server.rs:190 wait for all connections to close. name="http handler Query"
Steps to Reproduce the Problem
Let's look at the code:
https://github.com/poem-web/poem/blob/51d6d8d934aecf093e8044d9ba78e05349473b4c/poem/src/server.rs#L135-L192
If a signal is received, the loop breaks.
But if `alive_connections` is not zero, we then wait on `notify`, and `notify.notify_waiters()` is never called again, so the wait hangs forever.
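To make the suspected hang concrete, here is a condensed sketch of the wait at the end of the linked accept loop (names follow the linked source, but this is a simplification, not the literal code; the loop itself is elided):

```rust
use std::sync::{
    atomic::{AtomicUsize, Ordering},
    Arc,
};
use tokio::sync::Notify;

// Condensed sketch of the shutdown path in the linked server.rs.
async fn wait_for_connections(alive_connections: Arc<AtomicUsize>, notify: Arc<Notify>) {
    // The accept loop has already broken out on the shutdown signal.
    if alive_connections.load(Ordering::Acquire) > 0 {
        // Hangs forever if notify_waiters() is never invoked again,
        // because notify_waiters() only wakes already-registered waiters.
        notify.notified().await;
    }
}
```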
Specifications
- Version: latest
- Platform:
- Subsystem:
Seconding this; it's extremely frustrating, as the hang also ignores any timeout functionality.
The code you're referring to doesn't look like the cause.
Yes, the loop stops once a signal is received.
But every accepted connection is handled in a separate task (`tokio::spawn`), and there are no async calls before the spawn, so the `select` won't prevent that code from running. This part will therefore still execute (just in the background, inside the spawned task):
```rust
if alive_connections.fetch_sub(1, Ordering::Acquire) == 1 {
    // We have to notify only if there is a registered waiter on shutdown
    notify.notify_waiters();
}
```
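For context, that decrement runs inside the per-connection task, roughly like this (a condensed sketch; `serve_connection` is a hypothetical stand-in for poem's actual connection handling):

```rust
use std::sync::{
    atomic::{AtomicUsize, Ordering},
    Arc,
};
use tokio::{net::TcpStream, sync::Notify};

// Hypothetical handler standing in for poem's real connection code.
async fn serve_connection(_stream: TcpStream) {}

fn spawn_connection(stream: TcpStream, alive_connections: Arc<AtomicUsize>, notify: Arc<Notify>) {
    alive_connections.fetch_add(1, Ordering::Release);
    tokio::spawn(async move {
        serve_connection(stream).await; // drive the connection to completion
        // Runs in this background task even after the accept loop breaks:
        if alive_connections.fetch_sub(1, Ordering::Acquire) == 1 {
            notify.notify_waiters();
        }
    });
}
```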
And new connections won't be accepted, since the loop has stopped. The code also uses `timeout_token`, so once the signal is received, all spawned connections are dropped within the timeout passed to `run_with_graceful_shutdown`.
Do you use the timeout? Otherwise a connection may simply still be alive.
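For reference, a minimal example of passing the timeout (this assumes poem 1.3.x, where `run_with_graceful_shutdown` takes the endpoint, a shutdown signal future, and an `Option<Duration>` timeout; adjust to your version):

```rust
use std::time::Duration;
use poem::{listener::TcpListener, Route, Server};

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let app = Route::new();
    Server::new(TcpListener::bind("127.0.0.1:3000"))
        .run_with_graceful_shutdown(
            app,
            async {
                // Initiate graceful shutdown on Ctrl+C.
                let _ = tokio::signal::ctrl_c().await;
            },
            // With Some(..), connections still alive after this duration
            // are dropped instead of being waited on indefinitely.
            Some(Duration::from_secs(5)),
        )
        .await
}
```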
However, I do see a possible race condition 🤔
- `notify_waiters()` will notify only already-registered waiters. Thus, if it's called after the `alive_connections.load(Ordering::Acquire) > 0` check but before `notify.notified()` is polled, the waiter may never be woken (see the sketch below).
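A self-contained illustration of that lost-wakeup window (plain tokio, not poem code):

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Notify;

#[tokio::main]
async fn main() {
    let notify = Arc::new(Notify::new());

    // Pretend the `alive_connections.load(..) > 0` check just happened,
    // and the last connection finishes *now*, calling notify_waiters()
    // before anyone has started to wait:
    notify.notify_waiters(); // no registered waiters -> wakeup is lost

    // The shutdown path now awaits a notification that already happened.
    let wait = notify.notified();
    assert!(
        tokio::time::timeout(Duration::from_secs(1), wait).await.is_err(),
        "notified() never completes; the server would hang here"
    );
}
```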
But we also can't use `notify_one()` as-is, because it stores a permit once notified: `alive_connections` may reach 0 before shutdown and go up again (i.e. there is momentarily no connection while the server is still alive), leaving a stale permit behind. It was described here.
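One common way to close such a window (illustrative only, not necessarily what the PRs below do) is to register the waiter before checking the counter, e.g. with tokio's `Notified::enable()` (this assumes a tokio version that has it):

```rust
use std::pin::pin;
use std::sync::{
    atomic::{AtomicUsize, Ordering},
    Arc,
};
use tokio::sync::Notify;

async fn wait_for_connections(alive_connections: Arc<AtomicUsize>, notify: Arc<Notify>) {
    // Register as a waiter *before* reading the counter, so a
    // notify_waiters() between the check and the await is not lost.
    let mut notified = pin!(notify.notified());
    notified.as_mut().enable();
    if alive_connections.load(Ordering::Acquire) > 0 {
        notified.await;
    }
}
```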
I filed a new PR to address this scenario: #780, and another one also related to graceful shutdown: #779.