seastar
Warn if RPC server handlers do not exit fast enough when the server is stopping
When rpc::server stops, it aborts the pending accept, stops the connections, and waits for all of them to resolve their stopping futures. One of these stopping futures resolves when the _reply_gate is closed. The server sends responses to messages inside this gate, and the gate enter/exit also spans the message handler callback until the handler's future resolves.
Everything that is aborted and waited for above is internal to RPC, but handlers are external: seastar has no way to ensure they finish in a timely manner and can only assume they do. This PR helps debug violations of that assumption by counting handler callback entries and exits (i.e. future resolutions) and, if server stop does not finish within a minute, printing a warning with the number of handlers that have not exited yet.
A server.stop() that is stuck while zero handlers are running is still an internal RPC issue to be resolved.
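A minimal sketch of the counting-and-warning idea, written in plain standard C++ rather than seastar: `handler_tracker`, its RAII `guard`, and `stop_watchdog` are hypothetical names, not the types this PR adds. The guard increments a counter when a handler is entered and decrements it when the handler's future resolves. The watchdog is assumed to be armed when stop() begins and polled periodically (for example from a timer); once the limit has passed and stop() is still pending, it returns one warning carrying the current count. A plain `long` counter is used on the assumption that everything runs on a single shard.

```cpp
#include <cassert>
#include <chrono>
#include <optional>
#include <string>

// Hypothetical, not seastar's API: counts handler callbacks that have been
// entered but whose returned futures have not resolved yet.
class handler_tracker {
public:
    // RAII guard: constructing it marks a handler as entered, destroying it
    // marks the handler as exited (its future resolved).
    class guard {
    public:
        explicit guard(handler_tracker& t) : _t(t) { ++_t._running; }
        ~guard() {
            assert(_t._running > 0);  // every exit must match an earlier entry
            --_t._running;
        }
        guard(const guard&) = delete;
        guard& operator=(const guard&) = delete;
    private:
        handler_tracker& _t;
    };

    long running() const { return _running; }

private:
    long _running = 0;  // single shard assumed, so no atomics
};

// Hypothetical watchdog armed when stop() begins. check() returns a warning
// exactly once: the first time it is called after `limit` has elapsed while
// stop() is still pending. Otherwise it returns nothing.
class stop_watchdog {
public:
    using clock = std::chrono::steady_clock;

    stop_watchdog(clock::time_point started, clock::duration limit)
        : _started(started), _limit(limit) {}

    std::optional<std::string> check(clock::time_point now, bool stop_done,
                                     const handler_tracker& tracker) {
        if (stop_done || _warned || now - _started < _limit) {
            return std::nullopt;
        }
        _warned = true;
        auto secs = std::chrono::duration_cast<std::chrono::seconds>(now - _started);
        return "rpc server stop has not finished after " +
               std::to_string(secs.count()) + "s, " +
               std::to_string(tracker.running()) + " handler(s) still running";
    }

private:
    clock::time_point _started;
    clock::duration _limit;
    bool _warned = false;
};
```

Taking the limit as a constructor parameter instead of hard-coding 60 seconds keeps the threshold easy to relax for slower configurations.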
@scylladb/seastar-maint , review ping
@scylladb/scylla-maint , review ping
upd:
- print registered handlers with a non-zero use_gate counter to give a clue about which verbs are stuck
upd:
- make the printed stuck ids readable by putting commas between the numbers :facepalm:
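A small sketch of why the separator matters when listing stuck verbs, assuming a hypothetical per-verb map from verb id to the number of handlers still running (the `verb_counters` alias and `stuck_verbs` function are illustrative names, not seastar code). Without a separator, ids 3 and 17 would print as "317".

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical: per-verb count of handlers entered but not yet exited,
// keyed by the RPC verb (message type) id.
using verb_counters = std::map<std::uint64_t, long>;

// Returns the ids of verbs that still have running handlers, joined with
// ", " so that adjacent ids cannot run together.
std::string stuck_verbs(const verb_counters& counters) {
    std::ostringstream out;
    const char* sep = "";
    for (const auto& [verb, running] : counters) {
        if (running != 0) {
            out << sep << verb;
            sep = ", ";
        }
    }
    return out.str();
}
```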
@scylladb/seastar-maint , merge ping
I don't know if 1 minute is a universally good threshold. In debug mode things can take longer, and running scylladb's parallel aggregation can take longer still.