tokio-uring
Hangs on `drop` of `Runtime` if `TcpListener` has an outstanding `accept` op
Since there is no `accept` with a timeout, I've used a standard timeout operation to implement a graceful shutdown. It kind of worked, but led to a hang at the end of the `tokio_uring::start` scope.
This is an interesting issue.
I think that we are going to need to cancel in-flight operations and wait for them to finish terminating. Unfortunately, for soundness reasons, you cannot simply drop tasks and exit the runtime like you can with normal tokio. Because these operations are in-flight with the kernel, we cannot exit until they complete: the kernel may still be using buffers and other state that we have pinned within the runtime.
I'm gonna take some time today to think about this.
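To make the constraint described above concrete, here is a toy, std-only Rust model of a "cancel, then drain" shutdown. It is only a sketch and not tokio-uring's implementation: `FakeKernel`, `ToyRuntime`, `submit`, and `shutdown` are hypothetical names, and the kernel is simulated with a plain queue that acknowledges every cancel request with one completion.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for the kernel side of the ring (an assumption,
/// not a real API): it answers each cancel request with one completion.
#[derive(Default)]
struct FakeKernel {
    completions: VecDeque<u64>,
}

impl FakeKernel {
    /// Ask the simulated kernel to cancel operation `id`.
    /// The acknowledgement shows up later on the completion queue.
    fn cancel(&mut self, id: u64) {
        self.completions.push_back(id);
    }
}

/// Hypothetical toy runtime that owns the buffers of in-flight operations.
#[derive(Default)]
struct ToyRuntime {
    kernel: FakeKernel,
    in_flight: HashMap<u64, Vec<u8>>,
    next_id: u64,
}

impl ToyRuntime {
    /// Submit an operation. Its buffer stays owned by the runtime until the
    /// kernel reports completion, even if the submitting task goes away.
    fn submit(&mut self, buf: Vec<u8>) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.in_flight.insert(id, buf);
        id
    }

    /// Cancel every in-flight operation, then wait until each one has
    /// completed. Returns the number of operations that were drained.
    fn shutdown(&mut self) -> usize {
        let ids: Vec<u64> = self.in_flight.keys().copied().collect();
        for id in ids {
            self.kernel.cancel(id);
        }

        let mut drained = 0;
        while !self.in_flight.is_empty() {
            // A real runtime would block on the completion queue here.
            let id = self
                .kernel
                .completions
                .pop_front()
                .expect("every in-flight operation must produce a completion");
            // Only after the completion is it safe to free the buffer.
            if self.in_flight.remove(&id).is_some() {
                drained += 1;
            }
        }
        drained
    }
}
```

In this model, submitting two operations (say, a pending accept and a pending receive) and then calling `shutdown` returns only after both buffers have been released. The point of the sketch is the ordering: buffers are freed after the completion arrives, never at the moment the task is dropped.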
I've run into the same problem, which makes it impossible for me to run integration tests in CI since they never exit properly. I can't use the same hack @dpc used. I have a custom socket type based on UDP that spawns the following task:
// Net loop
tokio_uring::spawn(async move {
    let mut recv_buf = vec![0; 1024 * 1024];
    loop {
        // process_incomming doesn't spawn any tasks and net_loop_socket is a UdpSocket
        tokio::select! {
            buf = process_incomming(&net_loop_socket, &streams_clone, &accept_chan, std::mem::take(&mut recv_buf)) => {
                recv_buf = buf;
            }
            _ = &mut shutdown_receiver => {
                log::info!("Shutting down network loop");
                // TODO shutdown all streams gracefully
                break;
            }
        }
    }
});
Within the `process_incomming` function, a call to `UdpSocket::recv_from` is awaited at the very beginning:
async fn process_incomming(
    socket: &Rc<UdpSocket>,
    connections: &Rc<RefCell<HashMap<StreamKey, WeakUtpStream>>>,
    accept_chan: &Rc<RefCell<Option<tokio::sync::oneshot::Sender<UtpStream>>>>,
    recv_buf: Vec<u8>,
) -> Vec<u8> {
    let (result, buf) = socket.recv_from(recv_buf).await;
    // snip...
}
Here is the socket's `drop` implementation:
fn drop(&mut self) {
    println!("Dropping!");
    // shutdown_signal is a tokio oneshot channel
    let _ = self.shutdown_signal.take().unwrap().send(());
}
What I can see from the output of my tests is the following:
Dropping!
[2022-09-18T10:09:16Z TRACE mio::poll] deregistering event source from poller
I would expect `Shutting down network loop` to be printed and `tokio_uring::start` to exit without hanging. I tried getting rid of the oneshot channel and instead using the `JoinHandle` of the task directly to abort the spawned task, but I still end up with the same problem. @Noah-Kennedy I presume the underlying issue is the same here?