tokio-uring

Hangs on `drop` of `Runtime` if `TcpListener` has an outstanding `accept` op

Open dpc opened this issue 1 year ago • 2 comments

Since there is no accept with a timeout, I've used a standard timeout operation to implement a graceful shutdown. It kind of worked, but led to a hang at the end of the `tokio_uring::start` scope.

dpc, Aug 06 '22 21:08

This is an interesting issue.

I think that we are going to need to cancel in-flight operations and wait for them to finish terminating. Unfortunately, for soundness reasons, you cannot simply drop tasks and exit the runtime like you can with normal tokio. Because these operations are in flight with the kernel, we cannot exit until they complete: the kernel may still be using buffers and other state that we have pinned within the runtime.
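As a rough illustration of the cancel-then-wait idea, here is a toy model using only the standard library. It is not tokio-uring's implementation: a plain thread stands in for the kernel, and `InFlightOp`, `ToyRuntime`, `submit`, and `shutdown` are hypothetical names invented for this sketch. The point it tries to show is that the buffer belongs to the "kernel" side until a completion comes back, so shutdown has to request cancellation and then wait before anything is freed.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for one operation that is in flight with the kernel.
struct InFlightOp {
    cancel: Sender<()>,
    done: Receiver<Vec<u8>>,
    worker: Option<JoinHandle<()>>,
}

impl InFlightOp {
    fn submit(buf: Vec<u8>) -> Self {
        let (cancel_tx, cancel_rx) = channel::<()>();
        let (done_tx, done_rx) = channel::<Vec<u8>>();
        // The thread plays the role of the kernel: it owns `buf` while the
        // operation is pending and only hands it back on completion.
        let worker = thread::spawn(move || {
            // Pending until cancellation is requested (or the sender is dropped).
            let _ = cancel_rx.recv();
            // Completion: ownership of the buffer returns to the runtime.
            let _ = done_tx.send(buf);
        });
        InFlightOp { cancel: cancel_tx, done: done_rx, worker: Some(worker) }
    }
}

/// Hypothetical runtime that tracks its in-flight operations.
struct ToyRuntime {
    ops: Vec<InFlightOp>,
}

impl ToyRuntime {
    fn new() -> Self {
        ToyRuntime { ops: Vec::new() }
    }

    fn submit(&mut self, buf: Vec<u8>) {
        self.ops.push(InFlightOp::submit(buf));
    }

    fn in_flight(&self) -> usize {
        self.ops.len()
    }

    /// Cancel every in-flight operation, then wait for each completion.
    /// Returns the total number of buffer bytes handed back.
    fn shutdown(&mut self) -> usize {
        // Step 1: request cancellation of everything that is still pending.
        for op in &self.ops {
            let _ = op.cancel.send(());
        }
        // Step 2: wait for each completion before releasing any state.
        let mut reclaimed = 0;
        for mut op in self.ops.drain(..) {
            if let Ok(buf) = op.done.recv() {
                reclaimed += buf.len();
            }
            if let Some(handle) = op.worker.take() {
                let _ = handle.join();
            }
        }
        reclaimed
    }
}

impl Drop for ToyRuntime {
    fn drop(&mut self) {
        // Dropping the runtime must not return while operations are pending.
        self.shutdown();
    }
}
```

In this model, a `drop` that waited for completions without first sending the cancel request would block indefinitely on an operation that never finishes by itself, which is the shape of the hang reported in this issue.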

I'm gonna take some time today to think about this.

Noah-Kennedy, Sep 01 '22 20:09

I've run into the same problem, which makes it impossible for me to run integration tests in CI since they never exit properly. I can't use the same hack @dpc used. I have a custom socket type based on `UdpSocket` that spawns the following task:

// Net loop
tokio_uring::spawn(async move {
    let mut recv_buf = vec![0; 1024 * 1024];
    loop {
        // process_incomming doesn't spawn any tasks and net_loop_socket is a UdpSocket
        tokio::select! {
            buf = process_incomming(&net_loop_socket, &streams_clone, &accept_chan, std::mem::take(&mut recv_buf)) => {
                recv_buf = buf;
            }
            _ = &mut shutdown_receiver => {
                log::info!("Shutting down network loop");
                // TODO shutdown all streams gracefully
                break;
            }
        }
    }
});

Within the `process_incomming` function, the call to `UdpSocket::recv_from` at the very beginning is where the task sits waiting:

async fn process_incomming(
    socket: &Rc<UdpSocket>,
    connections: &Rc<RefCell<HashMap<StreamKey, WeakUtpStream>>>,
    accept_chan: &Rc<RefCell<Option<tokio::sync::oneshot::Sender<UtpStream>>>>,
    recv_buf: Vec<u8>,
) -> Vec<u8> {
    let (result, buf) = socket.recv_from(recv_buf).await;
    // snip...
}

Here is the socket's `Drop` implementation:

fn drop(&mut self) {
    println!("Dropping!");
    // shutdown_signal is the sender of a tokio oneshot channel
    let _ = self.shutdown_signal.take().unwrap().send(());
}

What I can see from the output of my tests is the following:

Dropping!
[2022-09-18T10:09:16Z TRACE mio::poll] deregistering event source from poller

I would expect `Shutting down network loop` to be printed and `tokio_uring::start` to exit without hanging. I tried getting rid of the oneshot channel and using the `JoinHandle` of the task directly to abort the spawned task instead, but I still end up with the same problem. @Noah-Kennedy, I presume the underlying issue is the same here?

Nehliin, Sep 18 '22 10:09