demikernel icon indicating copy to clipboard operation
demikernel copied to clipboard

[inetstack] `dead_socket_tx` Causes Unit Test Regressions to Fail

Open gatoWololo opened this issue 3 years ago • 3 comments

Running test_connect currently fails with:

Failed to terminate connection: TrySendError { kind: Disconnected }
thread 'protocols::tcp::tests::test_connect' panicked at 'Failed to terminate connection: TrySendError { kind: Disconnected }', src/protocols/tcp/established/background/mod.rs:58:14
stack backtrace:
   0: rust_begin_unwind
             at /rustc/ca82264ec7556a6011b9d3f1b2fd4c7cd0bc8ae2/library/std/src/panicking.rs:493:5
   1: core::panicking::panic_fmt
             at /rustc/ca82264ec7556a6011b9d3f1b2fd4c7cd0bc8ae2/library/core/src/panicking.rs:92:14
   2: core::result::unwrap_failed
             at /rustc/ca82264ec7556a6011b9d3f1b2fd4c7cd0bc8ae2/library/core/src/result.rs:1355:5
   3: core::result::Result<T,E>::expect
             at /home/gatowololo/.rustup/toolchains/nightly-2021-05-10-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/result.rs:997:23
   4: catnip::protocols::tcp::established::background::background::{{closure}}
             at ./src/protocols/tcp/established/background/mod.rs:56:9

This line:

        dead_socket_tx
            .unbounded_send(fd)
            .expect("Failed to terminate connection");

As the error message states, the unbound_send always fails because the receiver is disconnected. So we always panic here.

Looking at the code. It seems the channel pair is created here:

impl<RT: Runtime> Peer<RT> {
    pub fn new(rt: RT, arp: arp::Peer<RT>, file_table: FileTable) -> Self {
        let (tx, _rx) = mpsc::unbounded();
        let inner = Rc::new(RefCell::new(Inner::new(rt.clone(), arp, file_table, tx)));
        Self { inner }
    }
...
}

As you can see the receiver is immediately dropped. So I don't see how this code is every supposed to not fail. Or what is the point of it since it doesn't seem to do anything?

gatoWololo avatar Jun 21 '21 19:06 gatoWololo

Ah, I think I probably just forgot to implement this logic...

So, I think the idea was to have the closer coroutine maintain a reference to this channel and send its FD on the channel when TCP closes the connection. then, the Peer could have a coroutine that reads off this channel and cleans up all of its associated state. For example, the Peer would remove the socket from its sockets map, the established (or whichever one is relevant) map, and return its FD to the file table.

sujayakar avatar Jun 22 '21 06:06 sujayakar

The motivation for having this coroutine at all is that the per-socket coroutines can't have access to the Peer's maps, so they can't clean up their own state directly.

sujayakar avatar Jun 22 '21 06:06 sujayakar

I think we need to restructure most of this to allow things to cleanup properly. And rename TcpPeer to something that makes sense.

BrianZill avatar Mar 29 '22 06:03 BrianZill