Closing Quic session is inconsistent
Describe the bug
When closing a Zenoh session, it is expected that the socket is available after the session.close() call returns. This should be observed independently of which Link protocol is used by the session. This is not the case sometimes when using Quic.
Testing locally has shown this to be true on TCP, UDP and TLS. However, sequentially closing and opening a router/peer Quic session leads to panic due to the socket being unavailable. This only occurs when the session in question has been used (i.e. a client or peer has established a connection on that socket), this case is therefore not covered by this test which only opens one router session and alternates open/close of client sessions that connect to it.
To reproduce
#[tokio::main]
async fn main() {
{...Initializations and config...}
println!("Opening Router Session...");
let router = zenoh::open(router_config.clone())
.res_async()
.await
.unwrap();
println!("Opening Client Session...");
let client = zenoh::open(client_config.clone())
.res_async()
.await
.unwrap();
println!("Closing Client Session...");
client.close().res_async().await.unwrap();
println!("Closing Router Session...");
router.close().res_async().await.unwrap();
println!("Reopening Router Session...");
// This will panic most of the time if the Transport is using Quic unicast Link
// It does not panic when using TLS, TCP or UDP links (tested locally)
let s2 = zenoh::open(router_config.clone())
.res_async()
.await
.unwrap();
}
System info
MacBook Air, Mac OS, Apple Silicon M2 Zenoh 0.11.0-dev
Can you try adding a task::sleep(Duration:from_secs(1) after the router.close()?
It's not uncommon that the API returns but the system/library still takes a bit to complete the cleanup under the hood.
Can you try adding a
task::sleep(Duration:from_secs(1)after therouter.close()? It's not uncommon that the API returns but the system/library still takes a bit to complete the cleanup under the hood.
Even with a sleep, the time it takes to free the socket is inconsistent and can take more than five seconds in some unlucky runs.
This test verifies that links can be open/closed in a loop. I.e. very similar to the scenario you are describing. Do you observe the same behaviour if you run the test manually?
This test verifies that links can be open/closed in a loop. I.e. very similar to the scenario you are describing. Do you observe the same behaviour if you run the test manually?
Adding and removing the listener, or opening and closing the listener's session alone is not enough to trigger this issue. In my tests I am able to replicate this only when some other session is created and connected to the listener endpoint, as demonstrated in the example I gave above. EDIT: I realize I didn't give the configs in my example. The router is the listener, and the client connects to it.