ockam
Dangling channel receiver in Router
Emerged while resolving #3054.
- Reproduce
info!("*********************** before create detached ****************");
let mut detached = ctx.new_detached(Address::random_local()).await?;
info!("*********************** after create detached ****************");
info!("----------------------- before stopping detached-----------------");
detached.stop().await?;
info!("----------------------- after stopping detached -----------------");
sleep(Duration::from_secs(2)).await;
info!("*********************** before create start worker ****************");
ctx.start_worker("dummy", DummyWorker).await?;
// never gets here
info!("*********************** after create start worker ****************");
Or just stop the context twice:
ctx.stop().await?;
ctx.stop().await?; // hangs
- Possible cause
When a context is stopped, a shutdown request is sent to the Router. As a result, the Router exits its event loop and sends an ack back to the context. On the second stop there is no longer a listening receiver on the Router's end, but the receiver itself has not been dropped. The Tokio mpsc channel docs say that the sender returns an error if the other end is dropped; that is not the case here, so the send succeeds and the context keeps waiting for an ack that is never going to arrive, so it hangs. This gets more complex because detached contexts and workers share clones of the same sender. Any of them can cause the Router to shut down its listening loop, after which all the other entities wait for acks from a loop that no longer runs.
- Remarks
I am totally unfamiliar with Ockam, so my apologies if this is a false alarm or a known issue.
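The mechanism described under "Possible cause" can be shown in miniature without Ockam. The sketch below is only an analogy: it uses `std::sync::mpsc` instead of Tokio's channel, the function name `dangling_receiver_demo` is made up for illustration, and a short `recv_timeout` stands in for the indefinite hang. The point it shows is that a send keeps succeeding while the receiver is merely idle, and only fails once the receiver is actually dropped.

```rust
use std::sync::mpsc::{channel, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical demo, not Ockam code.
/// Returns (send ok while receiver is idle, ack result, send ok after drop).
pub fn dangling_receiver_demo() -> (bool, Result<(), RecvTimeoutError>, bool) {
    // Request channel: context -> router.
    let (req_tx, req_rx) = channel::<&'static str>();
    // Ack channel: router -> context.
    let (ack_tx, ack_rx) = channel::<()>();

    // The "router" has left its event loop: nobody calls req_rx.recv(),
    // but req_rx is still owned, so the channel stays open and send succeeds.
    let send_while_idle = req_tx.send("stop").is_ok();

    // The context now waits for an ack that nobody will send.
    // The timeout stands in for the indefinite hang.
    let ack = ack_rx.recv_timeout(Duration::from_millis(50));

    // Once the receiver is really dropped, senders get an error instead.
    drop(req_rx);
    let send_after_drop = req_tx.send("stop").is_ok();

    // Keep the ack sender alive until here so the wait above times out
    // rather than reporting a disconnect.
    drop(ack_tx);
    (send_while_idle, ack, send_after_drop)
}
```

Calling `dangling_receiver_demo()` shows the three stages side by side: the first send is accepted, the ack wait times out, and the send after the drop is rejected.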
@Retamogordo thank you for creating an issue.
Hello @Retamogordo, thank you for investigating this! I've just looked into the cause of this and yea, it looks like the Router can shut down, but because it's still owned by the Executor the receiver stays open.
Probably the best thing to do here would be to drop the RouterReceiver in place (i.e. swap it for a None Option). I'm not sure this is related to #3054; my understanding was that we had issues with panics being swallowed. But this behaviour might still be related.
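A rough sketch of what "swap it for a None Option" could look like, again using `std::sync::mpsc` as a stand-in for the real channel. `RouterMsg`, `Router`, and `stop_twice_demo` here are hypothetical types and names for illustration, not Ockam's actual definitions, and the real fix may differ. The idea is that the router value can stay alive (as it does when owned by the Executor) while the receiver inside it is dropped as soon as the loop ends, so later sends fail instead of silently succeeding.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical message type; not Ockam's real router message.
pub enum RouterMsg {
    Work(u32),
    Shutdown,
}

/// Hypothetical stand-in for the Router, which outlives its event loop.
pub struct Router {
    receiver: Option<Receiver<RouterMsg>>,
    pub handled: Vec<u32>,
}

impl Router {
    pub fn new() -> (Self, Sender<RouterMsg>) {
        let (tx, rx) = channel();
        (Router { receiver: Some(rx), handled: Vec::new() }, tx)
    }

    /// Event loop: runs until shutdown, then drops the receiver in place.
    pub fn run(&mut self) {
        while let Some(rx) = self.receiver.as_ref() {
            match rx.recv() {
                Ok(RouterMsg::Work(n)) => self.handled.push(n),
                // Shutdown requested, or every sender is gone:
                // swap the receiver for None so it is dropped right here.
                Ok(RouterMsg::Shutdown) | Err(_) => self.receiver = None,
            }
        }
    }
}

/// Returns whether the first and the second stop request were accepted.
pub fn stop_twice_demo() -> (bool, bool) {
    let (mut router, tx) = Router::new();
    tx.send(RouterMsg::Work(1)).unwrap();
    let first_stop = tx.send(RouterMsg::Shutdown).is_ok();
    router.run(); // handles Work(1), then Shutdown drops the receiver

    // The router value still exists, but its receiver is gone,
    // so the second stop fails fast instead of waiting for an ack.
    let second_stop = tx.send(RouterMsg::Shutdown).is_ok();
    (first_stop, second_stop)
}
```

With this shape, a caller that stops twice gets an error on the second send and can return it, rather than blocking on an ack from a loop that has already exited.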
Hi @spacekookie! So it's all good, should I close the issue now?
Leave the issue open for now. It'll be automatically closed once that PR gets merged.