Unify gossip dispatchers / flaky test: `gossip_smoke`
We have two dispatchers in the public iroh gossip client API: one that directly drives the gossip client, and another one inside iroh-gossip.
We should unify the two somehow so we don't have gossip messages going through 2 queues on the way to the consumer.
So the remaining TODO is to merge the two dispatchers (in dispatcher.rs and net.rs) so that messages flow only over a single channel and not two. We should do this, but IMO it can also happen in a followup.
Originally posted by @Frando in https://github.com/n0-computer/iroh/issues/2258#issuecomment-2210299055
The current dispatcher has a bug which makes the gossip_smoke client test flaky.
(edit: flaky mark added in #2468)
Copying from Discord:
I think I found the issue:
- t0: subscribe called -> join_task in the new dispatcher
- t1: join_task starts
- t2: join_task awaits Gossip::join
- t3: dispatch_loop receives a ReceivedMessage event, but still has no live subscription for the topic
- t4: join_task updates the topic to live
so the dispatch_loop receives the ReceivedMessage event before the join_task updated the subscription to live
dispatch_loop and join_task run in independent tokio tasks, both waiting for the same events emitted from Gossip, and each locks a sync mutex right after the event arrives. Depending on the ordering, one gets to do its work first. If the dispatch_loop happens to acquire the mutex two times (for the NeighborUp and the ReceivedMessage) before the join_task acquires it, then the ReceivedMessage is lost because the subscription is not yet set to live.
This tells me: the proper fix is likely to remove the dispatcher and integrate its functionality into Gossip?
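To make the ordering problem concrete, here is a minimal, deterministic sketch of the two interleavings described above. It is not the actual dispatcher code: `Step`, `run`, `lost_ordering`, and `ok_ordering` are hypothetical names, the mutex-protected subscription state is reduced to a single `live` flag, and each `Step` stands for one acquisition of the mutex by either task.

```rust
/// Gossip events, reduced to the two that matter for this race.
enum Event {
    NeighborUp,
    ReceivedMessage(String),
}

/// One acquisition of the shared mutex by either task (hypothetical model).
enum Step {
    /// dispatch_loop got the lock and handles an event.
    Dispatch(Event),
    /// join_task got the lock and marks the subscription as live.
    JoinMarksLive,
}

/// Replays the steps in order and returns the messages that reached
/// the subscriber.
fn run(steps: Vec<Step>) -> Vec<String> {
    let mut live = false;
    let mut delivered = Vec::new();
    for step in steps {
        match step {
            Step::Dispatch(Event::NeighborUp) => {}
            Step::Dispatch(Event::ReceivedMessage(msg)) => {
                // Messages for a subscription that is not live are dropped.
                if live {
                    delivered.push(msg);
                }
            }
            Step::JoinMarksLive => live = true,
        }
    }
    delivered
}

/// dispatch_loop wins the lock twice before join_task: the message is lost.
fn lost_ordering() -> Vec<Step> {
    vec![
        Step::Dispatch(Event::NeighborUp),
        Step::Dispatch(Event::ReceivedMessage("hello".to_string())),
        Step::JoinMarksLive,
    ]
}

/// join_task wins the lock before the message is dispatched: delivered.
fn ok_ordering() -> Vec<Step> {
    vec![
        Step::Dispatch(Event::NeighborUp),
        Step::JoinMarksLive,
        Step::Dispatch(Event::ReceivedMessage("hello".to_string())),
    ]
}
```

In this model `run(lost_ordering())` delivers nothing while `run(ok_ordering())` delivers the message, which matches the flaky behaviour. If a single task owned both the join handling and the event dispatch, the first ordering should not be reachable, assuming that task marks the topic live before it processes any later event.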
For some reason this was filed, but gossip_smoke wasn't actually marked flaky on main. This happened in #2559 now instead.
Fixed
https://github.com/n0-computer/iroh/pull/2570