Unify gossip dispatchers / flaky test: `gossip_smoke`

Open rklaehn opened this issue 1 year ago • 1 comments

We have two dispatchers in the public iroh gossip client API: one that directly drives the gossip client, and another one inside iroh-gossip.

We should unify the two somehow so we don't have gossip messages going through 2 queues on the way to the consumer.

So the remaining TODO is to merge the two dispatchers (in dispatcher.rs and net.rs) so that messages flow only over a single channel and not two. We should do this, but IMO it can also happen in a followup.

Originally posted by @Frando in https://github.com/n0-computer/iroh/issues/2258#issuecomment-2210299055

rklaehn avatar Jul 05 '24 07:07 rklaehn

The current dispatcher has a bug which makes the gossip_smoke client test flaky. (edit: flaky mark added in #2468)

Copying from Discord:

I think I found the issue:

- t0: subscribe called -> join_task in the new dispatcher
- t1: join_task starts
- t2: join_task awaited Gossip::join
- t3: dispatch_loop received event ReceivedMessage, but still has no live subscription for the topic
- t4: join_task updates the topic to live

so the dispatch_loop receives the ReceivedMessage event before the join_task updated the subscription to live

dispatch_loop and join_task run in independent tokio tasks, both waiting for the same events emitted from Gossip, and each locks a sync mutex right after an event arrives. Depending on the ordering, one of them gets to do its work first. If the dispatch_loop happens to acquire the mutex two times (for the neighbor up and the ReceivedMessage events) before the join_task acquires it, then the ReceivedMessage is lost because the subscription is not yet set to live.

This tells me: the proper fix is likely to remove the dispatcher and integrate its functionality into Gossip?
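To make the ordering problem concrete, here is a minimal, deterministic sketch of the interleaving described above. It is not the iroh code: `Dispatcher`, `Topic`, `Event`, `dispatch`, `mark_live` and the `buffer_until_live` flag are hypothetical names invented for illustration, and the two tokio tasks are replaced by plain sequential calls in the "unlucky" order. Buffering events until the topic is live is shown only as one possible mitigation for comparison; it is not the fix proposed above (folding the dispatcher into Gossip).

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the events emitted by the gossip layer.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    NeighborUp,
    ReceivedMessage(String),
}

/// Hypothetical per-topic subscription state.
#[derive(Debug, Default)]
struct Topic {
    /// Set by the join step once the join has completed.
    live: bool,
    /// Events that reached the consumer.
    delivered: Vec<Event>,
    /// Events held back while the topic is not live (mitigation only).
    pending: Vec<Event>,
}

/// Hypothetical dispatcher; in the real code this state sits behind a mutex.
#[derive(Default)]
struct Dispatcher {
    topics: HashMap<u8, Topic>,
    /// false: drop events for non-live topics (the behaviour described above).
    /// true: buffer them and flush when the topic becomes live.
    buffer_until_live: bool,
}

impl Dispatcher {
    /// t0: register the topic; it is not live yet.
    fn subscribe(&mut self, id: u8) {
        self.topics.entry(id).or_default();
    }

    /// One iteration of the dispatch loop for a single event.
    fn dispatch(&mut self, id: u8, event: Event) {
        let buffer = self.buffer_until_live;
        let Some(topic) = self.topics.get_mut(&id) else {
            return;
        };
        if topic.live {
            topic.delivered.push(event);
        } else if buffer {
            topic.pending.push(event);
        }
        // Otherwise the event is silently dropped: this is the lost message.
    }

    /// t4: the join step marks the topic live and flushes buffered events.
    fn mark_live(&mut self, id: u8) {
        if let Some(topic) = self.topics.get_mut(&id) {
            topic.live = true;
            let pending = std::mem::take(&mut topic.pending);
            topic.delivered.extend(pending);
        }
    }
}

/// Replays the unlucky ordering: the dispatch loop handles two events
/// before the join step gets to mark the topic live.
fn run_bad_interleaving(buffer_until_live: bool) -> Vec<Event> {
    let mut d = Dispatcher {
        buffer_until_live,
        ..Default::default()
    };
    d.subscribe(0);
    d.dispatch(0, Event::NeighborUp);
    d.dispatch(0, Event::ReceivedMessage("hello".into()));
    d.mark_live(0);
    d.topics.remove(&0).map(|t| t.delivered).unwrap_or_default()
}
```

Calling `run_bad_interleaving(false)` returns an empty list, which mirrors the dropped ReceivedMessage; `run_bad_interleaving(true)` returns both events in order. A single owner of the subscription state, as suggested above, would avoid the race without needing a buffer at all.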

Frando avatar Jul 08 '24 09:07 Frando

For some reason this was filed, but gossip_smoke wasn't actually marked flaky on main. This happened in #2559 now instead.

matheus23 avatar Aug 01 '24 07:08 matheus23

Fixed

dignifiedquire avatar Aug 05 '24 20:08 dignifiedquire

https://github.com/n0-computer/iroh/pull/2570

dignifiedquire avatar Aug 05 '24 20:08 dignifiedquire