federation catchup is slow and network-heavy
If a remote HS takes a long time to process a transaction (perhaps because of #9490), then we will enter "catchup mode" for that homeserver. In this mode, we only ever send the most recent event per room, which means that the remote homeserver has to backfill any intermediate events, which is comparatively very slow (and may require calling `/state_ids`; see https://github.com/matrix-org/synapse/issues/7893 and https://github.com/matrix-org/synapse/issues/6597).
So, we have a homeserver which is struggling to keep up, and we are making its life even harder by only sending it a subset of the traffic in the room.
I'm not at all sure why we do this. Shouldn't we send all the events that the remote server hasn't received (up to a limit)?
What does the spec say? Shouldn't it try to send all events it can queue up (within reason)?
It looks like we fetch the oldest events that the remote has not yet received: https://github.com/matrix-org/synapse/blob/0a00b7ff14890987f09112a2ae696c61001e6cf1/synapse/federation/sender/per_destination_queue.py#L461-L465
https://github.com/matrix-org/synapse/blob/0a00b7ff14890987f09112a2ae696c61001e6cf1/synapse/storage/databases/main/transactions.py#L424-L442
rather than the most recent event per room?
`destination_rooms` stores the `stream_ordering` of the most recent event in each room that we should have sent to each destination, rather than the most recent event that we actually did send. So when we join to `events`, we get the event ID of that single most recent event per room.
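To spell that out, the catch-up query amounts to something like the sketch below. The table and column names (`destination_rooms`, `stream_ordering`, `events`, `event_id`, `destination`) are the ones discussed above; the function itself and the batch limit are illustrative rather than Synapse's actual code.

```python
import sqlite3
from typing import List


def get_catch_up_event_ids_sketch(
    conn: sqlite3.Connection,
    destination: str,
    last_successful_stream_ordering: int,
) -> List[str]:
    """Illustrative only: destination_rooms holds a single stream_ordering per
    (destination, room), so joining it to events can surface at most one event
    per room, however far behind the destination is."""
    sql = """
        SELECT event_id
          FROM destination_rooms
          JOIN events USING (stream_ordering)
         WHERE destination = ?
           AND stream_ordering > ?
         ORDER BY stream_ordering ASC
         LIMIT 50
    """
    return [
        row[0]
        for row in conn.execute(sql, (destination, last_successful_stream_ordering))
    ]
```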
I wonder if another thing at play here is the interaction of all the different servers in the room. Take a busy room like Matrix HQ: after a few hours of downtime, a server will receive the latest event from every server that sent an event during that time. For each one, the receiving server will first do `/get_missing_events` to fetch up to ten missing events, then do a `/state_ids` request if there is still a gap. This means that for a single room the server can end up processing a lot more than just the latest event, and end up with many small chunks of DAG in the gap between it going offline and coming back online.
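Roughly the per-event flow I mean, as a sketch; the `FederationClient` interface and the helper names are hypothetical, only the overall shape (a `/get_missing_events` limited to ten, then `/state_ids` if the gap is still open) follows what's described above.

```python
from typing import List, Protocol, Sequence


class FederationClient(Protocol):
    """Hypothetical stand-in for the real federation client."""

    async def get_missing_events(
        self,
        origin: str,
        room_id: str,
        earliest_events: Sequence[str],
        latest_events: Sequence[str],
        limit: int,
    ) -> List[dict]:
        ...

    async def get_state_ids(
        self, origin: str, room_id: str, event_id: str
    ) -> List[str]:
        ...


def gap_remains(fetched: List[dict], known_event_ids: Sequence[str]) -> bool:
    # Crude placeholder: treat the gap as closed if any fetched event points
    # at something we already have. The real check walks prev_events properly.
    return not any(
        prev in known_event_ids
        for ev in fetched
        for prev in ev.get("prev_events", [])
    )


async def handle_pdu_with_gap(
    client: FederationClient,
    origin: str,
    room_id: str,
    new_event_id: str,
    our_forward_extremities: Sequence[str],
) -> None:
    # Cheap path first: ask the sending server for up to ten events between
    # our current extremities and the newly received event.
    missing = await client.get_missing_events(
        origin,
        room_id,
        earliest_events=our_forward_extremities,
        latest_events=[new_event_id],
        limit=10,
    )
    # ... persist/process `missing` here ...

    # If ten events weren't enough to close the gap, fall back to the
    # expensive path and fetch the full state at the new event.
    if gap_remains(missing, our_forward_extremities):
        state_ids = await client.get_state_ids(origin, room_id, new_event_id)
        # ... fetch and process the state events referenced by `state_ids` ...
```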
Maybe rooms need something similar to `ResponseCache.wrap()`, in the sense that fetching the backlog and state IDs for a room is handled by one worker, and every subsequent request is queued behind that until it is done. This might require closer and more complex coordination (even more so across workers), but it would solve that problem.
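As a rough single-process sketch of that idea (`PerRoomGapFiller` and its `wrap` method are made up for illustration; `ResponseCache.wrap()` is the existing helper I mean):

```python
import asyncio
from typing import Any, Callable, Coroutine, Dict, TypeVar

T = TypeVar("T")


class PerRoomGapFiller:
    """Run at most one gap-filling operation per room at a time; callers that
    arrive while one is already in flight just await the same result."""

    def __init__(self) -> None:
        self._in_flight: Dict[str, asyncio.Task] = {}

    async def wrap(
        self, room_id: str, fill_gap: Callable[[], Coroutine[Any, Any, T]]
    ) -> T:
        task = self._in_flight.get(room_id)
        if task is None:
            task = asyncio.create_task(fill_gap())
            self._in_flight[room_id] = task
            # Forget the entry once done, so a later gap starts a fresh fill.
            task.add_done_callback(lambda _: self._in_flight.pop(room_id, None))
        return await task
```

With something like that, ten servers all delivering their latest Matrix HQ event at once would trigger one backlog/`/state_ids` round rather than ten; doing the same across workers would need the extra coordination mentioned above.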
I wonder if Synapse should only send out events during catch-up if they're a forward extremity, in the hope that the server that sent the subsequent events in the room will send the latest event? It's possible that the other server won't, but that feels like a rare case. Doing so should mean that the receiving server only receives the current extremities it has missed, rather than an additional smattering of events at different points in the gap.
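For illustration only (made-up helpers, not necessarily how this would actually be implemented in Synapse), the idea boils down to:

```python
from typing import Dict, List, Sequence


def forward_extremities(prev_events_by_id: Dict[str, Sequence[str]]) -> List[str]:
    """Given event_id -> prev_event_ids for the events we have in a room,
    return the events that nothing else references: the forward extremities."""
    referenced = {prev for prevs in prev_events_by_id.values() for prev in prevs}
    return [event_id for event_id in prev_events_by_id if event_id not in referenced]


def should_send_in_catch_up(
    event_id: str, prev_events_by_id: Dict[str, Sequence[str]]
) -> bool:
    """Only send the room's 'latest' event if it is still a forward extremity;
    if something has already built on top of it, assume the server that sent
    the newer event will deliver it instead."""
    return event_id in forward_extremities(prev_events_by_id)
```

e.g. with `{"$A": [], "$B": ["$A"]}`, `$B` would still be sent but `$A` would be skipped, since `$B` has already built on top of it.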
Note that on March 18th a PR was merged that implements Erik's suggestion from above: https://github.com/matrix-org/synapse/pull/9640
yes, I've updated the summary of this issue, since it's no longer a simple matter of optimisation.