
Sending many events in parallel creates forward extremities instead of linearizing creation

Open · tulir opened this issue 7 months ago · 4 comments

Description

When sending many events in parallel, Synapse doesn't linearize the event persistence properly, leading to the creation of forward extremities.

I think this is a regression because I recall linearization being fixed years ago, but I don't remember the exact details of the previous fix and couldn't find any obvious problems by scanning recent commits.

Steps to reproduce

  • send multiple events to a single room in parallel (using the standard /_matrix/client/v3/rooms/.../send/m.room.message/txnid endpoint, on a single server with a single account)
  • observe rows in the event_forward_extremities table for the room and the prev_events for each created event

Expected result: there is one forward extremity (the last event) and each event references the previous one

Actual result: each event sent in parallel is listed as a forward extremity, and they all reference the same prev event
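The expected vs. actual shapes of the event graph can be illustrated with a small model (hypothetical event IDs; not Synapse code). An event is a forward extremity when no other event lists it in its `prev_events`:

```python
def forward_extremities(events: dict[str, list[str]]) -> set[str]:
    """An event is a forward extremity if no other event references it in prev_events."""
    referenced = {prev for prevs in events.values() for prev in prevs}
    return set(events) - referenced

# Expected: each event references the previous one -> a linear chain, one extremity
chain = {"$a": ["$root"], "$b": ["$a"], "$c": ["$b"]}
# Actual: events sent in parallel all reference the same prev event -> one extremity each
fork = {"$a": ["$root"], "$b": ["$root"], "$c": ["$root"]}

print(forward_extremities(chain))  # {'$c'}
print(forward_extremities(fork))   # {'$a', '$b', '$c'}
```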

Homeserver

maunium.net, localhost, rory.gay, codestorm.net

Synapse Version

1.129.0

Database

Tested with PostgreSQL v17 and v14

Workers

Reproducible with any configuration (single process, one event stream worker, multiple event stream workers)

tulir avatar May 11 '25 18:05 tulir

synapse/handlers/message.py#L1034 is the limiter which should prevent this. And https://github.com/element-hq/synapse/commit/55e4d27b36fd69a3cf3eceecbd42706579ef2dc7 is the change that made the limit 1 (the same as in our Beeper fork when we discovered the same issue). So it seems like a regression there, despite the limiter still existing.
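For context, that limiter is roughly a per-room lock: with a maximum concurrency of 1 it should fully serialize event creation and persistence per room. A minimal sketch of the idea (this is not Synapse's actual `Linearizer` implementation, and the names are illustrative):

```python
import asyncio
from collections import defaultdict
from contextlib import asynccontextmanager


class Limiter:
    """Serialize work per key, like a Linearizer with max_count=1."""

    def __init__(self) -> None:
        self._locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    @asynccontextmanager
    async def queue(self, key: str):
        async with self._locks[key]:
            yield


async def demo() -> list[tuple[str, int]]:
    limiter = Limiter()
    order: list[tuple[str, int]] = []

    async def send(i: int) -> None:
        # Everything inside the block runs to completion before the
        # next request for the same room may start.
        async with limiter.queue("!room:example.org"):
            order.append(("start", i))
            await asyncio.sleep(0)  # yield to the event loop mid-send
            order.append(("end", i))

    await asyncio.gather(*(send(i) for i in range(3)))
    return order


order = asyncio.run(demo())
# Each send finishes before the next one starts: no interleaving per room.
```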

Fizzadar avatar May 19 '25 13:05 Fizzadar

To double-check, is it causing any bugs? Feels like the client endpoints would return the events fine, and the extra extremities would resolve themselves when another event comes in. The comments around the code that @Fizzadar shared explain why we care about processing events linearly, though.

My guess is that this regressed in 54c012c where the create_event(...) call was moved outside of the self.limiter.queue(event_dict["room_id"]) block.
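If that guess is right, the failure mode would be: the `prev_events` snapshot (taken during event creation) happens before the per-room lock is acquired, so concurrent requests all snapshot the same extremity even though persistence itself is still serialized. A toy model of the two orderings (hypothetical names; not Synapse code):

```python
import asyncio


async def demo(snapshot_inside_lock: bool) -> int:
    lock = asyncio.Lock()
    extremities = {"$root"}  # current forward extremities of the room

    async def send(event_id: str) -> None:
        async def snapshot() -> set[str]:
            prev = set(extremities)
            await asyncio.sleep(0)  # yield, letting other requests snapshot too
            return prev

        if snapshot_inside_lock:
            # Pre-regression shape: create_event runs inside the limiter.
            async with lock:
                prev = await snapshot()
                extremities.difference_update(prev)
                extremities.add(event_id)
        else:
            # Suspected post-regression shape: create_event runs outside
            # the limiter; only persistence is serialized.
            prev = await snapshot()
            async with lock:
                extremities.difference_update(prev)
                extremities.add(event_id)

    await asyncio.gather(*(send(f"$e{i}") for i in range(4)))
    return len(extremities)


inside = asyncio.run(demo(True))    # events chain: 1 extremity remains
outside = asyncio.run(demo(False))  # all reference $root: 4 extremities remain
print(inside, outside)  # 1 4
```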

MadLittleMods avatar May 19 '25 13:05 MadLittleMods

@erikjohnston noted internally that the create_and_send_nonmember_event lineariser operates within the bounds of a single worker. If your deployment has multiple worker types which call this method, then you'll still end up with parallel inserts.

@tulir have you seen this behaviour change meaningfully, or is the speculation of a regression based solely on the per-worker lineariser being added in ~2023?

anoadragon453 avatar May 19 '25 13:05 anoadragon453

It being a regression is just speculation; it could be that it has always been like this. I only noticed it recently when sending events to certain rooms was very slow, and the rooms turned out to have 100+ forward extremities caused by a single server. It seemed to happen with a single process too, though, so it's probably not worker-related.

tulir avatar May 19 '25 17:05 tulir