Sending many events in parallel creates forward extremities instead of linearizing creation
Description
When sending many events in parallel, Synapse doesn't linearize the event persistence properly, leading to the creation of forward extremities.
I think this is a regression because I recall linearization being fixed years ago, but I don't remember the exact details of the previous fix and couldn't find any obvious problems by scanning recent commits.
Steps to reproduce
- send multiple events to a single room in parallel (using the standard `/_matrix/client/v3/rooms/.../send/m.room.message/txnid` endpoint, on a single server with a single account); a minimal script is sketched below
- observe the rows in the `event_forward_extremities` table for the room and the `prev_events` of each created event
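A minimal reproduction sketch (not from the original report): it assumes a local homeserver at `http://localhost:8008` and takes the target room and access token from `ROOM_ID` and `ACCESS_TOKEN` environment variables (names chosen here for illustration). It fires a batch of `/send` requests concurrently so they race inside the homeserver:

```python
import asyncio
import os
import uuid

import aiohttp

HOMESERVER = "http://localhost:8008"  # assumption: local test homeserver
ROOM_ID = os.environ["ROOM_ID"]       # e.g. "!abc:localhost"
TOKEN = os.environ["ACCESS_TOKEN"]


async def send_one(session: aiohttp.ClientSession, i: int) -> None:
    # Each request uses a fresh transaction id so none of them are deduplicated.
    txn_id = str(uuid.uuid4())
    url = f"{HOMESERVER}/_matrix/client/v3/rooms/{ROOM_ID}/send/m.room.message/{txn_id}"
    async with session.put(url, json={"msgtype": "m.text", "body": f"parallel {i}"}) as resp:
        resp.raise_for_status()
        print(i, (await resp.json())["event_id"])


async def main() -> None:
    async with aiohttp.ClientSession(
        headers={"Authorization": f"Bearer {TOKEN}"}
    ) as session:
        # Fire all sends at once so the server processes them concurrently.
        await asyncio.gather(*(send_one(session, i) for i in range(20)))


asyncio.run(main())
```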
Expected result: there is one forward extremity (the last event) and each event references the previous one
Actual result: each event sent in parallel is listed as a forward extremity, and they all reference the same prev event
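To make the observation concrete, here is a hedged database check (again not from the original report). It assumes direct read access to the Synapse PostgreSQL database via a `SYNAPSE_DB_DSN` environment variable (illustrative name) and that the prev_event links still live in the `event_edges` table, which may differ across schema versions:

```python
import os

import psycopg2

ROOM_ID = os.environ["ROOM_ID"]

with psycopg2.connect(os.environ["SYNAPSE_DB_DSN"]) as conn, conn.cursor() as cur:
    # Expected: 1 row. Actual (when the bug triggers): one row per parallel send.
    cur.execute(
        "SELECT count(*) FROM event_forward_extremities WHERE room_id = %s",
        (ROOM_ID,),
    )
    print("forward extremities:", cur.fetchone()[0])

    # Inspect which prev_events the most recent events point at; many events
    # sharing the same prev_event_id is the symptom described above.
    cur.execute(
        """
        SELECT e.event_id, ee.prev_event_id
          FROM events AS e
          JOIN event_edges AS ee USING (event_id)
         WHERE e.room_id = %s
         ORDER BY e.stream_ordering DESC
         LIMIT 20
        """,
        (ROOM_ID,),
    )
    for event_id, prev_event_id in cur.fetchall():
        print(event_id, "->", prev_event_id)
```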
Homeserver
maunium.net, localhost, rory.gay, codestorm.net
Synapse Version
1.129.0
Database
Tested with PostgreSQL v17 and v14
Workers
Reproducible with any configuration (single process, one event stream worker, multiple event stream workers)
synapse/handlers/message.py#L1034 is the limiter which should prevent this. And https://github.com/element-hq/synapse/commit/55e4d27b36fd69a3cf3eceecbd42706579ef2dc7 is the change that set the limit to 1 (the same fix as in our Beeper fork when we discovered the same issue). So this looks like a regression there, despite the limiter still existing.
To double-check, is it causing any bugs? It feels like the client endpoints would return the events fine, and the extra extremities would resolve themselves when the next event comes in. The comments around the code that @Fizzadar shared explain why we care about processing events linearly, though.
My guess is that this regressed in 54c012c, where the `create_event(...)` call was moved outside of the `self.limiter.queue(event_dict["room_id"])` block.
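To illustrate why the ordering matters, here is a toy simulation (not Synapse code: a plain `asyncio.Lock` stands in for the per-room Linearizer, and `create_event`/`persist_event` are simplified stand-ins, not the real signatures). When prev_events are chosen outside the lock, every concurrent send snapshots the same extremity; when create + persist runs inside the lock, the events chain:

```python
import asyncio


async def create_event(label: str, extremities: set) -> dict:
    # Choose prev_events from the current forward extremities, then do some
    # simulated async work during which other sends can interleave.
    prev_events = sorted(extremities)
    await asyncio.sleep(0)
    return {"event_id": f"${label}", "prev_events": prev_events}


async def persist_event(event: dict, extremities: set) -> None:
    # Persisting removes the event's prev_events from the extremities and adds itself.
    extremities.difference_update(event["prev_events"])
    extremities.add(event["event_id"])


async def send_outside_lock(label: str, lock: asyncio.Lock, extremities: set) -> None:
    # Mirrors the suspected regression: prev_events chosen before the lock is held.
    event = await create_event(label, extremities)
    async with lock:
        await persist_event(event, extremities)


async def send_inside_lock(label: str, lock: asyncio.Lock, extremities: set) -> None:
    # Mirrors the intended behaviour: create + persist serialised per room.
    async with lock:
        event = await create_event(label, extremities)
        await persist_event(event, extremities)


async def run(sender) -> set:
    lock = asyncio.Lock()
    extremities = {"$genesis"}
    await asyncio.gather(*(sender(f"msg{i}", lock, extremities) for i in range(5)))
    return extremities


async def main() -> None:
    print("outside lock:", await run(send_outside_lock))  # five extremities
    print("inside lock: ", await run(send_inside_lock))   # one extremity


asyncio.run(main())
```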
@erikjohnston noted internally that the `create_and_send_nonmember_event` lineariser operates within the bounds of a single worker. If your deployment has multiple worker types which call this method, then you'll still end up with parallel inserts.
@tulir have you seen this behaviour change meaningfully, or is the speculation of a regression based solely on the per-worker lineariser being added in ~2023?
It being a regression is just speculation; it could be that it has always been like this. I only noticed it recently when sending events to certain rooms was very slow and the rooms turned out to have 100+ forward extremities caused by a single server. It seemed to happen with a single process too, though, so it's probably not worker-related.