synapse
synapse copied to clipboard
Inconsistent state events returned by series of incremental `/sync` calls
This issue is based on https://github.com/element-hq/element-call/issues/2458#issuecomment-2202395920
Synapse has been observed sending inconsistent room state events for incremental syncs when looking at the timeline
and state
arrays for a particular room.
The HAR of the sync requests
https://call-unstable.ems.host/_matrix/client/v3/sync?filter=0&timeout=30000&set_presence=online&since=s1002136_1_12_1333_4049_1_426_5717_0_1
{
"next_batch": "s1002138_1_12_1333_4050_1_426_5717_0_1",
"timeline": {
"events": [
{
"type": "org.matrix.msc3401.call.member",
"sender": "@hughns2:call-unstable.ems.host",
"state_key": "@hughns2:call-unstable.ems.host",
"unsigned": {
"replaces_state": "$91MdSs1OcBO5xguVL2zeNhBQlo3hAt_DyFaEZkW8SfQ",
},
"event_id": "$q5m5Fk6Q0WuGC6PAR5QqrfnFPnXqz30amvy3vpf5NqM"
}
---------------
https://call-unstable.ems.host/_matrix/client/v3/sync?filter=0&timeout=30000&set_presence=online&since=s1002138_1_12_1333_4050_1_426_5717_0_1
{
"next_batch": "s1002140_1_12_1334_4051_1_426_5717_0_1",
"state": {
"events": [
{
"type": "org.matrix.msc3401.call.member",
"sender": "@hughns2:call-unstable.ems.host",
"state_key": "@hughns2:call-unstable.ems.host",
"unsigned": {
"replaces_state": "$9mFjm_Ox58jeAvdSL8OWlSShagn65niqyLLAHX23QGs",
},
"event_id": "$91MdSs1OcBO5xguVL2zeNhBQlo3hAt_DyFaEZkW8SfQ"
}
]
},
"ephemeral": {
"events": [
{
"type": "m.receipt",
"content": {
"$q5m5Fk6Q0WuGC6PAR5QqrfnFPnXqz30amvy3vpf5NqM": {
"m.read": {
"@hughns2:call-unstable.ems.host": {
"ts": 1719851861973,
"thread_id": "main"
}
}
----------------
https://call-unstable.ems.host/_matrix/client/v3/sync?filter=0&timeout=30000&set_presence=online&since=s1002143_1_12_1335_4052_1_426_5717_0_1
{
"next_batch": "s1002146_1_12_1335_4052_1_426_5717_0_1",
"timeline": {
"events": [
{
"type": "org.matrix.msc3401.call.member",
"sender": "@hughns2:call-unstable.ems.host",
"state_key": "@hughns2:call-unstable.ems.host",
"unsigned": {
"replaces_state": "$q5m5Fk6Q0WuGC6PAR5QqrfnFPnXqz30amvy3vpf5NqM",
}
"event_id": "$aemXSHbzKzePfTDzuX00mRFZJmJIJFdQ3I55EPllOUY"
}
-----------------------------------
https://call-unstable.ems.host/_matrix/client/v3/sync?filter=0&timeout=30000&set_presence=online&since=s1002146_1_12_1335_4052_1_426_5717_0_1
{
"next_batch": "s1002148_1_12_1335_4052_1_426_5717_0_1",
"state": {
"events": [
{
"type": "org.matrix.msc3401.call.member",
"sender": "@hughns2:call-unstable.ems.host",
"state_key": "@hughns2:call-unstable.ems.host",
"origin_server_ts": 1719851861110,
"unsigned": {
"replaces_state": "$91MdSs1OcBO5xguVL2zeNhBQlo3hAt_DyFaEZkW8SfQ",
},
"event_id": "$q5m5Fk6Q0WuGC6PAR5QqrfnFPnXqz30amvy3vpf5NqM"
}
----------------
https://call-unstable.ems.host/_matrix/client/v3/sync?filter=0&timeout=30000&set_presence=online&since=s1002148_1_12_1335_4052_1_426_5717_0_1
{
"next_batch": "s1002149_1_12_1335_4052_1_426_5717_0_1",
"timeline": {
"events": [
{
"type": "org.matrix.msc3401.call.member",
"sender": "@hughns2:call-unstable.ems.host",
"state_key": "@hughns2:call-unstable.ems.host",
"unsigned": {
"replaces_state": "$aemXSHbzKzePfTDzuX00mRFZJmJIJFdQ3I55EPllOUY",
},
"event_id": "$8LDSegFL8fkNUaNI_Jc2pgXla17t2FUmsE6JusnYCyU"
}
],
--------------------
This can be summarised as the following changes:
timeline: $91M -> $q5m
state: $9mF -> $91M # but, $91M has already been replaced by $q5m
timeline: $q5m -> $aem
state: $91M -> $q5m # but, $q5m has already been replaced by $aem
timeline: $aem -> $8LD
Because this is an incremental sync, there is no need for the state
array to contain anything as the timeline
has all the events anyhow. By sending the out of date state
array back to the client it confuses the client.
Looking at the DB we get
select stream_ordering, event_id, type, state_key, (select array_agg(prev_event_id) from event_edges where events.event_id = event_edges.event_id), (select state_group from event_to_state_groups as sg where events.event_id = sg.event_id) from events where room_id = '!totKtECNNKyhzLOiNl:call-unstable.ems.host' and 1002136 < stream_ordering and stream_ordering <= 1002149 order by stream_ordering;
stream_ordering | event_id | type | state_key | array_agg | state_group
-----------------+----------------------------------------------+--------------------------------+---------------------------------+---------------------------------------------------------------------------------------------+-------------
1002136 | $YPhOcZiyzEJyMHsrPFtJKxCljsZzQZOFjOE2k7_9am0 | org.matrix.msc3401.call.member | @hughns:call-unstable.ems.host | {$89_MkkYqF-ViNH0qGkqmQyb5LPnslb4VqQarURn0iUA} | 37615
1002138 | $q5m5Fk6Q0WuGC6PAR5QqrfnFPnXqz30amvy3vpf5NqM | org.matrix.msc3401.call.member | @hughns2:call-unstable.ems.host | {$YPhOcZiyzEJyMHsrPFtJKxCljsZzQZOFjOE2k7_9am0} | 37616
1002139 | $tIhui6kBfzQDGV5FaZZJn_yKsPqeUwM22JeUIikdFRw | m.room.encrypted | | {$YPhOcZiyzEJyMHsrPFtJKxCljsZzQZOFjOE2k7_9am0} | 37615
1002140 | $bTqD6YwK-cwSv4-F6G7_BlWgItaUUssQ5bRFAET-ReA | m.room.encrypted | | {$q5m5Fk6Q0WuGC6PAR5QqrfnFPnXqz30amvy3vpf5NqM,$tIhui6kBfzQDGV5FaZZJn_yKsPqeUwM22JeUIikdFRw} | 37616
1002146 | $aemXSHbzKzePfTDzuX00mRFZJmJIJFdQ3I55EPllOUY | org.matrix.msc3401.call.member | @hughns2:call-unstable.ems.host | {$bTqD6YwK-cwSv4-F6G7_BlWgItaUUssQ5bRFAET-ReA} | 37617
1002147 | $ugzpWM2F6VRnEcpnp0dYSzfXnKKRyTCHH-TQc9dcwcM | m.room.encrypted | | {$bTqD6YwK-cwSv4-F6G7_BlWgItaUUssQ5bRFAET-ReA} | 37616
1002148 | $xH5t5qKv0tQO7_wUdxoatp9pz-YsV_wnNSLxntLYuqg | m.room.encrypted | | {$aemXSHbzKzePfTDzuX00mRFZJmJIJFdQ3I55EPllOUY,$ugzpWM2F6VRnEcpnp0dYSzfXnKKRyTCHH-TQc9dcwcM} | 37617
1002149 | $8LDSegFL8fkNUaNI_Jc2pgXla17t2FUmsE6JusnYCyU | org.matrix.msc3401.call.member | @hughns2:call-unstable.ems.host | {$xH5t5qKv0tQO7_wUdxoatp9pz-YsV_wnNSLxntLYuqg} | 37618
(8 rows)
Which is roughly:
Where bottom is most recent and background colour is the state groups.
If you look at the DB output, which is ordered by
stream_ordering
you can see that the state keeps bouncing back and forth. This is likely confusing the algorithm for computing the state deltas to send to clients, due to poor handling of these cases in Synapse. The fix for this is probably to move to tracking state changes via current state, but that is not a small change.