Locally-rejected invites don't get passed on to the client over SSS
Description
[in fairness I have not validated if this is a server-side or client-side problem]
Background: When a user rejects an invite, we generate a "leave" event to update their membership state. For an invite received from a remote server, we normally ask the remote server to generate the leave event for us. However, there is a fallback case for when the remote server is unreachable: we will generate an "outlier" event locally, which is then only sent to the invited user.
It appears that such locally-generated leave events are not passed down SSS.
In other words: this is a regression of https://github.com/matrix-org/synapse/issues/2181, but applies only to SSS.
Steps to reproduce
- Receive an invite from a user on a remote server
- The remote server goes offline
- On (say) Element-Web, reject the invite
- On EX, observe that you still have a pending invite
Homeserver
sw1v.org
Synapse Version
v1.115.0
Installation Method
pip (from PyPI)
Database
postgres. no. yes. yes.
Workers
Single process
Platform
Configuration
Relevant log output
-
Anything else that would be useful to know?
I have something like this going on with EX, but it cleared up after rejecting Invites in Element then logging out on EX and back in on EX to find the Invites list updated and cleared up.
I was having this issue for a long time now that an (abusive) invite is sitting in my EXA inbox, which is not there in any of my many other incl non-Element clients and while in principle I know enough workarounds to get rid of it, my goal is to help uncover and keep track of flaws impacting "Matrix 2.0" users with this.
I've had the time to investigate a bit and the invite is definitely coming down synapse's SSS endpoint. Sadly it's almost impossible to check the stable sync for it manually, but I tested that a client doing a stable initial sync does not receive it.
I'm using the mjolnir Antispam module off the shelf from the matrix-docker-ansible-deploy playbook with #matrix-org-coc-bl:matrix.org and CME.
My suspicion is currently that I'm hitting this bug or it's equivalent with the module API.
I have been digging into this and this is what I have found:
- the matrix-rust-sdk is expecting a SSS response with the leave event in the
rooms: "required_state"section - Synapse doesn't appear to provide the leave event in the appropriate section when rejecting a remote invite
- this is true whether or not the remote server is up (ie. whether or not we resort to the local fallback mechanism mentioned)
- When calculating new forward extremities after receiving a remote invite or generating a leave for a remote room, there is no forward extremity returned from the calculation: https://github.com/element-hq/synapse/blob/1efb826b54f4a8fdfa56c27a32da9c332fa87cc3/synapse/storage/controllers/persist_events.py#L674
- This is because the invite & the leave are outliers: https://github.com/element-hq/synapse/blob/1efb826b54f4a8fdfa56c27a32da9c332fa87cc3/synapse/storage/controllers/persist_events.py#L781-L789
- This leads to not updating the current_state_delta_stream: https://github.com/element-hq/synapse/blob/1efb826b54f4a8fdfa56c27a32da9c332fa87cc3/synapse/storage/controllers/persist_events.py#L646 https://github.com/element-hq/synapse/blob/1efb826b54f4a8fdfa56c27a32da9c332fa87cc3/synapse/storage/databases/main/events.py#L340-L348 https://github.com/element-hq/synapse/blob/1efb826b54f4a8fdfa56c27a32da9c332fa87cc3/synapse/storage/databases/main/events.py#L1010-L1023
- which leads to not sending the
leavedown SSS: https://github.com/element-hq/synapse/blob/1efb826b54f4a8fdfa56c27a32da9c332fa87cc3/synapse/handlers/sliding_sync/room_lists.py#L294-L311 https://github.com/element-hq/synapse/blob/1efb826b54f4a8fdfa56c27a32da9c332fa87cc3/synapse/handlers/sliding_sync/room_lists.py#L1118-L1122
I have this reproducible behaviour as a complement-crypto test. The test passes if I swtich bob/client2 to also be on hs1 so the invite & leave are no longer for a remote room:
// A and B are on different servers.
// A invites B to a room.
// A's server goes offline.
// B rejects the invite.
// B should see a `leave` for the room come down /sync.
func TestRemoteInviteRejectionSlidingSync(t *testing.T) {
Instance().ForEachClientType(t, func(t *testing.T, clientType api.ClientType) {
tc := Instance().CreateTestContext(t, api.ClientType{
Lang: clientType.Lang,
HS: "hs1",
}, api.ClientType{
Lang: clientType.Lang,
HS: "hs2",
})
// NOTE: sleep a while to let Synapse background updates catch up, to ensure we use the
// new sliding sync tables, not the fallback
time.Sleep(time.Minute)
tc.WithClientSyncing(t, &cc.ClientCreationRequest{
User: tc.Bob,
}, func(bob api.TestClient) {
// Ensure plenty of time for initial sync to have happened
time.Sleep(time.Second * 3)
// Alice creates a new room and invites Bob
roomID := tc.CreateNewEncryptedRoom(t, tc.Alice, cc.EncRoomOptions.Invite([]string{tc.Bob.UserID}))
// Ensure plenty of time for bob to receive invite
time.Sleep(time.Second * 3)
// And Alice's HS becomes unreachable
//tc.Deployment.PauseServer(t, "hs1")
// And when Bob rejects the invite
tc.Bob.MustLeaveRoom(t, roomID)
// Then Bob should receive the leave down /sync
bob.WaitUntilEventInRoom(
t,
roomID,
api.CheckEventHasMembership(bob.UserID(), "leave"),
).Waitf(t, 5*time.Second, "bob did not see bob's leave")
})
})
}
I have a very hacky version of Synapse going where it does respond with the leave event in SSS required_state, but the test still fails. I think there may be another issue where the rust-sdk is eating the leave event.
I no longer think there is an issue in the rust-sdk with the leave event. I have written a unit test for the SDK and it handles the leave just fine. I think what I was seeing was an issue with the way the complement test waits for events vs what my test expected. It waits for timeline events, but in this case the event wasn't being added to the timeline. Just the required_state.
I tested this out further with the lastest version of Element X Android.
When rejecting the remote invite in Element Web, it takes a while before Element X Android also eventually removes the invite.
With my hacky Synapse fix, as soon as Element Web finishes rejecting the invite (which can take a while with the remote server being down), Element X Android also instantly removes the invite. So the behaviour is much better when we pass the leave event down the required_state field in the SSS response.
And here is the Synapse hack which "fixes" the issue: https://github.com/element-hq/synapse/pull/18278
This boils down to a combination of two things when sliding sync calculates the set of rooms the user is interested in:
get_sliding_sync_rooms_for_userexplicitly does not return rooms the user has left. This is for performance, as large accounts often have a huge number of left rooms._get_newly_joined_and_left_rooms, which adds back in rooms that the user has newly left, uses thecurrent_state_delta_streamtable to pick up changes in membership. However, the current state only is filled out for rooms the server is in, and so doesn't pick up out-of-band leaves (including locally rejected invites).
A WIP is here: https://github.com/element-hq/synapse/compare/erikj/ss_reject_invites
A few things left to do (beyond cleaning it up), see # TODO in the code:
- Correctly filter out membership "changes" caused by display name changes (which causes JOIN -> JOIN)
- Add an index
- In
get_room_sync_datawe need to pull out the membership change
I added a commit to that branch that I think accomplishes 1 & 3 from the list above & fixes the linter errors. It also expands the unit test to validate the response actually contains the leave as expected.
Remaining work should be to add the index.
I've tested out these changes using element web (using develop.element.io to enable experimental sliding sync)
Element version: 421a79aaa56f-js-f322f32a077f
Crypto version: Rust SDK 0.11.0 (f2e32d4), Vodozemac 0.9.0
I have:
tester_a - old synctester_a - sliding synctester_b
With the develop branch of Synapse, if tester_b invites tester_a to a room, and tester_a - old sync rejects it, the invite is stuck on tester_a sliding sync (present, and cannot reject due to error: Failed to reject invite
MatrixError: [500] Internal server error (http://localhost:8080/_matrix/client/v3/rooms/!jetvclGxXQAszbvsLg%3Alocalhost%3A8481/leave))
With the erikj/ss_reject_invites branch, if tester_b invites tester_a to a room, and tester_a - old sync rejects it, the invite is immediately cleared on tester_a - sliding sync.
Running synapse v1.130.0 does not fix this issue. For context, I'm running https://cgit.rory.gay/matrix/tools/MatrixAntiDmSpam.git/about/ to auto reject spam invites. These rejections don't get passed on to Element X iOS (Version: 25.05.1 (166)) correctly. I'm left with empty rooms in my room list that even display the room's thumbnail, which is especially frustrating with the current attacks. Clearing cache does nothing. I have to log out of my account and back in to get the correct list of rooms.
Also trying to leave generates an error with logs like this:
2025-05-22 13:50:50,323 - synapse.http.server - 146 - ERROR - POST-2218- Failed handle request via 'RoomMembershipRestServlet': <XForwardedForRequest at 0x7e7636237bf0 method='POST' uri='/_matrix/client/v3/rooms/!TuAWq9BYI8scZb5bTg:matrix.getsilly.org/leave' clientproto='HTTP/1.1' site='unix'>
Traceback (most recent call last):
File "/opt/venvs/matrix-synapse/lib/python3.12/site-packages/synapse/http/server.py", line 332, in _async_render_wrapper
callback_return = await self._async_render(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venvs/matrix-synapse/lib/python3.12/site-packages/synapse/http/server.py", line 544, in _async_render
callback_return = await raw_callback_return
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venvs/matrix-synapse/lib/python3.12/site-packages/synapse/rest/client/room.py", line 1158, in on_POST
return await self._do(request, requester, room_id, membership_action, None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venvs/matrix-synapse/lib/python3.12/site-packages/synapse/rest/client/room.py", line 1131, in _do
await self.room_member_handler.update_membership(
File "/opt/venvs/matrix-synapse/lib/python3.12/site-packages/synapse/handlers/room_member.py", line 666, in update_membership
result = await self.update_membership_locked(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venvs/matrix-synapse/lib/python3.12/site-packages/synapse/handlers/room_member.py", line 1177, in update_membership_locked
return await self._local_membership_update(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venvs/matrix-synapse/lib/python3.12/site-packages/synapse/handlers/room_member.py", line 472, in _local_membership_update
) = await self.event_creation_handler.create_event(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venvs/matrix-synapse/lib/python3.12/site-packages/synapse/handlers/message.py", line 716, in create_event
event, unpersisted_context = await self.create_new_client_event(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venvs/matrix-synapse/lib/python3.12/site-packages/synapse/util/metrics.py", line 129, in measured_func
r = await func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venvs/matrix-synapse/lib/python3.12/site-packages/synapse/handlers/message.py", line 1273, in create_new_client_event
builder.type == EventTypes.Create or prev_event_ids
AssertionError: Attempting to create a non-m.room.create event with no prev_events
@nacorid Please create a new issue. It looks like you're running into a separate thing where the leave event isn't being created.
@nacorid Please create a new issue. It looks like you're running into a separate thing where the
leaveevent isn't being created.
No, the leave event is created successfully and propagates to clients after clearing the client's cache, except for Element X iOS, the only SSS capable client I use. Logging out of Element X iOS fetches the new room list correctly, without the rooms that I was invited to.