synapse icon indicating copy to clipboard operation
synapse copied to clipboard

Locally-rejected invites don't get passed on to the client over SSS

Open richvdh opened this issue 1 year ago • 1 comments

Description

[in fairness I have not validated if this is a server-side or client-side problem]

Background: When a user rejects an invite, we generate a "leave" event to update their membership state. For an invite received from a remote server, we normally ask the remote server to generate the leave event for us. However, there is a fallback case for when the remote server is unreachable: we will generate an "outlier" event locally, which is then only sent to the invited user.

It appears that such locally-generated leave events are not passed down SSS.

In other words: this is a regression of https://github.com/matrix-org/synapse/issues/2181, but applies only to SSS.

Steps to reproduce

  • Receive an invite from a user on a remote server
  • The remote server goes offline
  • On (say) Element-Web, reject the invite
  • On EX, observe that you still have a pending invite

Homeserver

sw1v.org

Synapse Version

v1.115.0

Installation Method

pip (from PyPI)

Database

postgres. no. yes. yes.

Workers

Single process

Platform

Configuration

Relevant log output

-

Anything else that would be useful to know?

richvdh avatar Sep 24 '24 13:09 richvdh

I have something like this going on with EX, but it cleared up after rejecting Invites in Element then logging out on EX and back in on EX to find the Invites list updated and cleared up.

Andoriax avatar Oct 02 '24 10:10 Andoriax

I was having this issue for a long time now that an (abusive) invite is sitting in my EXA inbox, which is not there in any of my many other incl non-Element clients and while in principle I know enough workarounds to get rid of it, my goal is to help uncover and keep track of flaws impacting "Matrix 2.0" users with this.

I've had the time to investigate a bit and the invite is definitely coming down synapse's SSS endpoint. Sadly it's almost impossible to check the stable sync for it manually, but I tested that a client doing a stable initial sync does not receive it.

I'm using the mjolnir Antispam module off the shelf from the matrix-docker-ansible-deploy playbook with #matrix-org-coc-bl:matrix.org and CME.

My suspicion is currently that I'm hitting this bug or it's equivalent with the module API.

HarHarLinks avatar Nov 05 '24 12:11 HarHarLinks

I have been digging into this and this is what I have found:

  • the matrix-rust-sdk is expecting a SSS response with the leave event in the rooms: "required_state" section
  • Synapse doesn't appear to provide the leave event in the appropriate section when rejecting a remote invite
    • this is true whether or not the remote server is up (ie. whether or not we resort to the local fallback mechanism mentioned)
  • When calculating new forward extremities after receiving a remote invite or generating a leave for a remote room, there is no forward extremity returned from the calculation: https://github.com/element-hq/synapse/blob/1efb826b54f4a8fdfa56c27a32da9c332fa87cc3/synapse/storage/controllers/persist_events.py#L674
    • This is because the invite & the leave are outliers: https://github.com/element-hq/synapse/blob/1efb826b54f4a8fdfa56c27a32da9c332fa87cc3/synapse/storage/controllers/persist_events.py#L781-L789
    • This leads to not updating the current_state_delta_stream: https://github.com/element-hq/synapse/blob/1efb826b54f4a8fdfa56c27a32da9c332fa87cc3/synapse/storage/controllers/persist_events.py#L646 https://github.com/element-hq/synapse/blob/1efb826b54f4a8fdfa56c27a32da9c332fa87cc3/synapse/storage/databases/main/events.py#L340-L348 https://github.com/element-hq/synapse/blob/1efb826b54f4a8fdfa56c27a32da9c332fa87cc3/synapse/storage/databases/main/events.py#L1010-L1023
    • which leads to not sending the leave down SSS: https://github.com/element-hq/synapse/blob/1efb826b54f4a8fdfa56c27a32da9c332fa87cc3/synapse/handlers/sliding_sync/room_lists.py#L294-L311 https://github.com/element-hq/synapse/blob/1efb826b54f4a8fdfa56c27a32da9c332fa87cc3/synapse/handlers/sliding_sync/room_lists.py#L1118-L1122

I have this reproducible behaviour as a complement-crypto test. The test passes if I swtich bob/client2 to also be on hs1 so the invite & leave are no longer for a remote room:

// A and B are on different servers.
// A invites B to a room.
// A's server goes offline.
// B rejects the invite.
// B should see a `leave` for the room come down /sync.
func TestRemoteInviteRejectionSlidingSync(t *testing.T) {
	Instance().ForEachClientType(t, func(t *testing.T, clientType api.ClientType) {
		tc := Instance().CreateTestContext(t, api.ClientType{
			Lang: clientType.Lang,
			HS:   "hs1",
		}, api.ClientType{
			Lang: clientType.Lang,
			HS:   "hs2",
		})

		// NOTE: sleep a while to let Synapse background updates catch up, to ensure we use the
		// new sliding sync tables, not the fallback
		time.Sleep(time.Minute)

		tc.WithClientSyncing(t, &cc.ClientCreationRequest{
			User: tc.Bob,
		}, func(bob api.TestClient) {
			// Ensure plenty of time for initial sync to have happened
			time.Sleep(time.Second * 3)

			// Alice creates a new room and invites Bob
			roomID := tc.CreateNewEncryptedRoom(t, tc.Alice, cc.EncRoomOptions.Invite([]string{tc.Bob.UserID}))

			// Ensure plenty of time for bob to receive invite
			time.Sleep(time.Second * 3)

			// And Alice's HS becomes unreachable
			//tc.Deployment.PauseServer(t, "hs1")

			// And when Bob rejects the invite
			tc.Bob.MustLeaveRoom(t, roomID)

			// Then Bob should receive the leave down /sync
			bob.WaitUntilEventInRoom(
				t,
				roomID,
				api.CheckEventHasMembership(bob.UserID(), "leave"),
			).Waitf(t, 5*time.Second, "bob did not see bob's leave")
		})
	})
}

I have a very hacky version of Synapse going where it does respond with the leave event in SSS required_state, but the test still fails. I think there may be another issue where the rust-sdk is eating the leave event.

devonh avatar Mar 22 '25 02:03 devonh

I no longer think there is an issue in the rust-sdk with the leave event. I have written a unit test for the SDK and it handles the leave just fine. I think what I was seeing was an issue with the way the complement test waits for events vs what my test expected. It waits for timeline events, but in this case the event wasn't being added to the timeline. Just the required_state.

devonh avatar Mar 24 '25 20:03 devonh

I tested this out further with the lastest version of Element X Android. When rejecting the remote invite in Element Web, it takes a while before Element X Android also eventually removes the invite. With my hacky Synapse fix, as soon as Element Web finishes rejecting the invite (which can take a while with the remote server being down), Element X Android also instantly removes the invite. So the behaviour is much better when we pass the leave event down the required_state field in the SSS response.

devonh avatar Mar 24 '25 20:03 devonh

And here is the Synapse hack which "fixes" the issue: https://github.com/element-hq/synapse/pull/18278

devonh avatar Mar 24 '25 20:03 devonh

This boils down to a combination of two things when sliding sync calculates the set of rooms the user is interested in:

  1. get_sliding_sync_rooms_for_user explicitly does not return rooms the user has left. This is for performance, as large accounts often have a huge number of left rooms.
  2. _get_newly_joined_and_left_rooms, which adds back in rooms that the user has newly left, uses the current_state_delta_stream table to pick up changes in membership. However, the current state only is filled out for rooms the server is in, and so doesn't pick up out-of-band leaves (including locally rejected invites).

erikjohnston avatar Mar 26 '25 13:03 erikjohnston

A WIP is here: https://github.com/element-hq/synapse/compare/erikj/ss_reject_invites

A few things left to do (beyond cleaning it up), see # TODO in the code:

  1. Correctly filter out membership "changes" caused by display name changes (which causes JOIN -> JOIN)
  2. Add an index
  3. In get_room_sync_data we need to pull out the membership change

erikjohnston avatar Mar 28 '25 16:03 erikjohnston

I added a commit to that branch that I think accomplishes 1 & 3 from the list above & fixes the linter errors. It also expands the unit test to validate the response actually contains the leave as expected.

Remaining work should be to add the index.

devonh avatar Apr 07 '25 22:04 devonh

I've tested out these changes using element web (using develop.element.io to enable experimental sliding sync)

Element version: 421a79aaa56f-js-f322f32a077f
Crypto version: Rust SDK 0.11.0 (f2e32d4), Vodozemac 0.9.0

I have:

  • tester_a - old sync
  • tester_a - sliding sync
  • tester_b

With the develop branch of Synapse, if tester_b invites tester_a to a room, and tester_a - old sync rejects it, the invite is stuck on tester_a sliding sync (present, and cannot reject due to error: Failed to reject invite MatrixError: [500] Internal server error (http://localhost:8080/_matrix/client/v3/rooms/!jetvclGxXQAszbvsLg%3Alocalhost%3A8481/leave))

With the erikj/ss_reject_invites branch, if tester_b invites tester_a to a room, and tester_a - old sync rejects it, the invite is immediately cleared on tester_a - sliding sync.

devonh avatar Apr 29 '25 16:04 devonh

Running synapse v1.130.0 does not fix this issue. For context, I'm running https://cgit.rory.gay/matrix/tools/MatrixAntiDmSpam.git/about/ to auto reject spam invites. These rejections don't get passed on to Element X iOS (Version: 25.05.1 (166)) correctly. I'm left with empty rooms in my room list that even display the room's thumbnail, which is especially frustrating with the current attacks. Clearing cache does nothing. I have to log out of my account and back in to get the correct list of rooms.

Also trying to leave generates an error with logs like this:

2025-05-22 13:50:50,323 - synapse.http.server - 146 - ERROR - POST-2218- Failed handle request via 'RoomMembershipRestServlet': <XForwardedForRequest at 0x7e7636237bf0 method='POST' uri='/_matrix/client/v3/rooms/!TuAWq9BYI8scZb5bTg:matrix.getsilly.org/leave' clientproto='HTTP/1.1' site='unix'>
Traceback (most recent call last):
  File "/opt/venvs/matrix-synapse/lib/python3.12/site-packages/synapse/http/server.py", line 332, in _async_render_wrapper
    callback_return = await self._async_render(request)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venvs/matrix-synapse/lib/python3.12/site-packages/synapse/http/server.py", line 544, in _async_render
    callback_return = await raw_callback_return
                      ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venvs/matrix-synapse/lib/python3.12/site-packages/synapse/rest/client/room.py", line 1158, in on_POST
    return await self._do(request, requester, room_id, membership_action, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venvs/matrix-synapse/lib/python3.12/site-packages/synapse/rest/client/room.py", line 1131, in _do
    await self.room_member_handler.update_membership(
  File "/opt/venvs/matrix-synapse/lib/python3.12/site-packages/synapse/handlers/room_member.py", line 666, in update_membership
    result = await self.update_membership_locked(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venvs/matrix-synapse/lib/python3.12/site-packages/synapse/handlers/room_member.py", line 1177, in update_membership_locked
    return await self._local_membership_update(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venvs/matrix-synapse/lib/python3.12/site-packages/synapse/handlers/room_member.py", line 472, in _local_membership_update
    ) = await self.event_creation_handler.create_event(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venvs/matrix-synapse/lib/python3.12/site-packages/synapse/handlers/message.py", line 716, in create_event
    event, unpersisted_context = await self.create_new_client_event(
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venvs/matrix-synapse/lib/python3.12/site-packages/synapse/util/metrics.py", line 129, in measured_func
    r = await func(self, *args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venvs/matrix-synapse/lib/python3.12/site-packages/synapse/handlers/message.py", line 1273, in create_new_client_event
    builder.type == EventTypes.Create or prev_event_ids
AssertionError: Attempting to create a non-m.room.create event with no prev_events

nacorid avatar May 23 '25 06:05 nacorid

@nacorid Please create a new issue. It looks like you're running into a separate thing where the leave event isn't being created.

MadLittleMods avatar May 23 '25 15:05 MadLittleMods

@nacorid Please create a new issue. It looks like you're running into a separate thing where the leave event isn't being created.

No, the leave event is created successfully and propagates to clients after clearing the client's cache, except for Element X iOS, the only SSS capable client I use. Logging out of Element X iOS fetches the new room list correctly, without the rooms that I was invited to.

nacorid avatar May 23 '25 16:05 nacorid