synapse icon indicating copy to clipboard operation
synapse copied to clipboard

`state_groups` (& `state_groups_state`) does not appear to be purged alongside the rooms

Open ShadowJonathan opened this issue 2 years ago • 7 comments

While investigating my database size, I found that state_groups_state and state_groups hold many stale entries that reference old rooms that have long since been removed from rooms and all related events tables.

It appears these are not purged alongside other tables, particularly via the DELETE _synapse/admin/v1/rooms/{room_id} API (with purge: true)

Version information

  • Version: Synapse 1.57.1

  • Install method: Docker image

  • Platform: Linux (Debian)

ShadowJonathan avatar May 20 '22 15:05 ShadowJonathan

Context: state_groups_state appears to be my biggest table in the database at this point, and I was really questioning why this is the case;

image

Related question: Can I just delete these rows (that reference rooms that my homeserver isnt a part of, and have been purged long ago) with no effect? Am I supposed to?

ShadowJonathan avatar May 20 '22 15:05 ShadowJonathan

Andrew noticed that there's a note about this in the code... https://github.com/matrix-org/synapse/blob/02d708568b476f2f7716000b35c0adfa4cbd31b3/tests/rest/admin/test_room.py#L2490

reivilibre avatar May 20 '22 15:05 reivilibre

After clearing the tables with non-referenced room names, it seems to have cleared up 4GB of disk space (from 44GB to 40GB, that specific table 12GB -> 8GB)

ShadowJonathan avatar May 20 '22 16:05 ShadowJonathan

We do seem to delete it now though: https://github.com/matrix-org/synapse/blob/19d79b6ebe3070ad7352f24549fbafb9dee44b75/synapse/storage/databases/state/store.py#L667-L676

I wonder if the test would now pass if you check if those are gone?

(Maybe we need a background update or something to clean-up?)

clokep avatar May 20 '22 17:05 clokep

possibly related to https://github.com/matrix-org/synapse/issues/3364#issuecomment-395263702

richvdh avatar May 23 '22 13:05 richvdh

We do seem to delete it now though:

https://github.com/matrix-org/synapse/blob/19d79b6ebe3070ad7352f24549fbafb9dee44b75/synapse/storage/databases/state/store.py#L667-L676

I wonder if the test would now pass if you check if those are gone?

(Maybe we need a background update or something to clean-up?)

I re-ran the tests with the following line uncommented: https://github.com/matrix-org/synapse/blob/02d708568b476f2f7716000b35c0adfa4cbd31b3/tests/rest/admin/test_room.py#L2490

and a bunch of them failed with twisted.trial.unittest.FailTest: 1 != 0 : Rows not purged in state_groups:

  DeleteRoomTestCase
    ...
    test_purge_room_and_block ...                                        [FAIL]
    test_purge_room_and_not_block ...                                    [FAIL]
    ...
    test_shutdown_room_block_peek ...                                    [FAIL]
    test_shutdown_room_consent ...                                       [FAIL]
  DeleteRoomV2TestCase
    ...
    test_purge_room_and_block ...                                        [FAIL]
    test_purge_room_and_not_block ...                                    [FAIL]
    ...
    test_shutdown_room_block_peek ...                                    [FAIL]
    test_shutdown_room_consent ...                                       [FAIL]

So it looks like we're still missing something.

squahtx avatar May 30 '22 15:05 squahtx

It appears that deleting and purging a room can create an orphaned state group when a DELETE _synapse/admin/v1/rooms/{room_id} request arrives for a room where the dag is in a forked condition. This was discovered when Sean and I were debugging why some tests in the DeleteRoomTestCase class were failing on PR I had opened to batch up some of the events in room creation. All of the failing tests were due to an entry for the room being left in the table state_groups_state, which was causing the is_purged assertion in the test to fail. The following scenario illustrates this, and I used actual results from the test purge_room_and_block, setting breakpoints and pulling events/dumping tables to get examples.

Consider the following: a room is created. Three events are sent into the room, one after another, each one referencing the previous one as it's previous event:

graph TD;
      A[m.room.create] --> B[m.room.member]
      B --> C[m.room.power_levels]

After this, several events are created and then batched up to be persisted. In the current scenario, these are a join_rules event and a history_visibility event. These both point to the power_levels event as their previous event. The room creation code has finished, and now the dag looks like this:

graph TD;
      A[m.room.create] --> B[m.room.member]
    B --> C[m.room.power_levels]
    C -->D[m.room.join_rules]
    C -->E[m.room_history_visibility]

At this point, the state_groups_state table looks like this:

[(2,
  '!DIbdgobrpTVoFAztzx:test',
  'm.room.create',
  '',
  '$212ssXuFh__z84_WwehiG_-HlgZxNCe95MDQ3le3VCs'),
 (3,
  '!DIbdgobrpTVoFAztzx:test',
  'm.room.member',
  '@user:test',
  '$o3Lw3gE42sh76kgHN1PNO_782N07D1dBFKEscBWBq3k'),
 (4,
  '!DIbdgobrpTVoFAztzx:test',
  'm.room.power_levels',
  '',
  '$ts5UVqqB33pUiYXTvolsNTaA657en2azS4yeLNC4nh0'),
 (5,
  '!DIbdgobrpTVoFAztzx:test',
  'm.room.join_rules',
  '',
  '$cGox6YgFZJSxz_je8VkMa6YDHEVhLV8h4zse7lCLaAs'),
 (6,
  '!DIbdgobrpTVoFAztzx:test',
  'm.room.history_visibility',
  '',
  '$hhcgaNTJ8zovP17DQESlBx8gurLt_hZp9XekhWawHAs')]

Each event has a separate state group, which makes sense as each event is a state event. Next the delete room code path begins. As part of the shutdown room code, a leave event is created, with two previous events, the current forward extremities. The dag now looks like this:

graph TD;
   A[m.room.create] --> B[m.room.member]
    B --> C[m.room.power_levels]
    C -->D[m.room.join_rules]
    C -->E[m.room_history_visibility]
    E -->F[m.room.member]
    D -->F[m.room.member]

And the state_groups_state table looks like this:

[(2,
  '!DIbdgobrpTVoFAztzx:test',
  'm.room.create',
  '',
  '$212ssXuFh__z84_WwehiG_-HlgZxNCe95MDQ3le3VCs'),
 (3,
  '!DIbdgobrpTVoFAztzx:test',
  'm.room.member',
  '@user:test',
  '$o3Lw3gE42sh76kgHN1PNO_782N07D1dBFKEscBWBq3k'),
 (4,
  '!DIbdgobrpTVoFAztzx:test',
  'm.room.power_levels',
  '',
  '$ts5UVqqB33pUiYXTvolsNTaA657en2azS4yeLNC4nh0'),
 (5,
  '!DIbdgobrpTVoFAztzx:test',
  'm.room.join_rules',
  '',
  '$cGox6YgFZJSxz_je8VkMa6YDHEVhLV8h4zse7lCLaAs'),
 (6,
  '!DIbdgobrpTVoFAztzx:test',
  'm.room.history_visibility',
  '',
  '$hhcgaNTJ8zovP17DQESlBx8gurLt_hZp9XekhWawHAs'),
 (7,
  '!DIbdgobrpTVoFAztzx:test',
  'm.room.history_visibility',
  '',
  '$hhcgaNTJ8zovP17DQESlBx8gurLt_hZp9XekhWawHAs'),
 (8,
  '!DIbdgobrpTVoFAztzx:test',
  'm.room.member',
  '@user:test',
  '$lDvyNohshfEZwqE79ggqkc83sOVZWvdoLukoLDz_xMU')]

You can see that now the same history visibility group event is recorded in two different state groups. Events have two state groups, one before the event and one after the event. When state events are in a linear chain, an event's "before" state group is the same as the previous event's "after" state group, but when the dag is forked, this is no longer the case. In the table above, state group 6 is the room history visibility event's "after" state group, and state group 7 is the leave event's "before" state group. Since the event_to_state_groups table only stores the "after" state groups, when the _purge_room_txn code fetches the state groups to be deleted, it leaves behind the second history_visibility state group. The code in question, synapse/storage/database/main/purge_events lines 354-365,

# First, fetch all the state groups that should be deleted, before
        # we delete that information.
        txn.execute(
            """
                SELECT DISTINCT state_group FROM events
                INNER JOIN event_to_state_groups USING(event_id)
                WHERE events.room_id = ?
            """,
            (room_id,),
        )

        state_groups = [row[0] for row in txn]

results in state_groups resolving to [2, 3, 4, 5, 6, 8]. State group 7 is not selected to be deleted. This list is eventually returned and passed to purge_room_state on line 37 of synapse/storage/controllers/purge_events.py, but since state group 7 is not in the list, it is not deleted and thus becomes an orphan.

H-Shay avatar Aug 12 '22 17:08 H-Shay

After clearing the tables with non-referenced room names, it seems to have cleared up 4GB of disk space (from 44GB to 40GB, that specific table 12GB -> 8GB)

Any opinions on whether this is safe and not corrupting the database even more?

rettichschnidi avatar Aug 18 '22 14:08 rettichschnidi

I think this is basically a duplicate of https://github.com/matrix-org/synapse/issues/3364 for the record.

Any opinions on whether this is safe and not corrupting the database even more?

provided you're sure that all other references to the room have been purged from the database, yes it should be safe to remove the unreferenced rows from state_groups and state_groups_state.

richvdh avatar Aug 18 '22 16:08 richvdh

@richvdh the query you provide in that issue for checking if you are affected or not doesn't give me a count in the thousands (it's around 500). I can see state groups for rooms which have been deleted with the admin api however, and one of them even had 530,636 rows in the state_groups_state table. (I compressed it without realizing it wasn't even meant to be there at first and it went down to 425,268)

tonkku107 avatar Aug 19 '22 17:08 tonkku107

provided you're sure that all other references to the room have been purged from the database, yes it should be safe to remove the unreferenced rows from state_groups and state_groups_state.

How would I determine if all references have been purged?

Asking because there are many tables with a column named room_id:

$ sed -n '/CREATE TABLE/{:start /;/!{N;b start};/room_id/p}' synapse-schema.sql | grep "CREATE TABLE" | cut -c 21- | tr -d \( | grep -v -e state_groups_state -e state_groups
appservice_room_list
batch_events
blocked_rooms
current_state_delta_stream
current_state_events
destination_rooms
device_lists_changes_in_room
e2e_room_keys
event_auth
event_auth_chain_to_calculate
event_backward_extremities
event_edges
event_forward_extremities
event_json
event_labels
event_push_actions
event_push_summary
event_reports
event_search
event_txn_id
events
federation_inbound_events_staging
insertion_event_edges
insertion_event_extremities
insertion_events
local_current_membership
partial_state_events
partial_state_rooms
partial_state_rooms_servers
pusher_throttle
receipts_graph
receipts_linearized
room_account_data
room_aliases
room_depth
room_memberships
room_retention
room_stats_current
room_stats_earliest_token
room_stats_state
room_tags
room_tags_revisions
rooms
state_events
stream_ordering_to_exterm
user_directory
users_in_public_rooms
users_who_share_private_rooms

Would I need to check every single table?

rettichschnidi avatar Aug 30 '22 16:08 rettichschnidi

@rettichschnidi check the events table. If there's nothing there for your room, you can remove the room from state_groups and state_groups_state.

richvdh avatar Aug 30 '22 17:08 richvdh

@richvdh Thanks for the answer. Just did so, reduced the database from 75 to below 35 GB.

Follow up question: When deleting a room, not just state_groups and state_groups_state keep entries, but current_state_delta_stream and device_lists_changes_in_room too. Is it safe to delete those too?

rettichschnidi avatar Aug 30 '22 18:08 rettichschnidi

So, two months later, still with retention enabled, the DB grew again to 76 GB. The room !YTvKGNlinIzlkMTVRl:matrix.org (Element Web/Desktop) had 184082350 (!) rows in state_groups_state, which seems an awful lot. Same goes for !xBRJadUtxNCiIKdwZT:matrix.org (Cybersecurity-General, 17632406 rows) and !HiMrQwoxwdnWgrTVly:matrix.org (FOSDEM, 13416609 rows). Are those rooms really THAT active/big?

In order to shrink my DB once gain, I did the following:

  1. Removed those rooms, e.g.:
curl 'http://localhost:8008/_synapse/admin/v1/rooms/!YTvKGNlinIzlkMTVRl:matrix.org' -X DELETE -H 'Accept: application/json' -H 'Referer: http://localhost:8080/' -H 'authorization: Bearer $TOKEN' --data '{ "purge": true, "message": "Sorry - kicking you out to clean up the database" }'
  1. Shut down Synapse and cleaned up the database like this:
DELETE FROM 
  state_groups_state 
WHERE 
  room_id IN (
    SELECT 
      DISTINCT(state_groups.room_id) AS room_id_gone 
    FROM 
      state_groups 
      LEFT JOIN events USING(room_id) 
    WHERE 
      events.room_id IS NULL 
    GROUP BY 
      room_id_gone
  );

I joined using state_groups instead of state_groups_state for performance reasons before cleaning it up as well:

DELETE FROM 
  state_groups 
WHERE 
  room_id IN (
    SELECT 
      DISTINCT(state_groups.room_id) AS room_id_gone 
    FROM 
      state_groups 
      LEFT JOIN events USING(room_id) 
    WHERE 
      events.room_id IS NULL 
    GROUP BY 
      room_id_gone
  );
  1. Reclaimed the space on the file system:
REINDEX (VERBOSE) DATABASE synapse; VACUUM FULL VERBOSE;

As a result of this, my DB shrunk to 16 GB (on the file system).

  1. Restarted Synapse

Open question: Some other tables also return small to big numbers, an I am wondering if it is save to delete the found entries?

synapse=# SELECT FROM device_lists_changes_in_room WHERE room_id IN (SELECT DISTINCT(device_lists_changes_in_room.room_id) AS room_id_gone FROM device_lists_changes_in_room LEFT JOIN events USING(room_id) WHERE events.room_id IS NULL GROUP BY room_id_gone);
--
(1230238 rows)

The other tables being:

  • current_state_delta_stream: 3405301 rows
  • e2e_room_keys: 71
  • event_reports: 1 row
  • receipts_graph: 2742 rows
  • receipts_linearized: 2742 rows
  • room_account_data: 4 rows
  • room_retention: 4 rows
  • room_tags_revisions: 61 rows

rettichschnidi avatar Oct 29 '22 08:10 rettichschnidi

@rettichschnidi Are there any problems since your cleanup? We probably have the same problem and would also like to get rid of the deleted rooms. I'm just curious if there were any other problems after that?

Dan-Sun avatar Nov 01 '22 00:11 Dan-Sun

Are there any problems since your cleanup? We probably have the same problem and would also like to get rid of the deleted rooms. I'm just curious if there were any other problems after that?

I have not experienced any myself and also have not gotten any complains yet. Will update here if this changes.

rettichschnidi avatar Nov 01 '22 06:11 rettichschnidi

Are there any problems since your cleanup? We probably have the same problem and would also like to get rid of the deleted rooms. I'm just curious if there were any other problems after that?

I have not experienced any myself and also have not gotten any complains yet. Will update here if this changes.

Thank you for testing this. We did that too and so far so good. We even went from about 115 GB down to 13 GB.

Dan-Sun avatar Nov 04 '22 02:11 Dan-Sun

Very nice @rettichschnidi! This reduced the database size from 10 to 4 GB.

marvinwankersteen avatar Dec 11 '22 19:12 marvinwankersteen

There are also references of room_id in table current_state_delta_stream which held the reference stream_id to table cache_invalidation_stream_by_instance.

cache_invalidation_stream_by_instance has a column cache_func. Any ideas how this is been used and whats the correct way?

In addition to @rettichschnidi i made the following draft:

DELETE FROM
  cache_invalidation_stream_by_instance as inv
  INNER JOIN current_state_delta_stream as stream ON inv.stream_id = stream.stream_id
WHERE room_id IN (
    SELECT 
      DISTINCT(state_groups.room_id) AS room_id_gone 
    FROM 
      state_groups 
      LEFT JOIN events USING(room_id) 
    WHERE 
      events.room_id IS NULL 
    GROUP BY 
      room_id_gone
  )
AND cache_func = 'have_seen_event';

boontifex avatar Feb 01 '23 16:02 boontifex

There are also references of room_id in table current_state_delta_stream which held the reference stream_id to table cache_invalidation_stream_by_instance.

I am far from convinced those columns are referring to the same "stream", though I haven't actually checked. Both of those tables can be periodically cleaned out anyway; see #5888 and #13456.

richvdh avatar Feb 02 '23 08:02 richvdh

@richvdh I didn't really find the comprehensive documentation how synapse stores data and why, but i've give some thoughts.

Events (joining, leaving, messages etc) are stored in the tables events and events_json.

IMO every entry in the following tables, that doesn't have a related event_id in events is orphaned and can be deleted. Or am I wrong?

  • cache_invalidation_stream_by_instance
  • current_state_delta_stream
  • device_lists_changes_in_room
  • state_groups
  • state_groups_state
  • cache_invalidation_stream_by_instance

boontifex avatar Feb 07 '23 07:02 boontifex