synapse icon indicating copy to clipboard operation
synapse copied to clipboard

Forward extremity event missing from database, leading to `No state group for unknown or outlier event`

Open esackbauer opened this issue 2 years ago • 31 comments

Description

After upgrading to Synapse v1.57.0, one direct room is not working anymore. Cannot post new messages from any client or member. Cannot roll back history more than the screen, then its stuck. Cannot leave room "Internal Server Error 500"

Steps to reproduce

Tried rebooting the server, even tried to remove the room with Synapse-Admin - still get an error, room cannot be deleted.

Version information

  • Homeserver:

If not matrix.org: matrix.flyar.net

  • Version: {"server_version":"1.57.0","python_version":"3.9.2"}

  • Install method:

Debian repo

  • Platform:

Debian 11.3 VM x64

This is how the logfile looks like:

2022-04-20 10:03:34,130 - synapse.http.server - 100 - ERROR - GET-0- Failed handle request via 'RoomMessageListRestServlet': <XForwardedForRequest at 0x7f28d07d3730 method>
Traceback (most recent call last):
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
    result = current_context.run(gen.send, result)
StopIteration: [{'event_id': '$1650209062194EQOlf:matrix.flyar.net', 'state_group': 1277497}]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
    result = current_context.run(gen.send, result)
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/databases/main/state.py", line 331, in _get_state_group_for_events
    raise RuntimeError("No state group for unknown or outlier event %s" % e)
RuntimeError: No state group for unknown or outlier event $1638282339138PTbRx:matrix.flyar.net
2022-04-20 10:03:34,508 - synapse.http.server - 100 - ERROR - GET-1- Failed handle request via 'RoomMessageListRestServlet': <XForwardedForRequest at 0x7f28d07e3be0 method>
Traceback (most recent call last):
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
    result = current_context.run(gen.send, result)
StopIteration: [{'event_id': '$1650209062194EQOlf:matrix.flyar.net', 'state_group': 1277497}]

The part after "During handling" is repeated. It seems one event is blocking everything? How can I get rid of this?

esackbauer avatar Apr 20 '22 11:04 esackbauer

Error message is from #12191. @richvdh, one for you?

DMRobertson avatar Apr 20 '22 11:04 DMRobertson

Possibly, but there's not really enough info in those logs to help. @esackbauer: is that the complete log? there is nothing logged at INFO?

richvdh avatar Apr 20 '22 11:04 richvdh

[edited by @richvdh to fix formatting]

There is actually nothing more. the room which is not working is !DJCGiBKsgowqkkhkfS:matrix.flyar.net

2022-04-20 14:14:25,352 - synapse.access.http.8008 - 427 - INFO - GET-777- 10.0.0.254 - 8008 - {@xxxxxxx:matrix.flyar.net} Processed request: 7.106sec/0.001sec (0.002sec, 0.000sec) (0.001sec/0.003sec/2) 427B 200 "GET /_matrix/client/r0/sync?filter=16&timeout=30000&since=s1777543_19487621_0_763076_55581_74_3576_68313_1 HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Element/1.10.10 Chrome/98.0.4758.74 Electron/17.0.0 Safari/537.36" [0 dbevts]
2022-04-20 14:14:25,353 - synapse.access.http.8008 - 427 - INFO - PUT-782- 10.0.0.254 - 8008 - {derechosdigitales.org} Processed request: 0.028sec/0.000sec (0.004sec, 0.000sec) (0.002sec/0.018sec/8) 11B 200 "PUT /_matrix/federation/v1/send/1649264576283 HTTP/1.0" "Synapse/1.54.0" [0 dbevts]
2022-04-20 14:14:25,857 - synapse.http.server - 100 - ERROR - GET-786- Failed handle request via 'RoomMessageListRestServlet': <XForwardedForRequest at 0x7f5ca035c880 method='GET' uri='/_matrix/client/r0/rooms/!DJCGiBKsgowqkkhkfS:matrix.flyar.net/messages?limit=20&dir=b&from=t26905-1777420_19484474_34_762945_55494_74_3575_68099_1&filter=%7B%22lazy_load_members%22%3Atrue%7D' clientproto='HTTP/1.0' site='8008'>
Traceback (most recent call last):
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
    result = current_context.run(gen.send, result)
StopIteration: [{'event_id': '$1650209062194EQOlf:matrix.flyar.net', 'state_group': 1277497}]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
    result = current_context.run(gen.send, result)
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/databases/main/state.py", line 331, in _get_state_group_for_events
    raise RuntimeError("No state group for unknown or outlier event %s" % e)
RuntimeError: No state group for unknown or outlier event $1638282339138PTbRx:matrix.flyar.net
2022-04-20 14:14:25,857 - synapse.access.http.8008 - 427 - INFO - GET-786- 10.0.0.254 - 8008 - {@xxxxxxx:matrix.flyar.net} Processed request: 0.003sec/0.000sec (0.000sec, 0.000sec) (0.000sec/0.001sec/1) 55B 500 "GET /_matrix/client/r0/rooms/!DJCGiBKsgowqkkhkfS:matrix.flyar.net/messages?limit=20&dir=b&from=t26905-1777420_19484474_34_762945_55494_74_3575_68099_1&filter=%7B%22lazy_load_members%22%3Atrue%7D HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Element/1.10.10 Chrome/98.0.4758.74 Electron/17.0.0 Safari/537.36" [0 dbevts]
2022-04-20 14:14:28,904 - synapse.federation.transport.server.federation - 102 - INFO - PUT-787- Received txn 1648140153658 from theyosh.nl. (PDUs: 0, EDUs: 1)
2022-04-20 14:14:28,924 - synapse.access.http.8008 - 427 - INFO - GET-784- 10.0.0.254 - 8008 - {@xxxxxxx:matrix.flyar.net} Processed request: 3.538sec/0.001sec (0.001sec, 0.000sec) (0.001sec/0.003sec/2) 386B 200 "GET /_matrix/client/r0/sync?filter=16&timeout=30000&since=s1777543_19487622_0_763076_55581_74_3576_68313_1 HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Element/1.10.10 Chrome/98.0.4758.74 Electron/17.0.0 Safari/537.36" [0 dbevts]
2022-04-20 14:14:28,927 - synapse.access.http.8008 - 427 - INFO - PUT-787- 10.0.0.254 - 8008 - {theyosh.nl} Processed request: 0.023sec/0.000sec (0.001sec, 0.002sec) (0.002sec/0.016sec/6) 11B 200 "PUT /_matrix/federation/v1/send/1648140153658 HTTP/1.0" "Synapse/1.49.2" [0 dbevts]
2022-04-20 14:14:29,072 - synapse.metrics._gc - 120 - INFO - sentinel- Collecting gc 1
2022-04-20 14:14:29,747 - synapse.federation.transport.server.federation - 102 - INFO - PUT-790- Received txn 1647625979627 from jaccu.se. (PDUs: 0, EDUs: 1)
2022-04-20 14:14:29,753 - synapse.access.http.8008 - 427 - INFO - PUT-790- 10.0.0.254 - 8008 - {jaccu.se} Processed request: 0.010sec/0.000sec (0.001sec, 0.000sec) (0.001sec/0.005sec/3) 11B 200 "PUT /_matrix/federation/v1/send/1647625979627 HTTP/1.0" "Synapse/1.54.0" [0 dbevts]
2022-04-20 14:14:29,856 - synapse.http.server - 100 - ERROR - GET-791- Failed handle request via 'RoomMessageListRestServlet': <XForwardedForRequest at 0x7f5ca025ff40 method='GET' uri='/_matrix/client/r0/rooms/!DJCGiBKsgowqkkhkfS:matrix.flyar.net/messages?limit=20&dir=b&from=t26905-1777420_19484474_34_762945_55494_74_3575_68099_1&filter=%7B%22lazy_load_members%22%3Atrue%7D' clientproto='HTTP/1.0' site='8008'>
Traceback (most recent call last):
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
    result = current_context.run(gen.send, result)
StopIteration: [{'event_id': '$1650209062194EQOlf:matrix.flyar.net', 'state_group': 1277497}]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
    result = current_context.run(gen.send, result)
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/databases/main/state.py", line 331, in _get_state_group_for_events
    raise RuntimeError("No state group for unknown or outlier event %s" % e)
RuntimeError: No state group for unknown or outlier event $1638282339138PTbRx:matrix.flyar.net
2022-04-20 14:14:29,857 - synapse.access.http.8008 - 427 - INFO - GET-791- 10.0.0.254 - 8008 - {@xxxxxxx:matrix.flyar.net} Processed request: 0.003sec/0.000sec (0.002sec, 0.000sec) (0.000sec/0.001sec/1) 55B 500 "GET /_matrix/client/r0/rooms/!DJCGiBKsgowqkkhkfS:matrix.flyar.net/messages?limit=20&dir=b&from=t26905-1777420_19484474_34_762945_55494_74_3575_68099_1&filter=%7B%22lazy_load_members%22%3Atrue%7D HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Element/1.10.10 Chrome/98.0.4758.74 Electron/17.0.0 Safari/537.36" [0 dbevts]
2022-04-20 14:14:30,002 - synapse.federation.transport.server.federation - 102 - INFO - PUT-792- Received txn 1648259608250 from chat.tux.land. (PDUs: 0, EDUs: 1)
2022-04-20 14:14:30,009 - synapse.access.http.8008 - 427 - INFO - PUT-792- 10.0.0.254 - 8008 - {chat.tux.land} Processed request: 0.014sec/0.000sec (0.001sec, 0.000sec) (0.001sec/0.007sec/3) 11B 200 "PUT /_matrix/federation/v1/send/1648259608250 HTTP/1.0" "Synapse/1.55.0" [0 dbevts]

esackbauer avatar Apr 20 '22 12:04 esackbauer

[please wrap your logs in triple-backticks (```) for legibility]

well, that's frustrating. Could you maybe enable DEBUG logging (change the level to DEBUG in your log config file)?

richvdh avatar Apr 20 '22 12:04 richvdh

Ok, sorry. The debug output is rather large, I did a clean start of synapse and logged everything after I tried to post something in that room.

Because its so large, I attached the homeserver.log

esackbauer avatar Apr 20 '22 13:04 esackbauer

Right, well, that sheds a bit of light on the situation, but not much.

I'd like to ask you to run some queries on your database - could you DM me at @richvdh:sw1v.org ?

richvdh avatar Apr 20 '22 14:04 richvdh

DM started

esackbauer avatar Apr 20 '22 14:04 esackbauer

I think you've got a problem with your configuration:

Invalid signature for server sw1v.org with key ed25519:a_XfsG: Unable to verify signature for sw1v.org: <class 'nacl.exceptions.BadSignatureError'> Signature was forged or corrupt

Typically this is caused by a misconfigured reverse-proxy - see https://matrix-org.github.io/synapse/latest/reverse_proxy.html (and https://github.com/matrix-org/synapse/issues/3294)

richvdh avatar Apr 20 '22 14:04 richvdh

I am using Sophos Firewall as reverse proxy. It provides x-forwarded headers, homeserver.yaml also contains x_forwarded: true, dunno why this isnt working suddenly. To rule this out I have now the federation running on different port with DNAT rule. You should have a new invitation now.

esackbauer avatar Apr 20 '22 15:04 esackbauer

It seems that a particular event ($1638282339138PTbRx:matrix.flyar.net) is marked as a forward extremity, but that event is not present in the database.

Attempts to delete the room are failing, because the delete operation tries to generate leave events for each of the members of the room - which requires the forward extremities to exist.

The event id suggests that the event was created in November 2021, so this problem has probably been waiting undetected since then, and it only became a real problem when Synapse 1.57 added the extra validation.

I'm at a bit of a loss to explain how this could have happened. There's essentially no way for an entry to be added to event_forward_extremites without there also being an entry in events. It could have been related to https://github.com/matrix-org/synapse/issues/11521, but since this room isn't federated, I don't think it can be.

richvdh avatar Apr 20 '22 16:04 richvdh

Hi

I just want to report that I have the same issue and if you need some more sample data I'm happy to help with providing information.

Kidswiss avatar Apr 20 '22 18:04 Kidswiss

@Kidswiss thanks! What is the event id of the missing forward extremity? Do you know when it got added, and do you have logs going back that far?

richvdh avatar Apr 21 '22 06:04 richvdh

@richvdh how can I determine the date it was added? I found it in the event_forward_extremities table. But it doesn't exist in the events table. Is that supposed to be the case or is there already something broken?

Kidswiss avatar Apr 21 '22 07:04 Kidswiss

I found it in the event_forward_extremities table. But it doesn't exist in the events table. Is that supposed to be the case or is there already something broken?

well, that's good in that it confirms it is the same problem as @esackbauer. But no, it should not be the case.

how can I determine the date it was added?

Sadly that is hard to do. If it is a federated room, we might be able to check when other servers saw the event. If it is a V1 room, the event id will give us a clue. Otherwise... :(

Other questions which might help eliminate potential causes:

  • Is this a federated room (ie, are there now, or have there ever been, any users on other servers in this room)?
  • do you have retention enabled on your server?
  • Have you used the delete room API or the purge history API on this room, either directly or via some control panel or "maintenance" script?
  • Have you done any database maintenance operations which could somehow have introduced this (eg: backup/restore; port from sqlite)?

richvdh avatar Apr 21 '22 08:04 richvdh

For reference, here is a query which will confirm if your system has been affected by this bug:

select * from event_forward_extremities efe left join events e using (event_id) where e.event_id is null;

... if that returns any rows, you have this issue. (If it doesn't, you likely have a different problem and should open a separate issue)

richvdh avatar Apr 21 '22 08:04 richvdh

@richvdh

  • nope, it's not federated, it's a local only room and always has been the room is v5
  • no retention enabled
  • also haven't used any purging or scripts of this sort. However, due to a storage issue and faulty backups, I lost all attachments sometime last year. But for the whole instance, not just that room. Maybe that's relevant.
  • I've dumped and restored the postgresql database a few weeks back due to a migration. As far as I can remember, that went smoothly.

I've since downgraded to 1.56 again, as that's a fairly important room for me.

EDIT: running your sql query, I actually have two rooms that suffer from this.

Kidswiss avatar Apr 21 '22 09:04 Kidswiss

I lost all attachments sometime last year.

attachments as in uploaded media? shouldn't be a factor.

I've dumped and restored the postgresql database a few weeks back due to a migration

It's conceivable this could have introduced a problem by not restoring all rows to the events database. Seems unlikely though.

richvdh avatar Apr 21 '22 11:04 richvdh

To work around the problem, it should be safe to remove the bad rows from event_forward_extremities with:

DELETE FROM event_forward_extremities WHERE event_id='$....';

... and then restart synapse.

(Note: don't forget the WHERE clause! Only do this for rows returned by the select query above)

It doesn't get us any closer to figuring out the cause, though.

richvdh avatar Apr 21 '22 11:04 richvdh

I took a dump, dropped the broken entries and updated to 1.57.1 and it looks good so far. Thanks!

But I agree it's a bit weird that this happened in the first place. But I won't rule out any screw-ups from my side.

Kidswiss avatar Apr 21 '22 13:04 Kidswiss

Encountering a similar issue as well right after updating to 1.57.0, with these kinds of logs:

synapse                           | 2022-04-21 17:02:28,383 - synapse.http.server - 100 - ERROR - GET-1273- Failed handle request via 'SyncRestServlet': <XForwardedForRequest at 0x7fbb5103e100 method='GET' uri='/_matrix/client/r0/sync?fil
ter=5&timeout=0&_cacheBuster=1650553335653' clientproto='HTTP/1.1' site='8008'>                                                                                                                                                              
synapse                           | Traceback (most recent call last):                                                 
synapse                           |   File "/usr/local/lib/python3.9/site-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks                                                                                                
synapse                           |     result = current_context.run(                 
synapse                           |   File "/usr/local/lib/python3.9/site-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator                                                                                      
synapse                           |     return g.throw(self.type, self.value, self.tb)                          
synapse                           |   File "/usr/local/lib/python3.9/site-packages/synapse/util/caches/response_cache.py", line 246, in cb                                                                                                   
synapse                           |     return await callback(*args, **kwargs)                                                                                                                                                               
synapse                           |   File "/usr/local/lib/python3.9/site-packages/synapse/handlers/sync.py", line 355, in _wait_for_sync_for_user                                                                                           
synapse                           |     result: SyncResult = await self.current_sync_for_user(                                                                                                                                                
synapse                           |   File "/usr/local/lib/python3.9/site-packages/synapse/handlers/sync.py", line 406, in current_sync_for_user                                                                                             
synapse                           |     sync_result = await self.generate_sync_result(                                                                                                                                                        
synapse                           |   File "/usr/local/lib/python3.9/site-packages/synapse/handlers/sync.py", line 1114, in generate_sync_result                                                                                             
synapse                           |     res = await self._generate_sync_entry_for_rooms(
synapse                           |   File "/usr/local/lib/python3.9/site-packages/synapse/handlers/sync.py", line 1625, in _generate_sync_entry_for_rooms                                                                                   
synapse                           |     await concurrently_execute(handle_room_entries, room_entries, 10)
synapse                           |   File "/usr/local/lib/python3.9/site-packages/synapse/util/async_helpers.py", line 233, in concurrently_execute                                                                                         
synapse                           |     await yieldable_gather_results(               
synapse                           |   File "/usr/local/lib/python3.9/site-packages/synapse/util/async_helpers.py", line 274, in yieldable_gather_results                                                                                     
synapse                           |     raise dfe.subFailure.value from None  
synapse                           |   File "/usr/local/lib/python3.9/site-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks                                                                                                
synapse                           |     result = current_context.run(                         
synapse                           |   File "/usr/local/lib/python3.9/site-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator                                                                                      
synapse                           |     return g.throw(self.type, self.value, self.tb)
synapse                           |   File "/usr/local/lib/python3.9/site-packages/synapse/util/async_helpers.py", line 225, in _concurrently_execute_inner                                                                                  
synapse                           |     await maybe_awaitable(func(value))              
synapse                           |   File "/usr/local/lib/python3.9/site-packages/synapse/handlers/sync.py", line 1614, in handle_room_entries                                                                                              
synapse                           |     await self._generate_room_entry(                                 
synapse                           |   File "/usr/local/lib/python3.9/site-packages/synapse/handlers/sync.py", line 2060, in _generate_room_entry                                                                                             
synapse                           |     batch = await self._load_filtered_recents(
synapse                           |   File "/usr/local/lib/python3.9/site-packages/synapse/handlers/sync.py", line 592, in _load_filtered_recents                                                                                            
synapse                           |     loaded_recents = await filter_events_for_client(
synapse                           |   File "/usr/local/lib/python3.9/site-packages/synapse/visibility.py", line 85, in filter_events_for_client                                                                                              
synapse                           |     event_id_to_state = await storage.state.get_state_for_events(
synapse                           |   File "/usr/local/lib/python3.9/site-packages/synapse/storage/state.py", line 671, in get_state_for_events                                                                                              
synapse                           |     event_to_groups = await self.stores.main._get_state_group_for_events(event_ids)
synapse                           |   File "/usr/local/lib/python3.9/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks                                                                                                
synapse                           |     result = current_context.run(gen.send, result)
synapse                           |   File "/usr/local/lib/python3.9/site-packages/synapse/storage/databases/main/state.py", line 331, in _get_state_group_for_events                                                                        
synapse                           |     raise RuntimeError("No state group for unknown or outlier event %s" % e)
synapse                           | RuntimeError: No state group for unknown or outlier event XXXXXXXX                                                  

(I replaced the actual event id by XXXXXXXX)

This happens when trying an initial sync when logging in with a fresh element-web instance.

However, running the query from https://github.com/matrix-org/synapse/issues/12507#issuecomment-1104903000 did not return anything, so I suppose it's different?

UPDATE: the problem disappeared after rolling back to 1.54.0, I can log in and sync once again. This problem was thus created by a change appearing between v1.55.0 and v1.57.0.

Ezwen avatar Apr 21 '22 14:04 Ezwen

I seemingly have the same issue. I had 2 entries in the select * from event_forward_extremities efe left join events e using (event_id) where e.event_id is null; query both of which have been deleted, the problem persists after deleting the rows.

My instance of matrix/synapse has been floating around for 4 or 5 years probably and updated regularly along the way (avhost container). If any other info is helpful just let me know what I can provide.

EDIT: after some time the issue cleared itself up (after deleting the db entries). Not sure if some other gc process eventually happened that fully cleared it up or what.

travisghansen avatar Apr 21 '22 21:04 travisghansen

Just wanted to note that a user just reported this for a matrix.org room on my homeserver...

verymilan avatar Apr 23 '22 06:04 verymilan

got this error too. with both 157 and 1571 v1.56.0 works

MparkG avatar Apr 23 '22 08:04 MparkG

@travisghansen:

EDIT: after some time the issue cleared itself up (after deleting the db entries).

Likely Synapse had cached the deleted rows. I've updated the instructions to suggest restarting it.

@Ezwen:

However, running the query from https://github.com/matrix-org/synapse/issues/12507#issuecomment-1104903000 did not return anything, so I suppose it's different?

Please don't report different issues here.

richvdh avatar Apr 25 '22 18:04 richvdh

People encountering this issue as confirmed by the query at https://github.com/matrix-org/synapse/issues/12507#issuecomment-1104903000:

First: note that this error is reporting corruption in your database which may have happened a long time ago. The only recent change is that Synapse now checks for the corruption.

Second, please confirm:

  • Is this affecting a federated room (ie, are there now, or have there ever been, any users on other servers in this room)?
  • do you have retention enabled on your server?
  • Have you used the delete room API or the purge history API on this room, either directly or via some control panel or "maintenance" script?
  • Have you done any database maintenance operations which could somehow have introduced this (eg: backup/restore; port from sqlite)?

richvdh avatar Apr 25 '22 18:04 richvdh

the query returns no rows when ran on the db. i have federation on in general, i couldnt find the event id in question in the db. no idea what room the error came from. retention has been on for the last weeks, i have now turned it off, because i suspect it to be the origin. however the error remains after restarting synapse. db maintenance is pg_dump, compress_states_large_rooms, vaccum. its the initial sync that is broken for me. for at least two users. this is the apache (proxy) log line:

"GET /_matrix/client/r0/sync?filter=5&timeout=0&_cacheBuster=1650918641901 HTTP/2.0" 500 719 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:99.0) Gecko/20100101 Firefox/99.0"

synapse error:

2022-04-25 22:33:25,682 - synapse.access.http.8008 - 427 - INFO - GET-2- 10.0.0.223 - 8008 - {@asd:masd.asd} Processed request: 0.002sec/0.001sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 55B 500 "GET /_matrix/client/r0/sync?filter=5&timeout=0&_cacheBuster=1650918805656 HTTP/1.1" "Mozilla/5.0 (X11; Linux x86_64; rv:99.0) Gecko/20100101 Firefox/99.0" [0 dbevts]
2022-04-25 22:33:32,459 - synapse.util.caches.response_cache - 254 - INFO - GET-3- [sync]: using completed cached result for [(@asd:masd.asd, 0, None, '5', False, 'noideawhatthisis')]
2022-04-25 22:33:32,459 - synapse.http.server - 100 - ERROR - GET-3- Failed handle request via 'SyncRestServlet': <XForwardedForRequest at 0x7f5f145ad8b0 method='GET' uri='/_matrix/client/r0/sync?filter=5&timeout=0&_cacheBuster=1650918812445' clientproto='HTTP/1.1' site='8008'>
Traceback (most recent call last):
  File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/http/server.py", line 268, in _async_render_wrapper
    callback_return = await self._async_render(request)
  File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/http/server.py", line 470, in _async_render
    callback_return = await raw_callback_return
  File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/rest/client/sync.py", line 192, in on_GET
    sync_result = await self.sync_handler.wait_for_sync_for_user(
  File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/handlers/sync.py", line 297, in wait_for_sync_for_user
    res = await self.response_cache.wrap(
  File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/util/caches/response_cache.py", line 265, in wrap
    return await make_deferred_yieldable(result)
  File "/home/server/.synapse/env/lib/python3.8/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/server/.synapse/env/lib/python3.8/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/util/caches/response_cache.py", line 246, in cb
    return await callback(*args, **kwargs)
  File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/handlers/sync.py", line 355, in _wait_for_sync_for_user
    result: SyncResult = await self.current_sync_for_user(
  File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/handlers/sync.py", line 406, in current_sync_for_user
    sync_result = await self.generate_sync_result(
  File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/handlers/sync.py", line 1114, in generate_sync_result
    res = await self._generate_sync_entry_for_rooms(
  File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/handlers/sync.py", line 1625, in _generate_sync_entry_for_rooms
    await concurrently_execute(handle_room_entries, room_entries, 10)
  File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/util/async_helpers.py", line 233, in concurrently_execute
    await yieldable_gather_results(
  File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/util/async_helpers.py", line 274, in yieldable_gather_results
    raise dfe.subFailure.value from None
  File "/home/server/.synapse/env/lib/python3.8/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/server/.synapse/env/lib/python3.8/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/util/async_helpers.py", line 225, in _concurrently_execute_inner
    await maybe_awaitable(func(value))
  File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/handlers/sync.py", line 1614, in handle_room_entries
    await self._generate_room_entry(
  File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/handlers/sync.py", line 2060, in _generate_room_entry
    batch = await self._load_filtered_recents(
  File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/handlers/sync.py", line 592, in _load_filtered_recents
    loaded_recents = await filter_events_for_client(
  File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/visibility.py", line 85, in filter_events_for_client
    event_id_to_state = await storage.state.get_state_for_events(
  File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/storage/state.py", line 671, in get_state_for_events
    event_to_groups = await self.stores.main._get_state_group_for_events(event_ids)
  File "/home/server/.synapse/env/lib/python3.8/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/storage/databases/main/state.py", line 331, in _get_state_group_for_events
    raise RuntimeError("No state group for unknown or outlier event %s" % e)
RuntimeError: No state group for unknown or outlier event $4C1ulcbO5gZeV2t0sc5GASD2G09VvFm0bTGWIZDBlIw

thanks for the help.

MparkG avatar Apr 25 '22 20:04 MparkG

  • Is this affecting a federated room (ie, are there now, or have there ever been, any users on other servers in this room)?

My install is fully isolated currently.

  • do you have retention enabled on your server?

I've never enabled that, so I'm guessing the default is no. Glad to now know it exists however! That's partly why I wasn't federating anything as I knew the data would grow massively on public rooms etc :)

No.

  • Have you done any database maintenance operations which could somehow have introduced this (eg: backup/restore; port from sqlite)?

No.

travisghansen avatar Apr 25 '22 21:04 travisghansen

@MparkG: as the earlier comments make very clear: "If it doesn't [return any rows] you likely have a different problem and should open a separate issue".

richvdh avatar Apr 26 '22 10:04 richvdh

@travisghansen thanks for the answers, though honestly they don't help me get any closer to understanding what can have caused this.

Be wary of retention: it's not a well-used feature and the reason I asked about it is that it is likely to cause corruption bugs like this.

richvdh avatar Apr 26 '22 10:04 richvdh

We're seeing exceptions like these as well and those rooms can't be written to. This is a room on the hackint IRC bridge, so lots of federated users, no usage of delete room or purge history API endpoints and we are using rust-synapse-compress-state regularly.

We're on Synapse 1.58.01.59.1 right now.

Feel free to contact me at @hexa:lossy.network and we can debug this together.

Traceback
 synapse.http.server: [PUT-672070] Failed handle request via 'RoomSendEventRestServlet': <XForwardedForRequest at 0x7fb87ab32a30 method='PUT' uri='/_matrix/client/r0/rooms/!lxUgQdCOpWnOrYNVxb%3Ahax404.de/send/m.room.message/1653997592776__inc2?user_id=%40hexa-%3Ahackint.org' clientproto='HTTP/1.1' site='8008'>
 Traceback (most recent call last):
   File "/nix/store/ks8r355nmkcx9q23s4m3nm1y767rfrln-python3.9-Twisted-22.4.0/lib/python3.9/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
     result = current_context.run(gen.send, result)
 StopIteration: [{'event_id': '$Ps2Ot0YLz20kbjS8lRFmK_Yo4t2fXPAvq3IU0CQTMoU', 'state_group': 820535}]
 
 During handling of the above exception, another exception occurred:
 
 Traceback (most recent call last):
   File "/nix/store/ks8r355nmkcx9q23s4m3nm1y767rfrln-python3.9-Twisted-22.4.0/lib/python3.9/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
     result = current_context.run(gen.send, result)
   File "/nix/store/lan1pgij0701dzi9nsvvy6nn62qk93b5-matrix-synapse-1.58.0/lib/python3.9/site-packages/synapse/storage/databases/main/state.py", line 332, in _get_state_group_for_events
     raise RuntimeError("No state group for unknown or outlier event %s" % e)
 RuntimeError: No state group for unknown or outlier event $FXKIdG6LkVKVsOEROx4IBer-eqPsQ-HCWdWpRuqyRFk
Retention

We are indeed using a pretty aggressive retention rule, since we only want to relay the bridged content, but not store it persistenly.

retention:
  allowed_lifetime_max: 1m
  allowed_lifetime_min: 1m
  default_policy:
    max_lifetime: 1m
    min_lifetime: 1m
  enabled: true
  purge_jobs:
  - interval: 30m
Query result

select * from event_forward_extremities efe left join events e using (event_id) where e.event_id is null; (7 rows)

Not the event id mentioned in the traceback above, but I guess that's still worth debugging/fixing.

No corruption log lines

We don't see any corruption log events from #12620

Removed events according to https://github.com/matrix-org/synapse/issues/12507#issuecomment-1105061360
matrix-synapse=# SELECT count(event_id) FROM event_forward_extremities;
count 
-------
2660
(1 row)
matrix-synapse=# BEGIN;
BEGIN
matrix-synapse=# DELETE FROM event_forward_extremities WHERE event_id IN (select event_id from event_forward_extremities efe left join events e using (event_id) where e.event_id is null);
DELETE 7
matrix-synapse=# SELECT count(event_id) FROM event_forward_extremities;
count 
-------
2655
(1 row)

matrix-synapse=# END;
COMMIT

2660 - 7 = 2655 … okay. :thinking:

mweinelt avatar May 31 '22 12:05 mweinelt