Forward extremity event missing from database, leading to `No state group for unknown or outlier event`
Description
After upgrading to Synapse v1.57.0, one direct-message room is not working anymore. Cannot post new messages from any client or member. Cannot scroll back in history more than one screen; after that it is stuck. Cannot leave the room ("Internal Server Error 500").
Steps to reproduce
Tried rebooting the server, and even tried to remove the room with Synapse-Admin; I still get an error and the room cannot be deleted.
Version information
- Homeserver: matrix.flyar.net
- Version: {"server_version":"1.57.0","python_version":"3.9.2"}
- Install method: Debian repo
- Platform: Debian 11.3 VM x64
This is what the log file looks like:
```
2022-04-20 10:03:34,130 - synapse.http.server - 100 - ERROR - GET-0- Failed handle request via 'RoomMessageListRestServlet': <XForwardedForRequest at 0x7f28d07d3730 method>
Traceback (most recent call last):
File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
result = current_context.run(gen.send, result)
StopIteration: [{'event_id': '$1650209062194EQOlf:matrix.flyar.net', 'state_group': 1277497}]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
result = current_context.run(gen.send, result)
File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/databases/main/state.py", line 331, in _get_state_group_for_events
raise RuntimeError("No state group for unknown or outlier event %s" % e)
RuntimeError: No state group for unknown or outlier event $1638282339138PTbRx:matrix.flyar.net
2022-04-20 10:03:34,508 - synapse.http.server - 100 - ERROR - GET-1- Failed handle request via 'RoomMessageListRestServlet': <XForwardedForRequest at 0x7f28d07e3be0 method>
Traceback (most recent call last):
File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
result = current_context.run(gen.send, result)
StopIteration: [{'event_id': '$1650209062194EQOlf:matrix.flyar.net', 'state_group': 1277497}]
```
The part after "During handling" is repeated. It seems one event is blocking everything? How can I get rid of this?
Error message is from #12191. @richvdh, one for you?
Possibly, but there's not really enough info in those logs to help. @esackbauer: is that the complete log? Is there nothing logged at INFO?
[edited by @richvdh to fix formatting]
There is actually nothing more. The room which is not working is `!DJCGiBKsgowqkkhkfS:matrix.flyar.net`.
```
2022-04-20 14:14:25,352 - synapse.access.http.8008 - 427 - INFO - GET-777- 10.0.0.254 - 8008 - {@xxxxxxx:matrix.flyar.net} Processed request: 7.106sec/0.001sec (0.002sec, 0.000sec) (0.001sec/0.003sec/2) 427B 200 "GET /_matrix/client/r0/sync?filter=16&timeout=30000&since=s1777543_19487621_0_763076_55581_74_3576_68313_1 HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Element/1.10.10 Chrome/98.0.4758.74 Electron/17.0.0 Safari/537.36" [0 dbevts]
2022-04-20 14:14:25,353 - synapse.access.http.8008 - 427 - INFO - PUT-782- 10.0.0.254 - 8008 - {derechosdigitales.org} Processed request: 0.028sec/0.000sec (0.004sec, 0.000sec) (0.002sec/0.018sec/8) 11B 200 "PUT /_matrix/federation/v1/send/1649264576283 HTTP/1.0" "Synapse/1.54.0" [0 dbevts]
2022-04-20 14:14:25,857 - synapse.http.server - 100 - ERROR - GET-786- Failed handle request via 'RoomMessageListRestServlet': <XForwardedForRequest at 0x7f5ca035c880 method='GET' uri='/_matrix/client/r0/rooms/!DJCGiBKsgowqkkhkfS:matrix.flyar.net/messages?limit=20&dir=b&from=t26905-1777420_19484474_34_762945_55494_74_3575_68099_1&filter=%7B%22lazy_load_members%22%3Atrue%7D' clientproto='HTTP/1.0' site='8008'>
Traceback (most recent call last):
File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
result = current_context.run(gen.send, result)
StopIteration: [{'event_id': '$1650209062194EQOlf:matrix.flyar.net', 'state_group': 1277497}]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
result = current_context.run(gen.send, result)
File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/databases/main/state.py", line 331, in _get_state_group_for_events
raise RuntimeError("No state group for unknown or outlier event %s" % e)
RuntimeError: No state group for unknown or outlier event $1638282339138PTbRx:matrix.flyar.net
2022-04-20 14:14:25,857 - synapse.access.http.8008 - 427 - INFO - GET-786- 10.0.0.254 - 8008 - {@xxxxxxx:matrix.flyar.net} Processed request: 0.003sec/0.000sec (0.000sec, 0.000sec) (0.000sec/0.001sec/1) 55B 500 "GET /_matrix/client/r0/rooms/!DJCGiBKsgowqkkhkfS:matrix.flyar.net/messages?limit=20&dir=b&from=t26905-1777420_19484474_34_762945_55494_74_3575_68099_1&filter=%7B%22lazy_load_members%22%3Atrue%7D HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Element/1.10.10 Chrome/98.0.4758.74 Electron/17.0.0 Safari/537.36" [0 dbevts]
2022-04-20 14:14:28,904 - synapse.federation.transport.server.federation - 102 - INFO - PUT-787- Received txn 1648140153658 from theyosh.nl. (PDUs: 0, EDUs: 1)
2022-04-20 14:14:28,924 - synapse.access.http.8008 - 427 - INFO - GET-784- 10.0.0.254 - 8008 - {@xxxxxxx:matrix.flyar.net} Processed request: 3.538sec/0.001sec (0.001sec, 0.000sec) (0.001sec/0.003sec/2) 386B 200 "GET /_matrix/client/r0/sync?filter=16&timeout=30000&since=s1777543_19487622_0_763076_55581_74_3576_68313_1 HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Element/1.10.10 Chrome/98.0.4758.74 Electron/17.0.0 Safari/537.36" [0 dbevts]
2022-04-20 14:14:28,927 - synapse.access.http.8008 - 427 - INFO - PUT-787- 10.0.0.254 - 8008 - {theyosh.nl} Processed request: 0.023sec/0.000sec (0.001sec, 0.002sec) (0.002sec/0.016sec/6) 11B 200 "PUT /_matrix/federation/v1/send/1648140153658 HTTP/1.0" "Synapse/1.49.2" [0 dbevts]
2022-04-20 14:14:29,072 - synapse.metrics._gc - 120 - INFO - sentinel- Collecting gc 1
2022-04-20 14:14:29,747 - synapse.federation.transport.server.federation - 102 - INFO - PUT-790- Received txn 1647625979627 from jaccu.se. (PDUs: 0, EDUs: 1)
2022-04-20 14:14:29,753 - synapse.access.http.8008 - 427 - INFO - PUT-790- 10.0.0.254 - 8008 - {jaccu.se} Processed request: 0.010sec/0.000sec (0.001sec, 0.000sec) (0.001sec/0.005sec/3) 11B 200 "PUT /_matrix/federation/v1/send/1647625979627 HTTP/1.0" "Synapse/1.54.0" [0 dbevts]
2022-04-20 14:14:29,856 - synapse.http.server - 100 - ERROR - GET-791- Failed handle request via 'RoomMessageListRestServlet': <XForwardedForRequest at 0x7f5ca025ff40 method='GET' uri='/_matrix/client/r0/rooms/!DJCGiBKsgowqkkhkfS:matrix.flyar.net/messages?limit=20&dir=b&from=t26905-1777420_19484474_34_762945_55494_74_3575_68099_1&filter=%7B%22lazy_load_members%22%3Atrue%7D' clientproto='HTTP/1.0' site='8008'>
Traceback (most recent call last):
File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
result = current_context.run(gen.send, result)
StopIteration: [{'event_id': '$1650209062194EQOlf:matrix.flyar.net', 'state_group': 1277497}]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
result = current_context.run(gen.send, result)
File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/databases/main/state.py", line 331, in _get_state_group_for_events
raise RuntimeError("No state group for unknown or outlier event %s" % e)
RuntimeError: No state group for unknown or outlier event $1638282339138PTbRx:matrix.flyar.net
2022-04-20 14:14:29,857 - synapse.access.http.8008 - 427 - INFO - GET-791- 10.0.0.254 - 8008 - {@xxxxxxx:matrix.flyar.net} Processed request: 0.003sec/0.000sec (0.002sec, 0.000sec) (0.000sec/0.001sec/1) 55B 500 "GET /_matrix/client/r0/rooms/!DJCGiBKsgowqkkhkfS:matrix.flyar.net/messages?limit=20&dir=b&from=t26905-1777420_19484474_34_762945_55494_74_3575_68099_1&filter=%7B%22lazy_load_members%22%3Atrue%7D HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Element/1.10.10 Chrome/98.0.4758.74 Electron/17.0.0 Safari/537.36" [0 dbevts]
2022-04-20 14:14:30,002 - synapse.federation.transport.server.federation - 102 - INFO - PUT-792- Received txn 1648259608250 from chat.tux.land. (PDUs: 0, EDUs: 1)
2022-04-20 14:14:30,009 - synapse.access.http.8008 - 427 - INFO - PUT-792- 10.0.0.254 - 8008 - {chat.tux.land} Processed request: 0.014sec/0.000sec (0.001sec, 0.000sec) (0.001sec/0.007sec/3) 11B 200 "PUT /_matrix/federation/v1/send/1648259608250 HTTP/1.0" "Synapse/1.55.0" [0 dbevts]
```
[please wrap your logs in triple-backticks (```) for legibility]
Well, that's frustrating. Could you maybe enable DEBUG logging (change the level to DEBUG in your log config file)?
Ok, sorry. The debug output is rather large; I did a clean start of Synapse and logged everything after I tried to post something in that room. Because it's so large, I attached the homeserver.log.
Right, well, that sheds a bit of light on the situation, but not much.
I'd like to ask you to run some queries on your database - could you DM me at @richvdh:sw1v.org?
DM started
I think you've got a problem with your configuration:
```
Invalid signature for server sw1v.org with key ed25519:a_XfsG: Unable to verify signature for sw1v.org: <class 'nacl.exceptions.BadSignatureError'> Signature was forged or corrupt
```
Typically this is caused by a misconfigured reverse-proxy - see https://matrix-org.github.io/synapse/latest/reverse_proxy.html (and https://github.com/matrix-org/synapse/issues/3294)
I am using a Sophos Firewall as reverse proxy. It provides X-Forwarded headers, and homeserver.yaml also contains `x_forwarded: true`; I don't know why this suddenly isn't working. To rule this out, I now have federation running on a different port with a DNAT rule. You should have a new invitation now.
It seems that a particular event (`$1638282339138PTbRx:matrix.flyar.net`) is marked as a forward extremity, but that event is not present in the database.
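For the record, a quick way to confirm this kind of inconsistency for a specific event is a pair of lookups like the following (a sketch only, assuming direct `psql` access to the Synapse Postgres database):

```sql
-- the event is still listed as a forward extremity for the room...
SELECT * FROM event_forward_extremities
 WHERE event_id = '$1638282339138PTbRx:matrix.flyar.net';

-- ...but there is no corresponding row in the events table
SELECT * FROM events
 WHERE event_id = '$1638282339138PTbRx:matrix.flyar.net';
```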
Attempts to delete the room are failing, because the delete operation tries to generate leave events for each of the members of the room - which requires the forward extremities to exist.
The event id suggests that the event was created in November 2021, so this problem has probably been waiting undetected since then, and it only became a real problem when Synapse 1.57 added the extra validation.
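As an aside, the leading digits of that event id appear to be a unix timestamp in milliseconds (which is how Synapse built local event ids in older room versions), so the date can be checked directly; a sketch assuming Postgres:

```sql
-- decode the millisecond prefix of $1638282339138PTbRx:matrix.flyar.net
-- (integer division drops the milliseconds)
SELECT to_timestamp(1638282339138 / 1000);
-- => 2021-11-30 14:25:39+00, i.e. late November 2021
```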
I'm at a bit of a loss to explain how this could have happened. There's essentially no way for an entry to be added to `event_forward_extremities` without there also being an entry in `events`. It could have been related to https://github.com/matrix-org/synapse/issues/11521, but since this room isn't federated, I don't think it can be.
Hi, I just want to report that I have the same issue; if you need more sample data, I'm happy to help by providing information.
@Kidswiss thanks! What is the event id of the missing forward extremity? Do you know when it got added, and do you have logs going back that far?
@richvdh how can I determine the date it was added? I found it in the `event_forward_extremities` table, but it doesn't exist in the `events` table. Is that supposed to be the case, or is there already something broken?
> I found it in the `event_forward_extremities` table. But it doesn't exist in the `events` table. Is that supposed to be the case or is there already something broken?
Well, that's good in that it confirms it is the same problem as @esackbauer. But no, it should not be the case.
> how can I determine the date it was added?
Sadly that is hard to do. If it is a federated room, we might be able to check when other servers saw the event. If it is a V1 room, the event id will give us a clue. Otherwise... :(
Other questions which might help eliminate potential causes:
- Is this a federated room (ie, are there now, or have there ever been, any users on other servers in this room)?
- do you have retention enabled on your server?
- Have you used the delete room API or the purge history API on this room, either directly or via some control panel or "maintenance" script?
- Have you done any database maintenance operations which could somehow have introduced this (eg: backup/restore; port from sqlite)?
For reference, here is a query which will confirm if your system has been affected by this bug:
```sql
select * from event_forward_extremities efe left join events e using (event_id) where e.event_id is null;
```
... if that returns any rows, you have this issue. (If it doesn't, you likely have a different problem and should open a separate issue)
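If it does return rows and you want to see how many rooms are affected, a variant of the same query can group the results per room (a sketch; it assumes the `room_id` column that `event_forward_extremities` normally carries):

```sql
-- count missing-extremity rows per affected room
SELECT efe.room_id, count(*) AS missing_extremities
  FROM event_forward_extremities efe
  LEFT JOIN events e USING (event_id)
 WHERE e.event_id IS NULL
 GROUP BY efe.room_id;
```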
@richvdh
- Nope, it's not federated; it's a local-only room and always has been. The room is v5.
- no retention enabled
- also haven't used any purging or scripts of this sort. However, due to a storage issue and faulty backups, I lost all attachments sometime last year. But for the whole instance, not just that room. Maybe that's relevant.
- I've dumped and restored the postgresql database a few weeks back due to a migration. As far as I can remember, that went smoothly.
I've since downgraded to 1.56 again, as that's a fairly important room for me.
EDIT: running your SQL query, I actually have two rooms that suffer from this.
> I lost all attachments sometime last year.

Attachments as in uploaded media? That shouldn't be a factor.
> I've dumped and restored the postgresql database a few weeks back due to a migration

It's conceivable this could have introduced a problem by not restoring all rows to the `events` table. It seems unlikely, though.
To work around the problem, it should be safe to remove the bad rows from `event_forward_extremities` with:

```sql
DELETE FROM event_forward_extremities WHERE event_id='$....';
```
... and then restart synapse.
(Note: don't forget the `WHERE` clause! Only do this for rows returned by the select query above.)
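For instance, if several rows are affected, the cleanup can be wrapped in a transaction so the reported row count can be checked before committing (a sketch only; it should match the rows found by the select query above):

```sql
BEGIN;

-- remove only extremities that have no matching row in events
DELETE FROM event_forward_extremities
 WHERE event_id IN (
     SELECT efe.event_id
       FROM event_forward_extremities efe
       LEFT JOIN events e USING (event_id)
      WHERE e.event_id IS NULL);

-- check the reported row count here; ROLLBACK if it looks wrong, otherwise:
COMMIT;
```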
It doesn't get us any closer to figuring out the cause, though.
I took a dump, dropped the broken entries and updated to 1.57.1 and it looks good so far. Thanks!
I agree it's a bit weird that this happened in the first place, but I won't rule out any screw-ups on my side.
Encountering a similar issue as well right after updating to 1.57.0, with these kinds of logs:
```
synapse | 2022-04-21 17:02:28,383 - synapse.http.server - 100 - ERROR - GET-1273- Failed handle request via 'SyncRestServlet': <XForwardedForRequest at 0x7fbb5103e100 method='GET' uri='/_matrix/client/r0/sync?filter=5&timeout=0&_cacheBuster=1650553335653' clientproto='HTTP/1.1' site='8008'>
synapse | Traceback (most recent call last):
synapse | File "/usr/local/lib/python3.9/site-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks
synapse | result = current_context.run(
synapse | File "/usr/local/lib/python3.9/site-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator
synapse | return g.throw(self.type, self.value, self.tb)
synapse | File "/usr/local/lib/python3.9/site-packages/synapse/util/caches/response_cache.py", line 246, in cb
synapse | return await callback(*args, **kwargs)
synapse | File "/usr/local/lib/python3.9/site-packages/synapse/handlers/sync.py", line 355, in _wait_for_sync_for_user
synapse | result: SyncResult = await self.current_sync_for_user(
synapse | File "/usr/local/lib/python3.9/site-packages/synapse/handlers/sync.py", line 406, in current_sync_for_user
synapse | sync_result = await self.generate_sync_result(
synapse | File "/usr/local/lib/python3.9/site-packages/synapse/handlers/sync.py", line 1114, in generate_sync_result
synapse | res = await self._generate_sync_entry_for_rooms(
synapse | File "/usr/local/lib/python3.9/site-packages/synapse/handlers/sync.py", line 1625, in _generate_sync_entry_for_rooms
synapse | await concurrently_execute(handle_room_entries, room_entries, 10)
synapse | File "/usr/local/lib/python3.9/site-packages/synapse/util/async_helpers.py", line 233, in concurrently_execute
synapse | await yieldable_gather_results(
synapse | File "/usr/local/lib/python3.9/site-packages/synapse/util/async_helpers.py", line 274, in yieldable_gather_results
synapse | raise dfe.subFailure.value from None
synapse | File "/usr/local/lib/python3.9/site-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks
synapse | result = current_context.run(
synapse | File "/usr/local/lib/python3.9/site-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator
synapse | return g.throw(self.type, self.value, self.tb)
synapse | File "/usr/local/lib/python3.9/site-packages/synapse/util/async_helpers.py", line 225, in _concurrently_execute_inner
synapse | await maybe_awaitable(func(value))
synapse | File "/usr/local/lib/python3.9/site-packages/synapse/handlers/sync.py", line 1614, in handle_room_entries
synapse | await self._generate_room_entry(
synapse | File "/usr/local/lib/python3.9/site-packages/synapse/handlers/sync.py", line 2060, in _generate_room_entry
synapse | batch = await self._load_filtered_recents(
synapse | File "/usr/local/lib/python3.9/site-packages/synapse/handlers/sync.py", line 592, in _load_filtered_recents
synapse | loaded_recents = await filter_events_for_client(
synapse | File "/usr/local/lib/python3.9/site-packages/synapse/visibility.py", line 85, in filter_events_for_client
synapse | event_id_to_state = await storage.state.get_state_for_events(
synapse | File "/usr/local/lib/python3.9/site-packages/synapse/storage/state.py", line 671, in get_state_for_events
synapse | event_to_groups = await self.stores.main._get_state_group_for_events(event_ids)
synapse | File "/usr/local/lib/python3.9/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
synapse | result = current_context.run(gen.send, result)
synapse | File "/usr/local/lib/python3.9/site-packages/synapse/storage/databases/main/state.py", line 331, in _get_state_group_for_events
synapse | raise RuntimeError("No state group for unknown or outlier event %s" % e)
synapse | RuntimeError: No state group for unknown or outlier event XXXXXXXX
```
(I replaced the actual event id by `XXXXXXXX`.)
This happens when trying an initial sync when logging in with a fresh element-web instance.
However, running the query from https://github.com/matrix-org/synapse/issues/12507#issuecomment-1104903000 did not return anything, so I suppose it's different?
UPDATE: the problem disappeared after rolling back to 1.54.0; I can log in and sync once again. This problem was thus created by a change appearing between v1.55.0 and v1.57.0.
I seemingly have the same issue. I had 2 entries in the `select * from event_forward_extremities efe left join events e using (event_id) where e.event_id is null;` query, both of which have been deleted; the problem persists after deleting the rows.
My instance of matrix/synapse has been floating around for 4 or 5 years probably and updated regularly along the way (avhost container). If any other info is helpful just let me know what I can provide.
EDIT: after some time the issue cleared itself up (after deleting the db entries). Not sure if some other gc process eventually happened that fully cleared it up or what.
Just wanted to note that a user just reported this for a matrix.org room on my homeserver...
Got this error too, with both 1.57 and 1.57.1; v1.56.0 works.
@travisghansen:
> EDIT: after some time the issue cleared itself up (after deleting the db entries).
Likely Synapse had cached the deleted rows. I've updated the instructions to suggest restarting it.
@Ezwen:
> However, running the query from https://github.com/matrix-org/synapse/issues/12507#issuecomment-1104903000 did not return anything, so I suppose it's different?
Please don't report different issues here.
People encountering this issue as confirmed by the query at https://github.com/matrix-org/synapse/issues/12507#issuecomment-1104903000:
First: note that this error is reporting corruption in your database which may have happened a long time ago. The only recent change is that Synapse now checks for the corruption.
Second, please confirm:
- Is this affecting a federated room (ie, are there now, or have there ever been, any users on other servers in this room)?
- do you have retention enabled on your server?
- Have you used the delete room API or the purge history API on this room, either directly or via some control panel or "maintenance" script?
- Have you done any database maintenance operations which could somehow have introduced this (eg: backup/restore; port from sqlite)?
The query returns no rows when run on the db. I have federation on in general. I couldn't find the event id in question in the db, and I have no idea what room the error came from. Retention has been on for the last few weeks; I have now turned it off, because I suspect it to be the origin. However, the error remains after restarting Synapse. DB maintenance is pg_dump, compress_states_large_rooms, vacuum. It's the initial sync that is broken for me, for at least two users. This is the apache (proxy) log line:
"GET /_matrix/client/r0/sync?filter=5&timeout=0&_cacheBuster=1650918641901 HTTP/2.0" 500 719 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:99.0) Gecko/20100101 Firefox/99.0"
synapse error:
```
2022-04-25 22:33:25,682 - synapse.access.http.8008 - 427 - INFO - GET-2- 10.0.0.223 - 8008 - {@asd:masd.asd} Processed request: 0.002sec/0.001sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 55B 500 "GET /_matrix/client/r0/sync?filter=5&timeout=0&_cacheBuster=1650918805656 HTTP/1.1" "Mozilla/5.0 (X11; Linux x86_64; rv:99.0) Gecko/20100101 Firefox/99.0" [0 dbevts]
2022-04-25 22:33:32,459 - synapse.util.caches.response_cache - 254 - INFO - GET-3- [sync]: using completed cached result for [(@asd:masd.asd, 0, None, '5', False, 'noideawhatthisis')]
2022-04-25 22:33:32,459 - synapse.http.server - 100 - ERROR - GET-3- Failed handle request via 'SyncRestServlet': <XForwardedForRequest at 0x7f5f145ad8b0 method='GET' uri='/_matrix/client/r0/sync?filter=5&timeout=0&_cacheBuster=1650918812445' clientproto='HTTP/1.1' site='8008'>
Traceback (most recent call last):
File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/http/server.py", line 268, in _async_render_wrapper
callback_return = await self._async_render(request)
File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/http/server.py", line 470, in _async_render
callback_return = await raw_callback_return
File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/rest/client/sync.py", line 192, in on_GET
sync_result = await self.sync_handler.wait_for_sync_for_user(
File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/handlers/sync.py", line 297, in wait_for_sync_for_user
res = await self.response_cache.wrap(
File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/util/caches/response_cache.py", line 265, in wrap
return await make_deferred_yieldable(result)
File "/home/server/.synapse/env/lib/python3.8/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "/home/server/.synapse/env/lib/python3.8/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/util/caches/response_cache.py", line 246, in cb
return await callback(*args, **kwargs)
File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/handlers/sync.py", line 355, in _wait_for_sync_for_user
result: SyncResult = await self.current_sync_for_user(
File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/handlers/sync.py", line 406, in current_sync_for_user
sync_result = await self.generate_sync_result(
File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/handlers/sync.py", line 1114, in generate_sync_result
res = await self._generate_sync_entry_for_rooms(
File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/handlers/sync.py", line 1625, in _generate_sync_entry_for_rooms
await concurrently_execute(handle_room_entries, room_entries, 10)
File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/util/async_helpers.py", line 233, in concurrently_execute
await yieldable_gather_results(
File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/util/async_helpers.py", line 274, in yieldable_gather_results
raise dfe.subFailure.value from None
File "/home/server/.synapse/env/lib/python3.8/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "/home/server/.synapse/env/lib/python3.8/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/util/async_helpers.py", line 225, in _concurrently_execute_inner
await maybe_awaitable(func(value))
File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/handlers/sync.py", line 1614, in handle_room_entries
await self._generate_room_entry(
File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/handlers/sync.py", line 2060, in _generate_room_entry
batch = await self._load_filtered_recents(
File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/handlers/sync.py", line 592, in _load_filtered_recents
loaded_recents = await filter_events_for_client(
File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/visibility.py", line 85, in filter_events_for_client
event_id_to_state = await storage.state.get_state_for_events(
File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/storage/state.py", line 671, in get_state_for_events
event_to_groups = await self.stores.main._get_state_group_for_events(event_ids)
File "/home/server/.synapse/env/lib/python3.8/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
File "/home/server/.synapse/env/lib/python3.8/site-packages/synapse/storage/databases/main/state.py", line 331, in _get_state_group_for_events
raise RuntimeError("No state group for unknown or outlier event %s" % e)
RuntimeError: No state group for unknown or outlier event $4C1ulcbO5gZeV2t0sc5GASD2G09VvFm0bTGWIZDBlIw
```
Thanks for the help.
> - Is this affecting a federated room (ie, are there now, or have there ever been, any users on other servers in this room)?

My install is fully isolated currently.

> - do you have retention enabled on your server?

I've never enabled that, so I'm guessing the default is no. Glad to now know it exists, however! That's partly why I wasn't federating anything, as I knew the data would grow massively on public rooms etc. :)

> - Have you used the delete room API or the purge history API on this room, either directly or via some control panel or "maintenance" script?

No.

> - Have you done any database maintenance operations which could somehow have introduced this (eg: backup/restore; port from sqlite)?

No.
@MparkG: as the earlier comments make very clear: "If it doesn't [return any rows] you likely have a different problem and should open a separate issue".
@travisghansen thanks for the answers, though honestly they don't help me get any closer to understanding what can have caused this.
Be wary of retention: it's not a well-used feature and the reason I asked about it is that it is likely to cause corruption bugs like this.
We're seeing exceptions like these as well, and those rooms can't be written to. This is a room on the hackint IRC bridge, so lots of federated users, no usage of the delete room or purge history API endpoints, and we are using `rust-synapse-compress-state` regularly.
We're on Synapse ~~1.58.0~~ 1.59.1 right now.
Feel free to contact me at @hexa:lossy.network and we can debug this together.
Traceback
```
synapse.http.server: [PUT-672070] Failed handle request via 'RoomSendEventRestServlet': <XForwardedForRequest at 0x7fb87ab32a30 method='PUT' uri='/_matrix/client/r0/rooms/!lxUgQdCOpWnOrYNVxb%3Ahax404.de/send/m.room.message/1653997592776__inc2?user_id=%40hexa-%3Ahackint.org' clientproto='HTTP/1.1' site='8008'>
Traceback (most recent call last):
File "/nix/store/ks8r355nmkcx9q23s4m3nm1y767rfrln-python3.9-Twisted-22.4.0/lib/python3.9/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
result = current_context.run(gen.send, result)
StopIteration: [{'event_id': '$Ps2Ot0YLz20kbjS8lRFmK_Yo4t2fXPAvq3IU0CQTMoU', 'state_group': 820535}]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/nix/store/ks8r355nmkcx9q23s4m3nm1y767rfrln-python3.9-Twisted-22.4.0/lib/python3.9/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
result = current_context.run(gen.send, result)
File "/nix/store/lan1pgij0701dzi9nsvvy6nn62qk93b5-matrix-synapse-1.58.0/lib/python3.9/site-packages/synapse/storage/databases/main/state.py", line 332, in _get_state_group_for_events
raise RuntimeError("No state group for unknown or outlier event %s" % e)
RuntimeError: No state group for unknown or outlier event $FXKIdG6LkVKVsOEROx4IBer-eqPsQ-HCWdWpRuqyRFk
```
Retention
We are indeed using a pretty aggressive retention rule, since we only want to relay the bridged content, not store it persistently.
```yaml
retention:
  allowed_lifetime_max: 1m
  allowed_lifetime_min: 1m
  default_policy:
    max_lifetime: 1m
    min_lifetime: 1m
  enabled: true
  purge_jobs:
    - interval: 30m
```
Query result
`select * from event_forward_extremities efe left join events e using (event_id) where e.event_id is null;` returned 7 rows.
The event id mentioned in the traceback above is not among them, but I guess that's still worth debugging/fixing.
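A further check that might help narrow this down, given that the traceback's lookup fails against the state-group mapping rather than the extremities (a sketch only; the `outlier` column and the `event_to_state_groups` table are assumed from the Synapse schema of that era):

```sql
-- does the event from the traceback exist at all, and is it an outlier?
SELECT event_id, outlier
  FROM events
 WHERE event_id = '$FXKIdG6LkVKVsOEROx4IBer-eqPsQ-HCWdWpRuqyRFk';

-- does it have a state-group mapping? (this is the lookup that fails in the traceback)
SELECT *
  FROM event_to_state_groups
 WHERE event_id = '$FXKIdG6LkVKVsOEROx4IBer-eqPsQ-HCWdWpRuqyRFk';
```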
No corruption log lines
We don't see any corruption log events from #12620
Removed events according to https://github.com/matrix-org/synapse/issues/12507#issuecomment-1105061360
```
matrix-synapse=# SELECT count(event_id) FROM event_forward_extremities;
count
-------
2660
(1 row)
matrix-synapse=# BEGIN;
BEGIN
matrix-synapse=# DELETE FROM event_forward_extremities WHERE event_id IN (select event_id from event_forward_extremities efe left join events e using (event_id) where e.event_id is null);
DELETE 7
matrix-synapse=# SELECT count(event_id) FROM event_forward_extremities;
count
-------
2655
(1 row)
matrix-synapse=# END;
COMMIT
```
2660 - 7 should be 2653, yet the count now shows 2655 … okay. :thinking: