
dendrite server suddenly goes completely silent

Open: grisu48 opened this issue 2 years ago • 14 comments

Background information

  • Dendrite version or git SHA: 0.8.7
  • Monolith or Polylith?: Monolith
  • SQLite3 or Postgres?: Postgres
  • Running in Docker?: podman
  • go version: (built into the container image)
  • Client used (if applicable): Element Web and Android

Description

Since a server reboot this morning, I can no longer send messages via Element:

[screenshot]

The server had been running fine for the past few weeks without any noticeable issues.

In the dendrite log I find error messages like the following:

time="2022-06-09T07:34:51.751232393Z" level=error msg="SendEvents failed" error="InputRoomEventsResponse: context canceled" req.id=wV81oVBTTPge req.method=PUT req.path="/_matrix/client/r0/rooms/!d8rmAZMAod6n7EsZ:m.feldspaten.org/send/m.room.encrypted/$local.c2f3d6f8-eb95-42b3-9ff3-38d8e179e2a9" user_id="@phoenix:m.feldspaten.org"
time="2022-06-09T07:35:15.652729793Z" level=error msg="SendEvents failed" error="InputRoomEventsResponse: context canceled" req.id=3Z5ZPJdx0DkE req.method=PUT req.path="/_matrix/client/r0/rooms/!d8rmAZMAod6n7EsZ:m.feldspaten.org/send/m.room.encrypted/m1654759992909.1" user_id="@phoenix:m.feldspaten.org"
time="2022-06-09T07:36:20.461579016Z" level=error msg="SendEvents failed" error="InputRoomEventsResponse: context canceled" req.id=ZRaKTs4mqVqg req.method=PUT req.path="/_matrix/client/r0/rooms/!d8rmAZMAod6n7EsZ:m.feldspaten.org/send/m.room.encrypted/m1654759992909.1" user_id="@phoenix:m.feldspaten.org"
time="2022-06-09T07:36:41.052252332Z" level=info msg="Executing UpdateUserDailyVisits"
time="2022-06-09T07:37:28.839353423Z" level=error msg="SendEvents failed" error="InputRoomEventsResponse: context canceled" req.id=AbPuX72a3rSu req.method=PUT req.path="/_matrix/client/r0/rooms/!d8rmAZMAod6n7EsZ:m.feldspaten.org/send/m.room.encrypted/m1654759992909.1" user_id="@phoenix:m.feldspaten.org"
time="2022-06-09T07:38:45.475754429Z" level=error msg="SendEvents failed" error="InputRoomEventsResponse: context canceled" req.id=u5zAstAfeuKd req.method=PUT req.path="/_matrix/client/r0/rooms/!d8rmAZMAod6n7EsZ:m.feldspaten.org/send/m.room.encrypted/m1654759992909.1" user_id="@phoenix:m.feldspaten.org"

Steps to reproduce

  • No concrete reproducer identified

I'm also attaching the server log after the reboot here: log.txt

grisu48, Jun 09 '22

Does anyone have an idea? My homeserver has been practically toast since an automated reboot last Thursday. I can't send any messages, not even to internal channels, and federated channels have stayed quiet ever since. I still see the error messages quoted above, with level=error msg="SendEvents failed" error="InputRoomEventsResponse: context canceled", in the logs.

Logging out and back in didn't help, nor did a reboot, an update to the latest container version, or re-creating the whole container (using the old data files).

The instance had been working fine for at least a month and stopped working without me touching anything. The reboot happened as part of the automated update procedure and does not alter the state of the container or its configuration files.

To me, this issue appeared completely out of the blue, and I am not aware of any changes I made that could have triggered it.

grisu48, Jun 16 '22

context canceled normally means that the client gave up waiting for a response from the server so it closed the connection, and instead of continuing to do work for a client that's given up, we stop processing too. Normally it signals that whatever work we were doing wasn't finished.

Is this happening in just specific rooms or is it happening in all of them?

neilalexander, Jun 29 '22

This affects the whole server; in fact, the whole server has been completely silent since I posted this bug. Affected rooms include:

  • Local rooms, i.e. rooms that exist only on this server
  • Remote rooms: I neither see any incoming messages nor can I send outgoing ones

The latter also applies to "chatty" rooms where I know that traffic is ongoing. Everything went completely silent.

grisu48, Jun 30 '22

Following the input of @neilalexander I updated the title of this issue.

grisu48, Jun 30 '22

Same issue here after upgrading to v0.8.9 a few days ago.

pcmid, Jul 10 '22

Could this be the same issue as #2566 ?

edocod1, Jul 12 '22

I have the same issue. Has anybody had any luck resolving this? My homeserver is useless in this state.

TheBinaryLoop, Sep 25 '22

Have you evaluated whether your PostgreSQL connection counts are correct? Your Dendrite config must not try to use more connections than PostgreSQL is configured to allow or you'll run into problems like this.
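
As a rough illustration of that rule, the sketch below adds up the maximum size of each Dendrite database pool (the max_open_conns values in the config) and compares the total with what PostgreSQL leaves for ordinary clients, roughly max_connections minus superuser_reserved_connections. The pool names and numbers are hypothetical placeholders, not Dendrite defaults; the 100 and 3 are PostgreSQL's stock defaults and should be checked on your own server with SHOW.

```go
package main

import "fmt"

// connectionBudget adds up the maximum pool size of every Dendrite database
// pool and reports whether that total fits in what PostgreSQL allows
// ordinary (non-superuser) clients: max_connections minus
// superuser_reserved_connections. Connections held by any other client of
// the same PostgreSQL server would also have to fit in that budget.
func connectionBudget(pools map[string]int, maxConnections, superuserReserved int) (total int, fits bool) {
	for _, maxOpen := range pools {
		total += maxOpen
	}
	return total, total <= maxConnections-superuserReserved
}

func main() {
	// Hypothetical max_open_conns values per database pool; replace them
	// with the values from your own Dendrite config.
	pools := map[string]int{
		"pool_a": 30,
		"pool_b": 30,
		"pool_c": 50,
	}
	// PostgreSQL's stock defaults; check yours with
	// SHOW max_connections; and SHOW superuser_reserved_connections;
	total, fits := connectionBudget(pools, 100, 3)
	fmt.Printf("Dendrite may open up to %d connections; fits: %v\n", total, fits)
	// prints: Dendrite may open up to 110 connections; fits: false
}
```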

@edocod1 @TheBinaryLoop To be clear are you seeing context canceled specifically, or a different error?

neilalexander, Sep 26 '22

Yes, I see context canceled.

TheBinaryLoop, Sep 26 '22

I had this issue and fixed it by deleting the jetstream directory.

mat-1, Nov 13 '22

Deleting jetstream did not resolve the issue for me; it still appears randomly.

Since there might be multiple issues here, in my case it's:

  • incoming federation fails (and (sometimes?) local message sync too)
  • lots of "context canceled" errors
  • fixed after restart

vpzomtrrfrt, Dec 07 '23

my current workaround is a wrapper program that terminates dendrite if it prints /level=error.*context canceled/

vpzomtrrfrt, Dec 30 '23

I just had "this" happen to me after my ISP changed my external IP address. Nothing I did made any difference, but searching for "context canceled" brought me here. Since restarting (I use the monolith Docker image) made no difference, I stopped and removed the image, deleted the jetstream volume, and then pulled a fresh image. That finally got the server running again.

Dendrite version 0.13.6+87f028d

(I was on 0.13.5 when the IP address initially changed; upgrading was one of the things I tried to get the server running again.)

A typical log error follows; I know these don't say much, but that was all there was besides the normal output.

time="2024-02-13T17:59:07.748363621Z" level=error msg="SendEvents failed" error="InputRoomEventsResponse: context canceled" req.id=************ req.method=PUT req.path="/_matrix/client/r0/rooms/!hooh0PDGQ1jLlf5u:************/send/m.room.encrypted/kMXEventLocalId_************************" user_id="@*************"

Unfortunately, it seems deleting jetstream might have caused secondary issues, which I'll now try to fix manually:

time="2024-02-13T19:47:34.506466310Z" level=warning msg="failed to get state after $************* locally" context=missing error="storage: event IDs missing from the database (0 != 1)" room_id="!****************" txn_event="$****************" txn_prev_events="[*********************]"

troed, Feb 13 '24

This keeps happening to me. My server is completely busted until I run rm -rf ./jetstream and restart Dendrite. Then things come back online, but I get a bunch of error logs about broken event IDs for a few hours.

niebloomj, Feb 20 '24