Rocket.Chat Messages greyed out and don't send

Description:

Ever since updating to 3.12.3, messages will grey out and not send. The other person will get a notification sound, but not see the message. After restarting the server, the message will send, and all subsequent messages also send fine. But then after some hours to days, it will happen again, and I have to restart the server again.

Steps to reproduce:

Update to 3.12.3
Wait for messages to eventually grey out

Expected behavior:

Messages always properly send, as before

Actual behavior:

Messages grey out and don't send until server is restarted

Server Setup Information:

Version of Rocket.Chat Server: 3.12.3
Operating System: Ubuntu
Deployment Method: docker
Number of Running Instances: 1
MongoDB Version: v3.6.3

Client Setup Information

Desktop App or Browser Version: 3.12.3
Operating System: Arch Linux, Fedora, Windows 10

Additional context

I remember this used to be a bug in a much older version, and it got fixed, so that might make it easy to fix again.

Apr 04 '21 05:04 mmgoilf

Hello! What is your MongoDB version and OpLog status? OpLog is recommended even with a single RC instance.

Apr 05 '21 05:04 ankar84

Hello,

We have a similar problem. Every day we must restart Rocket.Chat, otherwise new messages are greyed out and don't send. In addition, unread messages stay "unread" after we've scrolled past them.

Note: Rocket.Chat and MongoDB both run in Docker.

Versions: docker 19.03.8, rocketchat/rocket.chat:3.13.0, mongo:4.2

The OpLog's size is 1 GB and it holds the events of the last 244 hours.

In rocketchat's log the following messages keep repeating:

Exception in callback of async function: MongoServerSelectionError: connect ECONNREFUSED 192.168.224.2:27017
    at Timeout._onTimeout (/app/bundle/programs/server/npm/node_modules/mongodb/lib/core/sdam/topology.js:438:30)
    at listOnTimeout (internal/timers.js:554:17)
    at processTimers (internal/timers.js:497:7) {
  reason: TopologyDescription {
    type: 'Single',
    setName: null,
    maxSetVersion: null,
    maxElectionId: null,
    servers: Map { 'mongo:27017' => [ServerDescription] },
    stale: false,
    compatible: true,
    compatibilityError: null,
    logicalSessionTimeoutMinutes: null,
    heartbeatFrequencyMS: 10000,
    localThresholdMS: 15,
    commonWireVersion: 8
  }
}

(node:8) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.

Exception while invoking method loadMissedMessages Error: Error, too many requests. Please slow down. You must wait 10 seconds before trying again. [too-many-requests]
    at Object.post (app/api/server/v1/misc.js:256:11)
    at app/api/server/api.js:394:82
    at Meteor.EnvironmentVariable.EVp.withValue (packages/meteor.js:1234:12)
    at Object._internalRouteActionHandler [as action] (app/api/server/api.js:394:39)
    at Route.share.Route.Route._callEndpoint (packages/nimble_restivus/lib/route.coffee:150:32)
    at packages/nimble_restivus/lib/route.coffee:59:33
    at packages/simple_json-routes.js:98:9
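
The repeated ECONNREFUSED in the log above means that TCP connections to the MongoDB address were being refused. As a rough way to check this from inside the Rocket.Chat container's network, here is a minimal Python sketch of a reachability probe. The function name `wait_for_port` and its parameters are made up for this illustration and are not part of Rocket.Chat or MongoDB.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0,
                  interval_s: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection could be made before timeout_s elapsed.
    This only proves the port accepts connections, not that MongoDB is healthy.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # ECONNREFUSED, DNS failures and timeouts all surface as OSError.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For the setup in this post, a call such as `wait_for_port("mongo", 27017, timeout_s=60.0)` would use the host name and port that appear in the log; adjust both to your own compose file.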

Apr 21 '21 08:04 medval

We are facing the very same problem.

Regarding "What is your mongodb version and OpLog status?":

We are using the snap install on Ubuntu

Versions Rocket.Chat: 3.18.3

Apps Engine Version 1.27.1

Node Version v12.22.1

DB Migration 232 (April 11, 2022 03:30)

MongoDB 3.6.14 / wiredTiger (oplog Enabled)

Commit Details HEAD: (47eec58e6) Branch: HEAD

Apr 11 '22 08:04 homberger

up

Edit:

Same issue running containerized Rocket.Chat. I've seen this behaviour on MongoDB 3.6 to 4.2 with wiredTiger and oplog activated. The issue occurred in RC versions from 3.8.x to 3.18.7; not sure if it still occurs on 4.x.

Oct 11 '22 14:10 dimitrihof

We have the same problem in our desktop clients (app and web). The server runs inside K8s (using the official Helm chart).

Version

4.7.4

MongoDB

4.4.15 / unknown (oplog Enabled)

Oct 30 '22 08:10 mhkarimi1383

We have the same problem using RC version 5.3.0 and MongoDB 4.4.11.

Nov 07 '22 12:11 felipemachado12

I can confirm this problem with RC 5.3.0 and MongoDB 6.0.2 running in a docker-compose stack on Ubuntu 20.04.5 LTS.

Nov 09 '22 17:11 xFuture603

I can confirm, but only since updating to 5.3.2.

5.3.0 works fine for me.

Using MongoDB: 6.0.3

Nov 21 '22 11:11 ykorzikowski

I guess this is the reason:

{"level":50,"time":"2022-11-26T10:57:13.153Z","pid":9,"hostname":"61bee5c51382","name":"StreamBroadcast","section":"Stream","msg":"Stream broadcast from '172.18.106.164:3000' to '172.18.106.160:3001' with name local not authorized"} 
{"level":50,"time":"2022-11-26T10:57:13.154Z","pid":9,"hostname":"61bee5c51382","name":"StreamBroadcast","section":"Stream","msg":"Stream broadcast from '172.18.106.164:3000' to '172.18.106.160:3000' with name local not authorized"} 
{"level":50,"time":"2022-11-26T10:57:13.593Z","pid":9,"hostname":"61bee5c51382","name":"StreamBroadcast","section":"Stream","msg":"Stream broadcast from '172.18.106.164:3000' to '172.18.106.160:3001' with name notify-room not authorized"} 
{"level":50,"time":"2022-11-26T10:57:13.594Z","pid":9,"hostname":"61bee5c51382","name":"StreamBroadcast","section":"Stream","msg":"Stream broadcast from '172.18.106.164:3000' to '172.18.106.160:3000' with name notify-room not authorized"} 
{"level":50,"time":"2022-11-26T10:57:13.598Z","pid":9,"hostname":"61bee5c51382","name":"StreamBroadcast","section":"Stream","msg":"Stream broadcast from '172.18.106.164:3000' to '172.18.106.160:3001' with name notify-room not authorized"} 
{"level":50,"time":"2022-11-26T10:57:13.599Z","pid":9,"hostname":"61bee5c51382","name":"StreamBroadcast","section":"Stream","msg":"Stream broadcast from '172.18.106.164:3000' to '172.18.106.160:3000' with name notify-room not authorized"}
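
To see how often these errors occur and which instances and streams are affected, the JSON log lines can be summarized with a short script. This is only a sketch: the function name `count_unauthorized_broadcasts` is made up here, and the regular expression assumes the message wording shown in the lines above.

```python
import json
import re
from collections import Counter
from typing import Iterable

# Matches the "msg" wording seen in the log lines above (an assumption, not a spec).
_PATTERN = re.compile(
    r"Stream broadcast from '([^']+)' to '([^']+)' with name (\S+) not authorized"
)


def count_unauthorized_broadcasts(lines: Iterable[str]) -> Counter:
    """Count 'not authorized' StreamBroadcast errors per (target, stream name)."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip log lines that are not JSON
        if not isinstance(entry, dict) or entry.get("name") != "StreamBroadcast":
            continue
        match = _PATTERN.search(str(entry.get("msg", "")))
        if match:
            _source, target, stream = match.groups()
            counts[(target, stream)] += 1
    return counts
```

Fed with a log file opened via `open(path)`, the result maps each (target instance, stream name) pair to the number of rejected broadcasts.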

Nov 26 '22 11:11 ankar84

I hit this issue right after a MongoDB update. When I moved the primary node to another server and rebooted the former primary server, I got those "not authorized" error messages and greyed-out messages in chat. Moving the primary node back to the first server did not fix the issue. I then restarted all Rocket.Chat instances one by one, and that fixed it for me.

It looks like some data stream (such as the change stream connection) was lost when the primary node changed or restarted, and restarting the Rocket.Chat instances restored (reconnected) it.
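
To illustrate what "reconnecting the stream" could mean in general, here is a minimal Python sketch of a consumer that reopens an event stream from the last position it saw. It is not Rocket.Chat's code and makes no claim about how Rocket.Chat handles this. `StreamLost`, `watch_with_resume`, and the `open_stream` callback are hypothetical names, and the only MongoDB-specific assumption is that each change event carries a resume token in its `_id` field.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Event = Dict[str, Any]


class StreamLost(Exception):
    """Hypothetical stand-in for the driver error raised when the primary goes away."""


def watch_with_resume(
    open_stream: Callable[[Optional[Any]], Iterator[Event]],
    handle: Callable[[Event], None],
    max_reconnects: int = 5,
) -> int:
    """Consume events, reopening the stream after the last seen token on loss.

    Returns the number of reconnects that were needed. Re-raises StreamLost
    once more than max_reconnects reconnects have been attempted.
    """
    resume_token: Optional[Any] = None
    reconnects = 0
    while True:
        try:
            for event in open_stream(resume_token):
                handle(event)
                # Remember the position only after the event was handled.
                resume_token = event["_id"]
            return reconnects  # the stream ended without an error
        except StreamLost:
            reconnects += 1
            if reconnects > max_reconnects:
                raise
```

With a real driver, `open_stream` might wrap something like pymongo's `collection.watch(resume_after=token)`, and a production loop would also wait between reconnect attempts. The sketch leaves both out so that it stays self-contained.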

@sampaiodiego to consider.

Nov 26 '22 12:11 ankar84

We're seeing a similar issue (sent messages stay gray, incoming messages don't show until refresh). We're on MongoDB 4.2.5 and RC 5.4.0.

Seems to happen when MongoDB is restarted while RC is running. But I could be wrong.

Mar 23 '23 12:03 JoshMcCullough

I can confirm @JoshMcCullough's suspicion. Our RocketChat runs in AWS Fargate (as Docker containers), pointing to a self-managed MongoDB cluster (4.4.18 / wiredTiger, oplog Enabled) with three nodes. Whenever we trigger an rs.stepDown() on the MongoDB primary of the replica set to do maintenance on the primary database node, RocketChat loses its mind and immediately gets into this "grey hanging messages" state. None of our other applications connected to this MongoDB cluster even blinks. They seamlessly talk to the newly elected primary node and move on.

I have replicated this behavior on various combinations of MongoDB:

4.2
4.4

And various versions of Rocketchat:

4.x up to 4.8.7 (our PROD instance)
many 5.x versions (DEV instance)
6.0.0 and latest 6.0.1 (DEV instance)

As far as I can tell, it never recovers. Only a full restart of the Rocketchat container(s) fixes this. I don't see anything unusual in the RocketChat log when this happens. We can't be the only people running an HA MongoDB cluster with a replicaSet for our RocketChat to connect to. I have experimented with the various OPLOG_THIS_AND_THAT environment variable suggestions that I found on the web, but to no avail. I'm happy to provide any info to help solve this.

Mar 27 '23 17:03 smaragd