Messages greyed out and don't send
Description:
Ever since updating to 3.12.3, messages will grey out and not send. The other person will get a notification sound, but not see the message. After restarting the server, the message will send, and all subsequent messages also send fine. But then after some hours to days, it will happen again, and I have to restart the server again.
Steps to reproduce:
- Update to 3.12.3
- Wait for messages to eventually grey out
Expected behavior:
Messages always properly send, as before
Actual behavior:
Messages grey out and don't send until server is restarted
Server Setup Information:
- Version of Rocket.Chat Server: 3.12.3
- Operating System: Ubuntu
- Deployment Method: docker
- Number of Running Instances: 1
- MongoDB Version: v3.6.3
Client Setup Information
- Desktop App or Browser Version: 3.12.3
- Operating System: Arch Linux, Fedora, Windows 10
Additional context
I remember this used to be a bug in a much older version, and it got fixed, so that might make it easy to fix again.
Hello! What is your MongoDB version and OpLog status? An OpLog is recommended even with a single RC instance.
Hello,
We have a similar problem. Every day we must restart Rocket.Chat, otherwise new messages are greyed out and don't send. In addition, unread messages stay "unread" after we've scrolled past them.
Note: rocketchat and mongo are on docker
Versions: docker 19.03.8, rocketchat/rocket.chat:3.13.0, mongo:4.2
The OpLog's size is 1 GB and it holds the events of the last 244 hours.
In rocketchat's log the following messages keep repeating :
Exception in callback of async function: MongoServerSelectionError: connect ECONNREFUSED 192.168.224.2:27017 at Timeout._onTimeout (/app/bundle/programs/server/npm/node_modules/mongodb/lib/core/sdam/topology.js:438:30) at listOnTimeout (internal/timers.js:554:17) at processTimers (internal/timers.js:497:7) { reason: TopologyDescription { type: 'Single', setName: null, maxSetVersion: null, maxElectionId: null, servers: Map { 'mongo:27017' => [ServerDescription] }, stale: false, compatible: true, compatibilityError: null, logicalSessionTimeoutMinutes: null, heartbeatFrequencyMS: 10000, localThresholdMS: 15, commonWireVersion: 8 } }
(node:8) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
Exception while invoking method loadMissedMessages Error: Error, too many requests. Please slow down. You must wait 10 seconds before trying again. [too-many-requests] at Object.post (app/api/server/v1/misc.js:256:11) at app/api/server/api.js:394:82 at Meteor.EnvironmentVariable.EVp.withValue (packages/meteor.js:1234:12) at Object._internalRouteActionHandler [as action] (app/api/server/api.js:394:39) at Route.share.Route.Route._callEndpoint (packages/nimble_restivus/lib/route.coffee:150:32) at packages/nimble_restivus/lib/route.coffee:59:33 at packages/simple_json-routes.js:98:9
We are facing the very same problem.
What is your mongodb version and OpLog status?
We are using the snap install on Ubuntu
Versions Rocket.Chat: 3.18.3
Apps Engine Version 1.27.1
Node Version v12.22.1
Database Migration 232 (April 11, 2022 03:30)
MongoDB 3.6.14 / wiredTiger (oplog Enabled)
Commit Details HEAD: (47eec58e6) Branch: HEAD
up
Edit:
Same issue running containerized Rocket.Chat. I have seen this behaviour on MongoDB 3.6 to 4.2 with wiredTiger and the oplog activated. The issue occurred in RC versions from 3.8.x to 3.18.7; I'm not sure whether it still occurs on 4.x.
We have the same problem in our desktop clients (app and web), running inside K8s (using the official Helm chart).
Version
4.7.4
MongoDB
4.4.15 / unknown (oplog Enabled)
We have the same problem using RC 5.3.0 version and MongoDB 4.4.11
I can confirm this problem with RC 5.3.0 and MongoDB 6.0.2 running in a docker-compose stack on Ubuntu 20.04.5 LTS.
I can confirm, but with update to 5.3.2.
5.3.0 works fine for me.
Using MongoDB: 6.0.3
I guess this is the reason:
{"level":50,"time":"2022-11-26T10:57:13.153Z","pid":9,"hostname":"61bee5c51382","name":"StreamBroadcast","section":"Stream","msg":"Stream broadcast from '172.18.106.164:3000' to '172.18.106.160:3001' with name local not authorized"}
{"level":50,"time":"2022-11-26T10:57:13.154Z","pid":9,"hostname":"61bee5c51382","name":"StreamBroadcast","section":"Stream","msg":"Stream broadcast from '172.18.106.164:3000' to '172.18.106.160:3000' with name local not authorized"}
{"level":50,"time":"2022-11-26T10:57:13.593Z","pid":9,"hostname":"61bee5c51382","name":"StreamBroadcast","section":"Stream","msg":"Stream broadcast from '172.18.106.164:3000' to '172.18.106.160:3001' with name notify-room not authorized"}
{"level":50,"time":"2022-11-26T10:57:13.594Z","pid":9,"hostname":"61bee5c51382","name":"StreamBroadcast","section":"Stream","msg":"Stream broadcast from '172.18.106.164:3000' to '172.18.106.160:3000' with name notify-room not authorized"}
{"level":50,"time":"2022-11-26T10:57:13.598Z","pid":9,"hostname":"61bee5c51382","name":"StreamBroadcast","section":"Stream","msg":"Stream broadcast from '172.18.106.164:3000' to '172.18.106.160:3001' with name notify-room not authorized"}
{"level":50,"time":"2022-11-26T10:57:13.599Z","pid":9,"hostname":"61bee5c51382","name":"StreamBroadcast","section":"Stream","msg":"Stream broadcast from '172.18.106.164:3000' to '172.18.106.160:3000' with name notify-room not authorized"}
I hit this issue right after a MongoDB update. When I moved the primary node to another server and rebooted the former primary server, I got those "not authorized" error messages and greyed-out messages in chat. Moving the primary node back to the first server did not fix the issue. I then restarted all Rocket.Chat instances one by one, and that fixed it for me.
It looks like some data stream (such as the change stream connection) is lost when the primary node changes or restarts, and restarting the Rocket.Chat instances restores (reconnects) that stream.
@sampaiodiego to consider.
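To illustrate the hypothesis above (a stream that is lost on a primary change and never reopened), here is a minimal, driver-independent Python sketch of a consumer that reopens its stream from the last seen resume token. It is not Rocket.Chat's implementation. `open_stream`, `handle` and `watch_with_resume` are hypothetical names. With pymongo, `open_stream` might wrap `collection.watch(resume_after=token)`, and the caught exception would be a driver error type, not the built-in `ConnectionError` used here.

```python
import time


def watch_with_resume(open_stream, handle, max_retries=5, backoff_s=1.0,
                      sleep=time.sleep):
    """Consume change events, reopening the stream after a connection failure.

    open_stream(resume_token) is a hypothetical factory that returns an
    iterable of events, each a dict whose "_id" is its resume token.
    """
    resume_token = None
    failures = 0
    while True:
        try:
            for event in open_stream(resume_token):
                handle(event)
                resume_token = event["_id"]  # remember where to resume from
                failures = 0                 # progress made: reset retry budget
            return  # the stream ended cleanly
        except ConnectionError:
            failures += 1
            if failures > max_retries:
                raise  # give up and let a supervisor restart the process
            sleep(backoff_s * failures)  # linear backoff before reopening


if __name__ == "__main__":
    def demo_stream(token):
        # Simulated stream: fails once after two events, then resumes.
        if token is None:
            yield {"_id": 1}
            yield {"_id": 2}
            raise ConnectionError("primary stepped down")
        yield {"_id": token + 1}

    watch_with_resume(demo_stream, print, sleep=lambda s: None)
```

The relevant design point is that a failure leads to a reopen from the last token, where the reported behaviour looks like a consumer that stays silently disconnected until the process is restarted.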
We're seeing a similar issue (sent messages stay gray, incoming messages don't show until refresh). We're on MongoDB 4.2.5 and RC 5.4.0.
Seems to happen when MongoDB is restarted while RC is running. But I could be wrong.
I can confirm @JoshMcCullough's suspicion. Our RocketChat runs in AWS Fargate (as Docker containers), pointing to a self-managed MongoDB cluster (4.4.18 / wiredTiger, oplog Enabled) with three nodes. Whenever we trigger an rs.stepDown() on the primary of the replica set to do maintenance on the primary database node, RocketChat loses its mind and immediately gets into this "grey hanging messages" state. None of our other applications connected to this MongoDB cluster even blinks. They seamlessly talk to the newly elected primary node and move on.
I have replicated this behavior on various combinations of MongoDB:
- 4.2
- 4.4
And various versions of Rocketchat:
- 4.x up to 4.8.7 (our PROD instance)
- many 5.x versions (DEV instance)
- 6.0.0 and latest 6.0.1 (DEV instance)
As far as I can tell, it never recovers. Only a full restart of the Rocketchat container(s) fixes this. I don't see anything unusual in the RocketChat log when this happens. We can't be the only people running an HA MongoDB cluster with a replica set for our RocketChat to connect to. I have experimented with the various OPLOG_THIS_AND_THAT environment variable suggestions that I found on the web, but to no avail. I'm happy to provide any info to help solve this.