Rocket.Chat

Messages greyed out and don't send

Open • mmgoilf opened this issue 4 years ago • 10 comments

Description:

Ever since updating to 3.12.3, messages grey out and are not sent. The recipient hears a notification sound but does not see the message. After I restart the server, the message is delivered and all subsequent messages are sent normally. But after anywhere from a few hours to a few days, it happens again and I have to restart the server again.

Steps to reproduce:

  1. Update to 3.12.3
  2. Wait for messages to eventually grey out

Expected behavior:

Messages are always sent properly, as before.

Actual behavior:

Messages grey out and are not sent until the server is restarted.

Server Setup Information:

  • Version of Rocket.Chat Server: 3.12.3
  • Operating System: Ubuntu
  • Deployment Method: docker
  • Number of Running Instances: 1
  • MongoDB Version: v3.6.3

Client Setup Information

  • Desktop App or Browser Version: 3.12.3
  • Operating System: Arch Linux, Fedora, Windows 10

Additional context

I remember this was a bug in a much older version that was later fixed, so it may be easy to fix again.

mmgoilf • Apr 04 '21 05:04

Hello! What are your MongoDB version and oplog status? The oplog is recommended even with a single Rocket.Chat instance.
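With a single Docker container, "oplog enabled" in practice means running mongod as a one-member replica set (started with `--replSet` and initialized once with `rs.initiate()`) and pointing Rocket.Chat at it through the `MONGO_URL` and `MONGO_OPLOG_URL` environment variables, the latter addressing the `local` database where the oplog lives. The Python sketch below only builds both connection strings; the replica-set name `rs0`, the database name `rocketchat`, and the helper name `rocketchat_mongo_env` are placeholders, not values taken from this thread.

```python
from urllib.parse import quote


def rocketchat_mongo_env(hosts, replica_set="rs0", db="rocketchat"):
    """Build MONGO_URL and MONGO_OPLOG_URL for the given replica-set members.

    hosts: list of "host:port" strings. Hypothetical helper; the defaults
    "rs0" and "rocketchat" are placeholders, adjust them to your deployment.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    seed = ",".join(hosts)  # seed list: any reachable member lets the driver find the rest
    rs = quote(replica_set, safe="")
    return {
        # Application data lives in the Rocket.Chat database.
        "MONGO_URL": f"mongodb://{seed}/{db}?replicaSet={rs}",
        # The oplog is the oplog.rs collection in the 'local' database.
        "MONGO_OPLOG_URL": f"mongodb://{seed}/local?replicaSet={rs}",
    }


if __name__ == "__main__":
    for key, value in rocketchat_mongo_env(["mongo:27017"]).items():
        print(f"{key}={value}")
```

For a Docker service named `mongo`, this prints `MONGO_URL=mongodb://mongo:27017/rocketchat?replicaSet=rs0` and `MONGO_OPLOG_URL=mongodb://mongo:27017/local?replicaSet=rs0`. For a multi-node set, listing several members means the driver can still reach the set if one seed host is down.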

ankar84 • Apr 05 '21 05:04

Hello,

We have a similar problem. Every day we must restart Rocket.Chat; otherwise new messages are greyed out and are not sent. In addition, unread messages stay marked as "unread" even after we have scrolled past them.

Note: Rocket.Chat and MongoDB both run in Docker.

Versions: docker 19.03.8, rocketchat/rocket.chat:3.13.0, mongo:4.2

The oplog size is 1 GB, and it holds the events of the last 244 hours.

The following messages keep repeating in Rocket.Chat's log:

Exception in callback of async function: MongoServerSelectionError: connect ECONNREFUSED 192.168.224.2:27017
    at Timeout._onTimeout (/app/bundle/programs/server/npm/node_modules/mongodb/lib/core/sdam/topology.js:438:30)
    at listOnTimeout (internal/timers.js:554:17)
    at processTimers (internal/timers.js:497:7) {
  reason: TopologyDescription {
    type: 'Single',
    setName: null,
    maxSetVersion: null,
    maxElectionId: null,
    servers: Map { 'mongo:27017' => [ServerDescription] },
    stale: false,
    compatible: true,
    compatibilityError: null,
    logicalSessionTimeoutMinutes: null,
    heartbeatFrequencyMS: 10000,
    localThresholdMS: 15,
    commonWireVersion: 8
  }
}

(node:8) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.

Exception while invoking method loadMissedMessages Error: Error, too many requests. Please slow down. You must wait 10 seconds before trying again. [too-many-requests]
    at Object.post (app/api/server/v1/misc.js:256:11)
    at app/api/server/api.js:394:82
    at Meteor.EnvironmentVariable.EVp.withValue (packages/meteor.js:1234:12)
    at Object._internalRouteActionHandler [as action] (app/api/server/api.js:394:39)
    at Route.share.Route.Route._callEndpoint (packages/nimble_restivus/lib/route.coffee:150:32)
    at packages/nimble_restivus/lib/route.coffee:59:33
    at packages/simple_json-routes.js:98:9
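The `ECONNREFUSED 192.168.224.2:27017` error above means nothing accepted a TCP connection on that address when the driver tried, for example because mongod was down, restarting, or not listening there. One way to separate a container/network problem from a Rocket.Chat problem is to probe the MongoDB port from the Rocket.Chat container's network. A minimal stdlib sketch; the function name `mongo_port_reachable` is hypothetical, and it only checks TCP reachability, not MongoDB health or replica-set state.

```python
import socket


def mongo_port_reachable(host, port=27017, timeout=3.0):
    """Return (True, None) if a TCP connection succeeds, else (False, reason).

    Only checks that something accepts TCP on host:port; it does not speak
    the MongoDB wire protocol or inspect replica-set state.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except ConnectionRefusedError:
        # The same condition the driver reported as ECONNREFUSED.
        return False, "refused"
    except socket.timeout:
        return False, "timeout"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar errors.
        return False, str(exc)
```

For the setup in this comment, calling `mongo_port_reachable("mongo")` from inside the Rocket.Chat container (the topology description lists `'mongo:27017'`) would show whether the database port is accepting connections at the moment the errors appear.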

medval • Apr 21 '21 08:04

We are facing the very same problem.

> What is your mongodb version and OpLog status?

We are using the snap install on Ubuntu:

  • Rocket.Chat version: 3.18.3
  • Apps Engine version: 1.27.1
  • Node version: v12.22.1
  • Database migration: 232 (April 11, 2022 03:30)
  • MongoDB: 3.6.14 / wiredTiger (oplog enabled)
  • Commit details: HEAD (47eec58e6), branch: HEAD

homberger • Apr 11 '22 08:04

up

Edit:

Same issue running Rocket.Chat in containers. I have seen this behaviour on MongoDB 3.6 through 4.2 with wiredTiger and the oplog enabled. The issue occurred in RC versions from 3.8.x through 3.18.7; I'm not sure whether it still occurs on 4.x.

dimitrihof • Oct 11 '22 14:10

We have the same problem in our desktop clients (app and web). Rocket.Chat runs inside K8s, deployed with the official Helm chart.

  • Version: 4.7.4
  • MongoDB: 4.4.15 / unknown (oplog Enabled)

mhkarimi1383 • Oct 30 '22 08:10

We have the same problem with RC 5.3.0 and MongoDB 4.4.11.

felipemachado12 • Nov 07 '22 12:11

I can confirm this problem with RC 5.3.0 and MongoDB 6.0.2 running in a docker-compose stack on Ubuntu 20.04.5 LTS.

xFuture603 • Nov 09 '22 17:11

I can confirm this, but only after updating to 5.3.2.

5.3.0 works fine for me.

Using MongoDB: 6.0.3

ykorzikowski • Nov 21 '22 11:11

I guess this is the reason:

{"level":50,"time":"2022-11-26T10:57:13.153Z","pid":9,"hostname":"61bee5c51382","name":"StreamBroadcast","section":"Stream","msg":"Stream broadcast from '172.18.106.164:3000' to '172.18.106.160:3001' with name local not authorized"} 
{"level":50,"time":"2022-11-26T10:57:13.154Z","pid":9,"hostname":"61bee5c51382","name":"StreamBroadcast","section":"Stream","msg":"Stream broadcast from '172.18.106.164:3000' to '172.18.106.160:3000' with name local not authorized"} 
{"level":50,"time":"2022-11-26T10:57:13.593Z","pid":9,"hostname":"61bee5c51382","name":"StreamBroadcast","section":"Stream","msg":"Stream broadcast from '172.18.106.164:3000' to '172.18.106.160:3001' with name notify-room not authorized"} 
{"level":50,"time":"2022-11-26T10:57:13.594Z","pid":9,"hostname":"61bee5c51382","name":"StreamBroadcast","section":"Stream","msg":"Stream broadcast from '172.18.106.164:3000' to '172.18.106.160:3000' with name notify-room not authorized"} 
{"level":50,"time":"2022-11-26T10:57:13.598Z","pid":9,"hostname":"61bee5c51382","name":"StreamBroadcast","section":"Stream","msg":"Stream broadcast from '172.18.106.164:3000' to '172.18.106.160:3001' with name notify-room not authorized"} 
{"level":50,"time":"2022-11-26T10:57:13.599Z","pid":9,"hostname":"61bee5c51382","name":"StreamBroadcast","section":"Stream","msg":"Stream broadcast from '172.18.106.164:3000' to '172.18.106.160:3000' with name notify-room not authorized"} 
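The log lines above are JSON objects, one per line. When several instances are involved, it can help to tally which target instances and which stream names are being rejected. A minimal stdlib Python sketch; the regular expression assumes the exact "Stream broadcast from ... not authorized" wording shown above, and `tally_unauthorized` is a hypothetical helper. It only summarizes the errors; it does not diagnose their cause.

```python
import json
import re
from collections import Counter

# Matches the StreamBroadcast message wording seen in the log lines above.
BROADCAST_RE = re.compile(
    r"Stream broadcast from '(?P<src>[^']+)' to '(?P<dst>[^']+)' "
    r"with name (?P<stream>\S+) not authorized"
)


def tally_unauthorized(lines):
    """Count (target instance, stream name) pairs in JSON log lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as plain stack traces
        if not isinstance(entry, dict) or entry.get("name") != "StreamBroadcast":
            continue
        match = BROADCAST_RE.search(str(entry.get("msg", "")))
        if match:
            counts[(match["dst"], match["stream"])] += 1
    return counts
```

Feeding it the six lines above would count one `local` and two `notify-room` rejections for each of the two targets on 172.18.106.160.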

ankar84 • Nov 26 '22 11:11

I hit this issue right after a MongoDB update. When I moved the primary node to another server and rebooted the server that had been the primary, I got those "not authorized" error messages and grayed-out messages in chat. Moving the primary back to the first server did not fix the issue. I then restarted all Rocket.Chat instances one by one, and that fixed it for me.

It looks like when the primary node changes or restarts, some data stream is lost (such as a change stream connection), and restarting the Rocket.Chat instances restores (reconnects) it.
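A stream that dies on a primary change and only comes back after a restart is what a watcher produces if it never reopens its stream after an error. MongoDB change streams are designed to be resumable: each event carries a resume token in its `_id`, and a new stream can be opened from that token (in PyMongo, `collection.watch(resume_after=token)`) as long as the token is still within the oplog window. The sketch below shows that retry-and-resume loop in plain Python so it runs without a replica set. It is not Rocket.Chat's implementation; `open_stream` is a hypothetical callable standing in for something like `collection.watch(resume_after=token)`, and the builtin `ConnectionError` stands in for the driver's error type (for example `pymongo.errors.ConnectionFailure`).

```python
import time


def watch_with_resume(open_stream, handle, max_retries=5, backoff=0.5, sleep=time.sleep):
    """Consume change events, reopening the stream from the last resume token on error.

    open_stream(token) -> iterable of events (dicts whose "_id" is the resume token);
    token is None on the first open. handle(event) processes one event.
    Returns the last resume token if the stream ends normally.
    """
    token = None
    failures = 0
    while True:
        try:
            for event in open_stream(token):
                handle(event)
                token = event["_id"]  # remember the position only after handling
                failures = 0  # progress was made, so reset the retry budget
            return token
        except ConnectionError:
            failures += 1
            if failures > max_retries:
                raise  # give up and surface the error instead of hanging silently
            # Give the replica set time to elect a new primary, then resume.
            sleep(backoff * 2 ** (failures - 1))
```

The key design choice is that the resume token is stored only after an event has been handled, so a failover between events neither skips nor replays messages; giving up loudly after `max_retries` also avoids the silent, never-recovering state described in this thread.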

@sampaiodiego to consider.

ankar84 • Nov 26 '22 12:11

We're seeing a similar issue (sent messages stay gray, incoming messages don't show until refresh). We're on MongoDB 4.2.5 and RC 5.4.0.

It seems to happen when MongoDB is restarted while RC is running, but I could be wrong.

JoshMcCullough • Mar 23 '23 12:03

I can confirm @JoshMcCullough's suspicion. Our RocketChat runs in AWS Fargate (as Docker containers) and points to a self-managed three-node MongoDB cluster (4.4.18 / wiredTiger, oplog enabled). Whenever we trigger rs.stepDown() on the replica set's primary to do maintenance on that node, RocketChat loses its mind and immediately gets into this "grey hanging messages" state. None of our other applications connected to this MongoDB cluster even blinks; they seamlessly talk to the newly elected primary and move on.

I have replicated this behavior on various combinations of MongoDB:

  • 4.2
  • 4.4

And various versions of Rocketchat:

  • 4.x up to 4.8.7 (our PROD instance)
  • many 5.x versions (DEV instance)
  • 6.0.0 and latest 6.0.1 (DEV instance)

As far as I can tell, it never recovers. Only a full restart of the RocketChat container(s) fixes this. I don't see anything unusual in the RocketChat log when this happens. We can't be the only people running an HA MongoDB cluster with a replica set for RocketChat to connect to. I have experimented with the various OPLOG_THIS_AND_THAT environment variable suggestions I found on the web, but to no avail. I'm happy to provide any info to help solve this.

smaragd • Mar 27 '23 17:03