
Large "Download My Data" export crashes RC

Open · cprice-kgi opened this issue 5 years ago · 5 comments

Description:

When "Download My Data" is initiated for a user who has an excessive amount of historical data, the RC service will crash while trying to compile an export archive.

Steps to reproduce:

  1. Click on "Download My Data" for a user with a large amount of historical data in the system.
  2. Observe that eventually the system will grind to a halt.

Expected behavior:

The archive should be produced and a link should be emailed to the user.

Actual behavior:

The export job tries to gather all historical data into an archive, but the volume of data eventually brings down the system. My assumption is that this is caused by using /tmp, or tmpfs, as the workspace for compiling "Download My Data" archives. I have not tested it yet, but I imagine exporting a channel with an excessive amount of data (assets, etc.) would cause the same problem.
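
To verify that assumption, it should be possible to check whether /tmp is a tmpfs mount and watch it fill up while an export runs. A minimal sketch; the /tmp/userData and /tmp/zipFiles paths are taken from a later report in this thread and may differ between versions:

```
# Is /tmp a RAM-backed tmpfs mount, and how big is it?
findmnt -T /tmp
df -h /tmp

# Watch the export workspace grow while an export is running
# (paths reported later in this thread; they may vary by version)
watch -n 10 'du -sh /tmp/userData /tmp/zipFiles 2>/dev/null'
```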

Server Setup Information:

  • Version of Rocket.Chat Server: 3.6.0
  • Operating System: RedHat
  • Deployment Method: tar file
  • Number of Running Instances: 3
  • DB Replicaset Oplog:
  • NodeJS Version:
  • MongoDB Version:

Client Setup Information

  • Desktop App or Browser Version: Chrome
  • Operating System:

Additional context

Our server has 2.0G of space allocated for tmpfs. Ideally, the export functions would compile their archives in a much larger temporary area. I'm not an expert in tmpfs, so please correct me if I am wrong, but I believe the maximum you can allocate to it is limited by the server's RAM, and that's really not enough to support one user with a lot of data, much less any concurrent exports that may take place.
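
Assuming the export really does write under /tmp, one possible workaround is to give the process a disk-backed /tmp instead of tmpfs. A rough sketch for a Docker deployment is below; whether a tar deployment can redirect the workspace via TMPDIR depends on whether Rocket.Chat derives these paths from os.tmpdir(), which I have not verified:

```
# Docker: bind-mount a disk-backed host directory over /tmp in the container
docker run -d --name rocketchat \
  -v /srv/rocketchat-tmp:/tmp \
  rocket.chat:latest

# Tar deployment (assumption: the export paths honour TMPDIR / os.tmpdir())
export TMPDIR=/var/lib/rocketchat/tmp
mkdir -p "$TMPDIR"
```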

Relevant logs:

cprice-kgi avatar Dec 02 '20 14:12 cprice-kgi

possibly a duplicate of #20839

gabrieleiro avatar Apr 27 '21 12:04 gabrieleiro

@cprice-kgi what are the values for "Processing frequency" and "Message limit per request" in your server?

gabrieleiro avatar Apr 28 '21 15:04 gabrieleiro

Also, how long does it usually take for the server to go unresponsive?

gabrieleiro avatar Apr 28 '21 18:04 gabrieleiro

Yesterday, my server seems to have been hit by this issue.

It is not a large instance:

  • There are 21 users in total, some of them are inactive.
  • There are 16 channels with around 210k messages in total.
  • The MongoDB has a size of about 710 MiB.
  • There are 4717 uploaded files with a total size of 4.5 GiB.
  • Version of Rocket.Chat Server: 5.4.2
  • Deployment Method: Docker
  • MongoDB Version: 5.0.14

At ~11:45, one of the users requested a data export.

Around 18:30, the I/O load of the Docker host went up.

I didn't find any useful logs anywhere. Using iotop and lsof, I determined that the high I/O load was caused by four Node.js processes, each continuously writing huge amounts of data to disk. At 19:30, /tmp/userData inside the Docker container had a size of 4 GiB and /tmp/zipFiles of around 1.2 GiB.
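
Roughly, the diagnosis above can be reproduced with commands like these (container name and PID are placeholders):

```
# Which processes are generating the write load on the Docker host?
sudo iotop -o -a

# Which files is a given node process writing? (<node_pid> is a placeholder)
sudo lsof -p <node_pid> | grep /tmp

# How large has the export workspace grown inside the container?
docker exec rocketchat du -sh /tmp/userData /tmp/zipFiles
```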

As a first measure, I restarted the Docker container. For a short period of time, the I/O load went down, but soon the four Node.js processes came back, again writing huge amounts of data to disk.

Eventually, I disabled the "Data Exporter Processor" as suggested in this comment: https://github.com/RocketChat/Rocket.Chat/issues/24923#issuecomment-1077504749

However, as soon as I re-enable the "Data Exporter Processor", the four I/O-hungry Node.js processes come back.
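
For anyone whose admin UI is already unresponsive, the same toggle can presumably be flipped directly in MongoDB; the setting `_id` below is an assumption based on the setting's display name and may differ between versions:

```
# Assumption: the "Disable Data Exporter Processor" troubleshoot toggle is
# stored in rocketchat_settings under an _id similar to this one.
mongosh "mongodb://localhost:27017/rocketchat" --eval '
  db.rocketchat_settings.updateOne(
    { _id: "Troubleshoot_Disable_Data_Exporter_Processor" },
    { $set: { value: true } }
  )
'
# Restart Rocket.Chat afterwards so the changed setting is picked up.
```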

I think that every Rocket.Chat instance with a sufficiently large volume of uploads is at risk of such a meltdown.

paulchen avatar Feb 01 '23 19:02 paulchen

Server 6.5, deployed via Debian packages, users via LDAP.

The server gets OOM-killed before it even sends a download link.
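
To confirm it is the kernel OOM killer terminating the Node.js process, the kernel log should show it, e.g.:

```
# Look for OOM kills in the kernel log (systemd-based Debian install)
journalctl -k --since "today" | grep -iE "out of memory|oom-killer"
dmesg -T | grep -i "killed process"
```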

b90g avatar Feb 15 '24 14:02 b90g