Large "Download My Data" export crashes RC
Description:
When "Download My Data" is initiated for a user with a large amount of historical data, the Rocket.Chat (RC) service crashes while trying to compile the export archive.
Steps to reproduce:
- Click on "Download My Data" for a user with a large amount of historical data in the system.
- Observe that eventually the system will grind to a halt.
Expected behavior:
The archive should be produced and a link should be emailed to the user.
Actual behavior:
The export job tries to gather all historical data into an archive, but the volume of data eventually brings the system down. My assumption is that this is because /tmp (a tmpfs mount) is used as the workspace for compiling "Download My Data" archives. I have not tested it yet, but I suspect a channel export of a channel with a large amount of data (assets, etc.) would hit the same problem.
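For anyone trying to confirm this on their own host, a quick way to check whether /tmp is RAM-backed (a generic sketch, nothing Rocket.Chat-specific):

```shell
# Print the filesystem type backing /tmp; "tmpfs" means temp files
# live in RAM, so large export archives compete with process memory.
stat -f -c %T /tmp

# Show how much space that mount actually has available.
df -h /tmp
```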
Server Setup Information:
- Version of Rocket.Chat Server: 3.6.0
- Operating System: Red Hat
- Deployment Method: tar file
- Number of Running Instances: 3
- DB Replicaset Oplog:
- NodeJS Version:
- MongoDB Version:
Client Setup Information:
- Desktop App or Browser Version: Chrome
- Operating System:
Additional context:
Our server has 2.0 GiB of space allocated for tmpfs. Ideally, the export functions would compile their archives in a much larger temporary area. I'm not an expert in tmpfs, so please correct me if I am wrong, but I believe its maximum size is limited by the server's RAM, and that's really not enough to support one user with a lot of data, much less any concurrent exports that may take place.
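If the export workspace really is a RAM-backed /tmp, two possible mitigations come to mind. This is only a sketch: the size value is an example, and the TMPDIR approach assumes the export code resolves its temp path through Node's `os.tmpdir()` (which honors TMPDIR), something I have not verified against the Rocket.Chat source.

```shell
# Mitigation 1: enlarge the tmpfs mount. tmpfs is still backed by RAM
# (plus swap), so this only buys headroom, it does not remove the limit.
sudo mount -o remount,size=8G /tmp

# Mitigation 2 (assumption: Rocket.Chat uses Node's os.tmpdir()):
# point temp files at a disk-backed directory before starting the server.
mkdir -p /var/lib/rocketchat-tmp
export TMPDIR=/var/lib/rocketchat-tmp
```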
Relevant logs:
Possibly a duplicate of #20839.
@cprice-kgi what are the values for "Processing frequency" and "Message limit per request" in your server?
also how long does it usually take for the server to go unresponsive?
Yesterday, my server seems to have been hit by this issue.
It is not a large instance:
- There are 21 users in total, some of them are inactive.
- There are 16 channels with around 210k messages in total.
- The MongoDB has a size of about 710 MiB.
- There are 4717 uploaded files with a total size of 4.5 GiB.
- Version of Rocket.Chat Server: 5.4.2
- Deployment Method: Docker
- MongoDB Version: 5.0.14
At ~11:45, one of the users requested a data export.
Around 18:30, the I/O load of the Docker host went up.
I didn't find any useful logs anywhere. Using iotop and lsof I determined the reason for the high I/O load to be four Node.js processes each continuously writing huge amounts of data to disk. At 19:30, /tmp/userData inside the Docker container had a size of 4 GiB and /tmp/zipFiles of around 1.2 GiB.
As a first measure, I restarted the Docker container. For a short period of time, the I/O load went down, but soon the four Node.js processes came back, again writing huge amounts of data to disk.
Eventually, I disabled the "Data Exporter Processor" as suggested in this comment: https://github.com/RocketChat/Rocket.Chat/issues/24923#issuecomment-1077504749
However, as soon as I re-enable the "Data Exporter Processor", the four I/O hungry Node.js processes come back.
I think that every Rocket.Chat instance with a sufficiently huge size of uploads is at risk of such a meltdown.
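Since the heavy writers were filling /tmp/userData and /tmp/zipFiles inside the container, one possible workaround for Docker deployments might be to bind-mount a disk-backed host directory over the container's /tmp. A sketch only; the host path and image tag are illustrative, and other run options are omitted:

```shell
# Bind-mount a host directory (on disk, not tmpfs) over /tmp in the
# container, so export archives are written to disk instead of RAM.
mkdir -p /srv/rocketchat-tmp
docker run -d \
  -v /srv/rocketchat-tmp:/tmp \
  rocketchat/rocket.chat:latest
```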
Server 6.5, deployed via Debian packages, users via LDAP.
The server gets OOM-killed before even sending a link.