Memory leak in coolwsd

Open gerazo opened this issue 2 years ago • 16 comments

I am using

  • Nextcloud 23.0.3
  • richdocuments 5.0.3
  • richdocumentscode 21.11.204

This is a server with around 50-100 active users.

Problem:

After around 5 days of continuous running, the coolwsd process holds about 30 GB of resident memory and does not release it. By this time, Collabora Office is either totally unresponsive (no documents open) or a document opens but after 10 seconds it says "connection to server lost" and kicks you back out to the folders. According to the logs, the "75% of RAM taken" alert had already been raised a day earlier, so the memory consumption of coolwsd rises steadily over time. In the end, the OS OOM killer is triggered and kills the whole apache process tree, as the originator of the coolwsd process. This last action also kills all other NC services, but Collabora is already unusable a good day before the actual kill takes place.

The above happens regularly with 21.11.204. It did not happen with previous releases. I have just upgraded to 21.11.306; we will see how it performs. I guess there must be a memory leak somewhere in coolwsd. It is important to note that it is the coolwsd process that shows increased memory consumption, not other apache-related processes, nor the CollaboraOnline... process.
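
For anyone wanting to track the resident memory described above, here is a minimal sketch, assuming a Linux host where `/proc/<pid>/status` is readable and the process name shows up as `coolwsd`. The helper names (`parse_vmrss_kib`, `parse_name`, `rss_by_process_name`) are made up for illustration and are not part of Collabora or Nextcloud.

```python
import os
import re


def parse_vmrss_kib(status_text):
    """Return the VmRSS value (KiB) from /proc/<pid>/status text, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def parse_name(status_text):
    """Return the process name from /proc/<pid>/status text, or None if absent."""
    match = re.search(r"^Name:\s+(.+)$", status_text, re.MULTILINE)
    return match.group(1).strip() if match else None


def rss_by_process_name(name, proc_root="/proc"):
    """Sum resident memory (KiB) over all processes whose name equals `name`."""
    if not os.path.isdir(proc_root):
        return 0  # not a Linux-style /proc; nothing to measure
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                text = fh.read()
        except OSError:
            continue  # process exited or is not readable
        if parse_name(text) == name:
            total += parse_vmrss_kib(text) or 0
    return total


if __name__ == "__main__":
    # "coolwsd" is the process name reported in this thread.
    print(f"coolwsd RSS: {rss_by_process_name('coolwsd') / 1024:.1f} MiB")
```

Run periodically (for example from cron) and logged with a timestamp, this would give the kind of curve posted further down in the thread.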

gerazo avatar Apr 19 '22 07:04 gerazo

I seem to have a similar problem:

  • Debian 11 server
  • Nextcloud 23.0.3
  • richdocumentscode 21.11.306
  • nginx as webserver with php-fpm

It now uses around 37% of the memory, but I can see it has risen over the last day. I have had this curve over the last two weeks.

unclesam87 avatar Apr 20 '22 09:04 unclesam87

After a week, I can also say that richdocumentscode 21.11.306 is unfortunately affected in the very same way. It seems that the service stops responding well before the OOM killer finds it, so this issue definitely causes service outages. My only workaround is to automatically restart the service every night.

gerazo avatar Apr 25 '22 06:04 gerazo

Version 21.11.402 is affected the same way.

gerazo avatar May 17 '22 13:05 gerazo

Same thing here with Nextcloud 24.0.1, Nextcloud Office 6.1.1, Collabora Online - Built-in CODE Server 22.5.401. The memory usage of coolwsd increases for no apparent reason. I monitored the memory usage of the server and here is the result:

memory-report

So after only 5 days 37 GB are used in total. The total memory usage of the server is back to 2 GB after a restart of apache.
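
To turn such a curve into a growth rate, one option is a least-squares slope over the samples. This is only a sketch: `slope_per_day` is a made-up helper, and the two sample points are just the figures quoted above (about 2 GB after a restart, 37 GB five days later), which describe total server memory rather than coolwsd alone.

```python
def slope_per_day(samples):
    """Least-squares slope of (unix_seconds, value) samples, in value units per day."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("all samples share the same timestamp")
    return num / den * 86400  # per second -> per day


if __name__ == "__main__":
    day = 86400
    # Only the two figures from the comment above; real monitoring data would have many more points.
    samples = [(0, 2.0), (5 * day, 37.0)]
    print(f"growth: {slope_per_day(samples):.1f} GB/day")
```

With just those two points this works out to roughly 7 GB per day. A slope that stays positive across restarts is what distinguishes a leak from ordinary load.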

SethosII avatar Aug 08 '22 06:08 SethosII

I monitored the memory usage further and added a daily restart of apache (the drops in memory usage are the restarts):

memory-report

The memory usage still increases even with the daily restarts, although more slowly. So it seems to be a real memory leak.

SethosII avatar Aug 29 '22 06:08 SethosII

Reporting similar behaviour with:

  • richdocuments 6.2.1
  • richdocumentscode 22.5.502

Running Nextcloud 24.0.6 with 4GB of RAM.

  • Ubuntu 20.04.5 LTS
  • nginx/1.18.0 (Ubuntu)
  • PHP 8.0.14 fpm
  • psql (PostgreSQL) 12.12

coolwsd memory usage slowly increases over several days until it becomes unresponsive. See the Grafana screenshot for an example. The spike ends where the PHP service was restarted.

Screenshot 2022-10-13 at 05-50-34

nooblag avatar Oct 12 '22 18:10 nooblag

Within one month it stops working in a 6 GB virtual machine, so there is definitely a memory leak. I am using 22.05.8.2; it was not that bad in previous versions. I put some restarts into crontab.

kadarpik avatar Dec 19 '22 13:12 kadarpik

Nidor-Dashboards-Dashboards-Grafana

I don't know what happened here, but it clearly looks like this is not a gradual leak, but something causing it to steadily allocate until… at some point, oomkiller jumped to the rescue…

kwisatz avatar Mar 12 '23 13:03 kwisatz

Same problem here, but just with the collabora server from the nextcloud app store. Servers with external collabora server do not face this issue.

NetBLOKS avatar Mar 14 '23 10:03 NetBLOKS

Servers with external collabora server do not face this issue.

@NetBLOKS Which version of Collabora do you run (docker?), which NC version, and what was the setup procedure (using nginx and php-fpm?)? I always had these issues no matter what, but that was a few months ago.

rizajur avatar Mar 15 '23 10:03 rizajur

Which version of Collabora: the App Store version (Collabora Online - Built-in CODE Server). Which NC version: it happens in 24 and 25; I now have the latest, 25.0.4. Setup procedure: manual install, Debian 11, Apache, PHP7.4-FPM.

NetBLOKS avatar Mar 15 '23 11:03 NetBLOKS

Can confirm this is still an issue on the following stack:

  • Ubuntu 24.04 LTS
  • Nginx 1.24
  • PHP 8.3-FPM
  • Nextcloud 29.0.0
  • richdocuments: 8.4.2
  • richdocumentscode_arm64: 24.4.201

trenshaw avatar May 20 '24 04:05 trenshaw

  • Nextcloud version: 29.0.4.1
  • Red Hat Enterprise Linux release 8.10 (Ootpa)
  • 10.6.18-MariaDB
  • Apache/2.4.37
  • PHP 8.3.10
  • richdocuments: 8.4.4
  • richdocumentscode: 24.4.502

Since upgrading to Nextcloud 29.0.4.1 and upgrading PHP 8.2 to PHP 8.3, the Nextcloud server is almost crashing, because php-fpm is consuming all the space in /tmp ==>

32G /tmp/systemd-private-4c513a85a5cb462b92e805310c385d9e-php-fpm.service-r9PDnv/tmp/coolwsd.LNJU02GnN5/jails/18443-d61991e6
39G /tmp/systemd-private-4c513a85a5cb462b92e805310c385d9e-php-fpm.service-r9PDnv/tmp/coolwsd.LNJU02GnN5

After restarting the PHP-FPM service, the files were removed from the /tmp directory.
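
To measure a directory like the coolwsd ones above without shelling out to `du`, something along these lines could work. It is a sketch only: `tree_size_bytes` is a made-up helper, and it sums apparent file sizes (`st_size`), so the result can differ from what `du` reports for sparse files or because of block rounding.

```python
import os
import sys


def tree_size_bytes(root):
    """Sum the apparent sizes of all regular files and symlinks below `root`."""
    total = 0
    # os.walk does not follow symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file vanished or is not accessible; skip it
    return total


if __name__ == "__main__":
    # Usage: pass the directory to measure as the first argument.
    if len(sys.argv) > 1:
        size = tree_size_bytes(sys.argv[1])
        print(f"{sys.argv[1]}: {size / 1024**3:.2f} GiB")
```

The directory to measure is passed on the command line. Reading below a systemd private tmp directory will typically require root.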

Here you can see that "coolwsd" is eating up all the space in the server's /tmp dir in a short time:

2024.08.04 03:15:02 - Space ok 18% /dev/mapper/server-root --Mount-- /
2024.08.04 03:30:01 - Space ok 19% /dev/mapper/server-root --Mount-- /
2024.08.04 03:45:01 - Space ok 20% /dev/mapper/server-root --Mount-- /
2024.08.04 04:15:03 - Space ok 22% /dev/mapper/server-root --Mount-- /
2024.08.04 04:30:03 - Space ok 23% /dev/mapper/server-root --Mount-- /
2024.08.04 05:00:01 - Space ok 24% /dev/mapper/server-root --Mount-- /
2024.08.04 05:15:02 - Space ok 25% /dev/mapper/server-root --Mount-- /
2024.08.04 05:30:01 - Space ok 26% /dev/mapper/server-root --Mount-- /
2024.08.04 05:45:02 - Space ok 27% /dev/mapper/server-root --Mount-- /
2024.08.04 06:15:01 - Space ok 28% /dev/mapper/server-root --Mount-- /
2024.08.04 06:30:04 - Space ok 32% /dev/mapper/server-root --Mount-- /
2024.08.04 07:00:48 - Space ok 33% /dev/mapper/server-root --Mount-- /
2024.08.04 07:30:01 - Space ok 34% /dev/mapper/server-root --Mount-- /
2024.08.04 07:45:01 - Space ok 35% /dev/mapper/server-root --Mount-- /
2024.08.04 08:00:01 - Space ok 36% /dev/mapper/server-root --Mount-- /
2024.08.04 08:15:02 - Space ok 37% /dev/mapper/server-root --Mount-- /
2024.08.04 08:30:01 - Space ok 38% /dev/mapper/server-root --Mount-- /
2024.08.04 08:45:01 - Space ok 39% /dev/mapper/server-root --Mount-- /
2024.08.04 09:00:02 - Space ok 40% /dev/mapper/server-root --Mount-- /
2024.08.04 09:15:01 - Space ok 41% /dev/mapper/server-root --Mount-- /
2024.08.04 09:30:01 - Space ok 42% /dev/mapper/server-root --Mount-- /
2024.08.04 09:45:02 - Space ok 43% /dev/mapper/server-root --Mount-- /
2024.08.04 10:00:01 - Space ok 44% /dev/mapper/server-root --Mount-- /
2024.08.04 11:00:03 - Space ok 46% /dev/mapper/server-root --Mount-- /
2024.08.04 11:15:01 - Space ok 47% /dev/mapper/server-root --Mount-- /
2024.08.04 11:30:01 - Space ok 49% /dev/mapper/server-root --Mount-- /
2024.08.04 11:45:01 - Space ok 51% /dev/mapper/server-root --Mount-- /
2024.08.04 12:00:02 - Space ok 54% /dev/mapper/server-root --Mount-- /
2024.08.04 12:15:01 - Space ok 55% /dev/mapper/server-root --Mount-- /
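
The fill rate can be read off a log in this format with a few lines of Python. This is a sketch under assumptions: `parse_usage` and `points_per_hour` are made-up helpers, the regular expression is fitted only to the lines shown here, and the percentage refers to the whole root filesystem, not to /tmp alone.

```python
import re
from datetime import datetime

# Matches e.g. "2024.08.04 03:15:02 - Space ok 18% ..." (format taken from the log above).
LINE_RE = re.compile(r"(\d{4}\.\d{2}\.\d{2})\s+(\d{2}:\d{2}:\d{2}) - Space ok (\d+)%")


def parse_usage(text):
    """Return a list of (datetime, percent_used) tuples found in the log text."""
    samples = []
    for date, clock, pct in LINE_RE.findall(text):
        when = datetime.strptime(f"{date} {clock}", "%Y.%m.%d %H:%M:%S")
        samples.append((when, int(pct)))
    return samples


def points_per_hour(samples):
    """Average growth in percentage points per hour between first and last sample."""
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    hours = (t1 - t0).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("samples must span a positive time interval")
    return (p1 - p0) / hours


if __name__ == "__main__":
    # First and last entries of the log above.
    log = (
        "2024.08.04 03:15:02 - Space ok 18% /dev/mapper/server-root --Mount-- /\n"
        "2024.08.04 12:15:01 - Space ok 55% /dev/mapper/server-root --Mount-- /\n"
    )
    print(f"{points_per_hour(parse_usage(log)):.1f} percentage points per hour")
```

From 18% at 03:15 to 55% at 12:15, this comes to roughly 4 percentage points of the root filesystem per hour.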

Also the memory on the server is decreasing and decreasing ==>

image

See also those errors in php-fpm.log ==>

PHP Fatal error: Uncaught TypeError: implode(): Argument #1 ($array) must be of type array, string given in /var/www/html/nextcloud/apps/richdocumentscode/proxy.php:398
Stack trace:
#0 /var/www/html/nextcloud/apps/richdocumentscode/proxy.php(398): implode()
#1 {main}
thrown in /var/www/html/nextcloud/apps/richdocumentscode/proxy.php on line 398
[04-Aug-2024 12:08:26 richdocumentscode (proxy.php) error exit, PID: 509268, Message: No content in reply from coolwsd. Is SSL enabled in error ?
[04-Aug-2024 12:08:26] PHP Warning: http_response_code(): Cannot set response code - headers already sent (output started at /var/www/html/nextcloud/apps/richdocumentscode/proxy.php:30) in /var/www/html/nextcloud/apps/richdocumentscode/proxy.php on line 34
[04-Aug-2024 12:19:49 PHP Warning: http_response_code(): Cannot set response code - headers already sent (output started at /var/www/html/nextcloud/apps/richdocumentscode/proxy.php:285) in /var/www/html/nextcloud/apps/richdocumentscode/proxy.php on line 292

Workaround:

  • [ 1 ] - restarted apache, redis and php-fpm; memory and disk space consumption are now normal again

Githopp192 avatar Aug 04 '24 11:08 Githopp192

Restarting apache solves the issue only temporarily. After a short while, coolwsd writes about 5-10 GB per hour, 100-200 GB per day!

This is a big issue, which affects the server stability

Githopp192 avatar Aug 07 '24 12:08 Githopp192

@Githopp192 This issue is about a memory leak, not disk space / /tmp.

joshtrichards avatar Aug 21 '24 13:08 joshtrichards

If you read my comments, check the graph, too. I am affected by a memory leak as well:

"Also the memory on the server is decreasing and decreasing ==>"

See the image above ...

image

Githopp192 avatar Aug 21 '24 23:08 Githopp192