Docker-DocumentServer

RabbitMQ takes forever to start, fails, and still eats 100% CPU after starting if ulimit -n is set to a high value

Open t-lo opened this issue 2 years ago • 7 comments

Do you want to request a feature or report a bug?

This is a bug report that includes a workaround (see below). The motivation for filing this issue is to share this workaround, which cost me quite a bit of debugging time, with other affected users.

What is the current behavior?

When ulimit -n inside the document server container is set to a high value (depending on the Docker configuration), the container takes several minutes to start. It hangs at "Starting RabbitMQ Messaging Server rabbitmq-server", which eventually fails, although the container keeps running. After that, an erl_child_setup process (or thread?) consumes 100% of a single CPU and keeps running forever. At this point the document server container is unusable (the health endpoint returns 502) because RabbitMQ never started successfully.

If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem.

  1. Check that ulimit -n is set to a high value inside the onlyoffice container:
    host $ docker run -ti --entrypoint /bin/bash onlyoffice/documentserver
    container $ ulimit -n
    1073741816
    
  2. Start the container
    host $ docker run onlyoffice/documentserver
    
  3. Run htop on the host (it also shows processes in the container's namespaces): start-stop-daemon [...] redis-server is stuck for several minutes at 100% CPU, then erl_child_setup does the same.

What is the expected behavior?

  1. The container starts normally regardless of the ulimit -n setting inside it.
  2. If the start-up of any of the required components (rabbitmq, redis, documentserver, nginx, etc) fails, the container exits with an error.
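To illustrate the second expectation, here is a minimal fail-fast sketch in Bash. `start_or_die` is a hypothetical helper, not part of the DocumentServer image, and the commented `service ...` calls only assume SysV-style service names:

```bash
#!/bin/bash
# Hypothetical fail-fast helper: run one component's start command and,
# if it fails, report which component failed and exit with status 1
# instead of leaving a half-started container running.
start_or_die() {
    local name="$1"
    shift
    if ! "$@"; then
        echo "startup of $name failed, exiting" >&2
        exit 1
    fi
}

# Example usage (component names and commands are assumptions):
# start_or_die rabbitmq service rabbitmq-server start
# start_or_die redis    service redis-server start
```

Because the helper calls `exit`, a failure in any component stops the whole start-up script, so Docker sees the container exit with an error.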

Did this work in previous versions of DocumentServer?

Yes, but I'm unsure when it stopped working.

DocumentServer Docker tag:

  • dockerhub digest 8a1edcc13f9d
  • image ID 5a50e3a2d2ed

Host Operating System:

Fedora 36 w/ docker version 20.10.17, build 100c701

Workaround

Set the NOFILE ulimit to a lower value, either individually for the documentserver container or globally for all containers.

Individually: add (e.g.) --ulimit nofile=65536:65536 to the docker command line, or

   ...
   ulimits:
     nofile:
       soft: "65536"
       hard: "65536"
   ...

to your service configuration YAML for docker-compose.

Globally: Add --default-ulimit nofile=65536:65536 to the dockerd command line.

t-lo, Sep 09 '22 12:09

Hi, I understand that this issue was filed to share a workaround for a problem you ran into.

But shouldn't this also be reported to the RabbitMQ developers? If I understand correctly, RabbitMQ seems to have problems with very high ulimit values.

ShockwaveNN, Sep 09 '22 15:09

Good point, but after some more digging I found this: https://github.com/docker-library/rabbitmq/issues/545. It seems to be fixed upstream.

t-lo, Sep 11 '22 13:09

@t-lo Thanks for finding it

We use Ubuntu as our base image, so I think it will take a while until Ubuntu ships a version with the fix.

ShockwaveNN, Sep 11 '22 13:09

It might be worth setting ulimit -n 65536 explicitly in the container entry point (or in the RabbitMQ init script). This would work independently of an Ubuntu upstream fix and, at the end of the day, do the same thing as the Erlang code change in upstream RabbitMQ. The value could even be taken from an environment variable so Docker users can override it if necessary.
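A minimal sketch of that suggestion, assuming a Bash wrapper that runs before the image's normal start-up. `NOFILE_LIMIT` is a hypothetical variable name, not an existing DocumentServer option, and this is not the entrypoint ONLYOFFICE actually ships:

```bash
#!/bin/bash
# Hypothetical entrypoint wrapper: cap the open-file limit, then hand over
# to the real start-up command. NOFILE_LIMIT is an assumed env variable.
NOFILE_LIMIT="${NOFILE_LIMIT:-65536}"

# Reject non-numeric overrides so the comparison below cannot misbehave.
case "$NOFILE_LIMIT" in
    ''|*[!0-9]*)
        echo "NOFILE_LIMIT must be a positive integer" >&2
        exit 1
        ;;
esac

cap_nofile() {
    local current
    current="$(ulimit -n)"   # reading ulimit -n reports the soft limit
    # Only lower the limit; lowering never needs extra privileges.
    # Without -S/-H, bash sets both the soft and the hard limit,
    # mirroring --ulimit nofile=65536:65536.
    if [ "$current" = "unlimited" ] || [ "$current" -gt "$NOFILE_LIMIT" ]; then
        ulimit -n "$NOFILE_LIMIT"
    fi
}

cap_nofile

# Replace this shell with the original start-up command passed as arguments.
exec "$@"
```

A user who needs a different value could then pass, for example, `-e NOFILE_LIMIT=32768` to `docker run`, while the default keeps RabbitMQ's child processes away from the billion-descriptor limit.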

t-lo, Sep 11 '22 13:09

Thanks for this idea.

I've created issue 58989 in our private issue tracker.

Not sure if we will implement it, but at least we'll discuss it.

ShockwaveNN, Sep 11 '22 14:09

I took a stab at this, see https://github.com/ONLYOFFICE/Docker-DocumentServer/pull/492. It's more challenging than I thought: Ubuntu 20.04 ignores /etc/security/limits.[conf|d/] in favour of systemd service file settings (LimitNOFILE=...), but the documentserver runs its services via start-stop-daemon, which in turn ignores systemd's limits. So /etc/default/rabbitmq-server seems like the best place to set this.
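For reference, here is a sketch of what such a setting could look like. It assumes, as the comment above implies, that the rabbitmq-server init script sources /etc/default/rabbitmq-server before starting the broker. The actual change in the linked PRs may differ:

```bash
# /etc/default/rabbitmq-server (sketch)
# Assumption: this file is sourced by the rabbitmq-server init script, so the
# limit below applies to the broker even though it is launched through
# start-stop-daemon rather than a systemd unit with LimitNOFILE.
# 65536 matches the value used in the workarounds in this thread.
ulimit -n 65536
```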

t-lo, Sep 12 '22 08:09

@t-lo OK, thanks. I'll notify our development team.

ShockwaveNN, Sep 12 '22 11:09

Hello @t-lo, it's fixed in https://github.com/ONLYOFFICE/Docker-DocumentServer/pull/530 and will be included in the next release.

igwyd, Dec 02 '22 07:12

Thank you @igwyd ! What's the ETA of the next release?

(Also, I've updated PR #492 with a comment, feel free to close.)

t-lo, Dec 02 '22 08:12

No release date yet.

igwyd, Dec 02 '22 08:12

As a workaround, I defined ulimits in the docker-compose file:

    ulimits:
      nofile: 65536

mkobel, Jun 28 '23 10:06

Hello @t-lo, as far as I can see the problem is solved, can we close it?

igwyd, Jun 28 '23 13:06

I've realised that the OnlyOffice development server has been installed on my laptop for over a year, and it must have been taking up a whole CPU core in the background the whole time! No wonder my battery life has been bad and my fan has been loud. After getting rid of it, my idle CPU temperature dropped from 70°C to 40°C, and my fan, which used to run constantly, is far quieter.

Heath123, Aug 10 '23 14:08

Since it's resolved, I'll close the issue. Feel free to comment or reopen it if you have further questions.

Rita-Bubnova, Aug 11 '23 07:08