Docker-DocumentServer
rabbitmq takes forever to start, fails, and still eats 100% CPU after started, if ulimit -n set to a high value
Do you want to request a feature or report a bug?
This is a bug report which includes a workaround (see below). Motivation for filing this issue is to share this workaround (which has cost me quite a bit of debugging time) with other affected users.
What is the current behavior?
When ulimit -n inside the document server container is set to a high value (depending on the docker config), the container takes multiple minutes to start, hanging at Starting RabbitMQ Messaging Server rabbitmq-server, which eventually fails, though the container continues to run. After that, a process (or thread?) erl_child_setup consumes 100% of a single CPU and keeps running forever. The document server container is not usable at this point (the health endpoint returns 502) because rabbitmq never started successfully.
If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem.
- Check that ulimit -n is set to a high value inside of the onlyoffice container:
host $ docker run -ti --entrypoint /bin/bash onlyoffice/documentserver
container $ ulimit -n
1073741816
- Start the container:
host $ docker run onlyoffice/documentserver
- Run htop on the host (which also shows container namespaced processes): it shows start-stop-daemon [...] redis-server stuck for multiple minutes, consuming 100% CPU; then erl_child_setup doing the same.
What is the expected behavior?
- Container starts normally independent of the ulimit -n setting inside of the container.
- If the start-up of any of the required components (rabbitmq, redis, documentserver, nginx, etc.) fails, the container exits with an error.
Did this work in previous versions of DocumentServer?
Yes, but I'm unsure when it stopped working.
DocumentServer Docker tag:
- dockerhub digest 8a1edcc13f9d
- image ID 5a50e3a2d2ed
Host Operating System:
Fedora 36 w/ docker version 20.10.17, build 100c701
Workaround
Set ulimit for NOFILE to a lower value, either individually for the documentserver container or globally for all containers.
Individually: add (e.g.) --ulimit nofile=65536:65536 to the docker command line, or
...
ulimits:
  nofile:
    soft: "65536"
    hard: "65536"
...
to your service configuration YAML for docker-compose.
Globally: add --default-ulimit nofile=65536:65536 to the dockerd command line.
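To check that the per-container setting took effect, the limit can be read back from inside a container started with the same flag. The following is a minimal sketch that reuses the image and the entrypoint override from the reproduction steps; the function name check_nofile is only illustrative and is not part of the image or of docker.

```shell
# Hypothetical helper: print the open-file limit seen inside a
# documentserver container started with the given nofile value.
check_nofile() {
    limit="${1:-65536}"
    docker run --rm \
        --ulimit "nofile=${limit}:${limit}" \
        --entrypoint /bin/bash \
        onlyoffice/documentserver -c 'ulimit -n'
}
```

Calling check_nofile 65536 should then report the lowered limit instead of the value inherited from the docker daemon.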
Hi, I understand that this issue is reported as a workaround for a problem you've hit.
But shouldn't this issue also be reported to the RabbitMQ developers? It seems they have some problems with large ulimit values, if I understand this correctly.
Good point, but after some more digging I found this: https://github.com/docker-library/rabbitmq/issues/545 Seems to be fixed upstream.
@t-lo Thanks for finding it
We use Ubuntu as our base image, so I think it will take time until Ubuntu ships the version with the fix.
Might be worth considering setting ulimit -n 65536 explicitly in the container entry point (or the rabbitmq init script); this would work independently of an Ubuntu upstream fix (and at the end of the day do the same thing the Erlang code change in upstream rabbitmq does). This could even be taken from an env variable so docker users can override it if necessary.
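As a rough sketch of that idea (not the change made in any of the linked pull requests), an entry point could cap the limit before starting the services. The function name cap_nofile and the NOFILE_LIMIT environment variable are assumptions made up for this illustration.

```shell
# Hypothetical entry point fragment: lower the open-file limit if it is
# higher than the requested cap; leave it alone if it is already lower.
cap_nofile() {
    limit="${1:-65536}"
    current="$(ulimit -n)"
    # "unlimited" is not a number, so check for it before comparing.
    if [ "$current" = "unlimited" ] || [ "$current" -gt "$limit" ]; then
        ulimit -n "$limit"
    fi
}

# NOFILE_LIMIT is an assumed variable name; the default is 65536.
cap_nofile "${NOFILE_LIMIT:-65536}"
```

The limit is only ever lowered, so an explicit smaller --ulimit passed by the user is respected, and processes started afterwards from the same shell inherit the capped value.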
Thanks for this idea
I've created issue 58989 in our private issue tracker.
Not sure if we will implement that, but at least we will discuss it.
I took a stab at this, see https://github.com/ONLYOFFICE/Docker-DocumentServer/pull/492.
It's more challenging than I thought, since Ubuntu 20.04 ignores /etc/security/limits.[conf|d/] in favour of systemd service file settings (LimitNOFILE=...), but the documentserver uses start-stop-daemon to run, ignoring systemd's limits in turn.
So /etc/default/rabbitmq-server seems like the best place to set this.
@t-lo Ok thanks, I'll notify our development team.
Hello @t-lo, it's fixed at: https://github.com/ONLYOFFICE/Docker-DocumentServer/pull/530 and will be released in the next release.
Thank you @igwyd ! What's the ETA of the next release?
(Also, I've updated PR #492 with a comment, feel free to close.)
No release date yet.
As a workaround I defined ulimits in the docker-compose file:
ulimits:
  nofile: 65536
Hello @t-lo, as far as I can see the problem is solved, can we close it?
I've realised that the OnlyOffice development server has been installed on my laptop for over a year, and it must have been taking up a whole CPU core in the background the whole time! No wonder my battery life has been bad and my fan has been loud... After getting rid of it, my idle CPU temperature dropped from 70°C to 40°C and my fan is far quieter, where before it would run constantly.
If it's resolved I'll close the issue. Feel free to comment or reopen it if you have further questions.