EmailNotification BackgroundJob consumes all server resources and does not stop / runs forever
Steps to reproduce
- Have 2000 items in the activity_mq queue
- Leave the email settings (hostname & port) blank
- Let the cron job run and enjoy
Expected behaviour
The Job should recognize that E-Mails cannot be sent, or at least not try to send the mails more than once during execution, and stop after a reasonable time limit (e.g. 5 minutes).
Actual behaviour
The job tries (and fails) to send every email, and then keeps retrying forever, because it does not stop until all E-Mails are sent. Meanwhile, cron keeps running, so a second instance of the same job starts and also fails to finish its work. And then a third. And a fourth...
After 5 days, 10 instances are running, each consuming 90 - 100% of a core. Within a matter of days, all cores of the server are running at 100% serving this job's endless loop, and other services on the server start to experience difficulties.
Server configuration
Operating system: Ubuntu 18.10
Web server: Apache
Database: MariaDB
PHP version: whatever docker image with tag 19-apache uses
Nextcloud version: 19.0.0 (the issue has been around since at least NC 16 or 17)
Where did you install Nextcloud from: docker image with tag 19-apache
Signing status:
No errors have been found.
I guess the rest doesn't really matter.
How to fix this
- make sure the job will not try to send the mails more than once
- make sure the job will stop after a time limit has been reached
- don't try to send emails if no email settings are configured (we left the settings blank on purpose, because previous Nextcloud versions sent automatic welcome mails, which was not acceptable for us and could not be avoided in any other way. There are other valid use cases where emails are not wanted)
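The three points above can be sketched as follows. This is only an illustration in Python, not the activity app's actual code: `run_mail_job`, `send`, `smtp_host`, `smtp_port` and `time_limit_s` are hypothetical names, and the queue is a plain in-memory deque standing in for activity_mq.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Optional


def run_mail_job(
    queue: Deque[str],
    send: Callable[[str], bool],
    smtp_host: Optional[str],
    smtp_port: Optional[int],
    time_limit_s: float = 300.0,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, int]:
    """Process the queue at most once per run, within a time budget."""
    total = len(queue)
    stats = {"sent": 0, "failed": 0, "skipped": 0}

    # Guard 1: without mail settings there is nothing useful to do.
    if not smtp_host or not smtp_port:
        stats["skipped"] = total
        return stats

    deadline = clock() + time_limit_s
    # Guard 2: iterate over the initial length only, so each item is
    # attempted at most once during this run.
    for _ in range(total):
        # Guard 3: stop as soon as the time budget is used up.
        if clock() >= deadline:
            break
        item = queue.popleft()
        if send(item):
            stats["sent"] += 1
        else:
            stats["failed"] += 1
            queue.append(item)  # left for a later run, not retried now

    stats["skipped"] = total - stats["sent"] - stats["failed"]
    return stats
```

With a `send` that always fails, this returns after one pass over the queue instead of looping, and the failed items stay queued for the next scheduled run.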
It was very difficult to figure out which job was actually causing the problem. We worked around the multiple instances by prefixing the cron command with run-one. The end result was that only the EmailNotification job was running, never stopping, and blocking every other cronjob from starting.
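For readers unfamiliar with the idea behind the run-one workaround, here is a minimal sketch of a single-instance guard using an advisory file lock. It is an assumption-laden illustration, not a description of how run-one or Nextcloud works internally: `try_acquire_lock` and the lock path are hypothetical, and `fcntl` is only available on Unix-like systems.

```python
import fcntl
from typing import Optional, TextIO


def try_acquire_lock(path: str) -> Optional[TextIO]:
    """Return an open handle holding the lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        # Non-blocking exclusive lock: fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle  # closing the handle releases the lock
```

A job wrapper would call this at startup and exit right away when it gets `None`. As described above, this only caps the damage at one instance: if that one instance never finishes, the lock is never released and nothing else gets to run.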
Ultimately I was able to identify the job using strace (the output contained the database queries and finally gave the first hint that led to the activity app and the email notification queue). I had previously ruled out this job, because there are plenty of calls to ImageMagick during the execution... and I got stuck with that information.
This issue still persists. Has anyone looked into it?