Memory consumption incredibly high

Open M-Stenzel opened this issue 3 years ago • 12 comments

Environment

  • ejabberd version: 21.12
  • Erlang version: Erlang/OTP 24 [erts-12.1.2]
  • OS: Linux - openSUSE
  • Installed from: rpm

Errors from error.log/crash.log

No errors

maybe related to #3831?

The system used to run perfectly. One day (10 days ago) ejabberd made my system collapse: its memory usage of ca. 20 GB was more than swap could compensate for. I upgraded to version 21.12 (from 21.05). When I start ejabberd via systemd (openSUSE system, packages from the repos), ejabberd/eimp starts but eats all of my RAM again, and I need to kill it with the -9 switch. The log level is "info" but I do not get any error or warning messages. When I start it by hand (without systemd) with "sudo -u ejabberd /usr/sbin/ejabberdctl foreground" I receive:

2022-06-11 17:57:19.916593+03:00 [info] Loading configuration from /etc/ejabberd/ejabberd.yml

and that's it, no further information. No warning or error message

I use mariadb as database, no muc at all. mariadb database does not show any errors. There is about 3 GB of data in the database. There are only a few users.

I wonder how to proceed from here?

Martin.

M-Stenzel avatar Jun 11 '22 17:06 M-Stenzel

loglevel: 4 and see logs again?

licaon-kter avatar Jun 12 '22 07:06 licaon-kter

Hi, I can just offer some questions. Hopefully one of them will trigger a clue in your head:

ejabberd/eimp starts but eats all of my RAM again

Do you mean beam/beam.smp, the system process that runs the erlang virtual machine?

Or do you mean eimp, the Erlang library for image manipulation that ejabberd uses in mod_avatar and mod_http_upload?

ejabberd memory usage of ca. 20 GB

Check the mnesia spool dir, which ejabberd uses for storing some information even when you configure SQL for most modules. Is there any file with substantial file size?

Try moving away those mnesia spool files, then restart ejabberd. Does it start correctly?
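A minimal sketch of the size check suggested above, assuming the mnesia spool directory is /var/lib/ejabberd (adjust the path to your installation). The helper name `largest_files` is made up for this illustration and is not part of ejabberd.

```python
import os

def largest_files(directory, limit=10):
    """Return up to `limit` (size_in_bytes, path) pairs, largest first."""
    sizes = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # Skip files that vanish or cannot be read (e.g. broken symlinks).
                continue
    sizes.sort(reverse=True)
    return sizes[:limit]

if __name__ == "__main__":
    # Assumed spool path; run as a user that may read the directory.
    for size, path in largest_files("/var/lib/ejabberd"):
        print(f"{size:>12}  {path}")
```

Anything far larger than the other table files would be the first candidate to move away before restarting.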

Loading configuration from /etc/ejabberd/ejabberd.yml and that's it, no further information.

What happens if you provide the default (the initial) configuration file, where mariadb and other settings are not yet set? Does ejabberd start correctly?

mariadb database does not show any errors. There is about 3 GB of data in the database. There are only a few users.

3GB of data in an ejabberd database with few users and no MUC rooms? In what table is that size consumed?

badlop avatar Jun 12 '22 15:06 badlop

Hi, I can just offer some questions. Hopefully one of them will trigger a clue in your head:

ejabberd/eimp starts but eats all of my RAM again

Do you mean beam/beam.smp, the system process that runs the erlang virtual machine?

This is the process that triggers the massive RAM use:

/usr/lib64/erlang/erts-12.1.2/bin/beam.smp -K true -P 250000 -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/ejabberd -- -sname ejabberd@localhost -mnesia dir "/var/lib/ejabberd" -s ejabberd -noshell -noinput

so it is not "eimp"

Or do you mean eimp, the erlang library for image manipulation, that ejabberd uses in mod_avatar and mod_http_upload

ejabberd memory usage of ca. 20 GB

Check the mnesia spool dir, which ejabberd uses for storing some information even when you configure SQL for most modules. Is there any file with substantial file size?

The "spool" directory is empty and the "queue" directory is empty. In the whole directory "/var/lib/ejabberd" there is nothing "abnormal"; these are the contents:


-rw-r--r--  1 ejabberd ejabberd    5464 Apr  8  2021 archive_msg.DAT
-rw-r--r--  1 ejabberd ejabberd    5916 Apr  3  2021 archive_prefs.DAT
drwxr-xr-x  3 ejabberd ejabberd    4096 May 18  2021 .cache
-rw-r--r--  1 ejabberd ejabberd   66562 Apr  3  2021 caps_features.DAT
drwxr-xr-x  2 ejabberd ejabberd    4096 Jun  2 22:02 certs
-rw-r--r--  1 ejabberd ejabberd     159 Jun  2 22:05 DECISION_TAB.LOG
-r--------  1 ejabberd ejabberd      20 Jan 15  2019 .erlang.cookie
-rw-r--r--  1 ejabberd ejabberd    7786 Apr  5  2021 last_activity.DAT
-rw-r--r--  1 ejabberd ejabberd      95 Jun  2 22:05 LATEST.LOG
-rw-r--r--  1 ejabberd ejabberd    6244 Apr  8  2021 messenger.xy-space.de
-rw-r--r--  1 ejabberd ejabberd  939423 May 24  2021 mnesia_backup
-rw-r--r--  1 ejabberd ejabberd    5464 Jan 16  2019 motd.DAT
-rw-r--r--  1 ejabberd ejabberd    5464 Jan 16  2019 motd_users.DAT
-rw-r--r--  1 ejabberd ejabberd    5464 Dec 25  2019 mqtt_pub.DAT
-rw-r--r--  1 ejabberd ejabberd       8 Jan 16  2019 muc_registered.DCD
-rw-r--r--  1 ejabberd ejabberd    5721 Apr  3  2021 muc_room.DCD
-rw-r--r--  1 ejabberd ejabberd       8 May 17  2020 oauth_client.DCD
-rw-r--r--  1 ejabberd ejabberd    5464 Jan 16  2019 oauth_token.DAT
-rw-r--r--  1 ejabberd ejabberd    9848 Apr  5  2021 offline_msg.DAT
-rw-r--r--  1 ejabberd ejabberd    7045 Apr  3  2021 passwd.DAT
-rw-r--r--  1 ejabberd ejabberd    5997 Apr  3  2021 privacy.DAT
-rw-r--r--  1 ejabberd ejabberd    7612 Apr  3  2021 private_storage.DAT
-rw-r--r--  1 ejabberd ejabberd     133 Apr  3  2021 pubsub_index.DCD
-rw-r--r--  1 ejabberd ejabberd   11240 Jun  2 22:05 pubsub_item_3.DAT
-rw-r--r--  1 ejabberd ejabberd  951067 Apr  3  2021 pubsub_item.DAT
-rw-r--r--  1 ejabberd ejabberd   69321 Apr  3  2021 pubsub_node.DCD
-rw-r--r--  1 ejabberd ejabberd       8 Jan 16  2019 pubsub_orphan.DCD
-rw-r--r--  1 ejabberd ejabberd   10662 Apr  3  2021 pubsub_state.DCD
-rw-r--r--  1 ejabberd ejabberd    1800 Apr  3  2021 pubsub_state.DCL
-rw-r--r--  1 ejabberd ejabberd    7468 Apr  3  2021 push_session.DAT
drwxr-xr-x  2 ejabberd ejabberd    4096 Jan 16  2019 queue
-rw-r--r--  1 ejabberd ejabberd    8600 Jun  2 22:05 roster_3.DAT
-rw-r--r--  1 ejabberd ejabberd   11701 Apr  3  2021 roster.DAT
-rw-r--r--  1 ejabberd ejabberd    5464 Jan 16  2019 roster_version.DAT
-rw-r--r--  1 ejabberd ejabberd   40174 Jun  2 22:02 schema.DAT
drwxr-x---  2 ejabberd ejabberd    4096 Jun 11 16:26 spool
-rw-r--r--  1 ejabberd ejabberd     310 Jun 20  2020 sr_group.DCD
-rw-r--r--  1 ejabberd ejabberd      97 Apr  3  2021 sr_user.DCD
-rw-r--r--  1 ejabberd ejabberd 1171026 Dec 30 00:02 translations.cache
drwxr-s---  5 ejabberd ejabberd    4096 Jun  2 22:02 upload
-rw-r--r--  1 ejabberd ejabberd  100204 Apr  3  2021 vcard.DAT
-rw-r--r--  1 ejabberd ejabberd    1559 Apr  3  2021 vcard_search.DCD

size of the upload directory is ca. 600 MB

Try moving away those mnesia spool files, then restart ejabberd. Does it start correctly?

Loading configuration from /etc/ejabberd/ejabberd.yml and that's it, no further information.

What happens if you provide the default (the initial) configuration file, where mariadb and other settings are not yet set? Does ejabberd start correctly?

unfortunately no change

mariadb database does not show any errors. There is about 3 GB of data in the database. There are only a few users.

3GB of data in an ejabberd database with few users and no MUC rooms? In what table is that size consumed?

That was wrong information on my part, I was too lazy :( The size is only 52 MB.

M-Stenzel avatar Jun 12 '22 15:06 M-Stenzel

loglevel: 4 and see logs again?

No, it is the same.

M-Stenzel avatar Jun 12 '22 15:06 M-Stenzel

You are using a recent version 21.12, with a small mnesia database, and tried starting with the default ejabberd configuration. Even if using mariadb, it's rather small.

One thing that you didn't yet try is: set default configuration file, and move away all the running files that ejabberd generated in your machine: upload, queue, .cache, spool, certs...

If ejabberd starts correctly when you move away all those /var/lib/ejabberd files, then you finally know that the problem was introduced 10 days ago somewhere in one of those files or in the database. Maybe a bug let it happen. Now restore one path, for example .cache, and restart. Does it still work? Then restore another path, like upload, and restart. Repeat one by one until you find which path produces the problem.
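The one-by-one restore procedure can be sketched as a small loop. This is only an illustration: `find_culprit` and the `starts_ok` callback are hypothetical names, and the callback is assumed to put the listed paths back, restart ejabberd, and report whether it came up normally.

```python
def find_culprit(paths, starts_ok):
    """Restore paths one at a time; return the first one that breaks startup.

    `paths` lists the files or directories that were moved away.
    `starts_ok(restored)` is a caller-supplied (hypothetical) check that
    restores the given paths, restarts the server and returns True on success.
    """
    restored = []
    for path in paths:
        restored.append(path)
        if not starts_ok(list(restored)):
            return path
    return None  # everything restored and the server still starts

if __name__ == "__main__":
    # Stand-in check for demonstration: pretend "upload" is the bad path.
    fake_check = lambda restored: "upload" not in restored
    print(find_culprit([".cache", "upload", "spool", "certs"], fake_check))
```

In practice the check would be done by hand (move the path back, restart, watch memory), since a failing start here means runaway RAM use rather than a clean error.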

If that still fails, then I have no more clues. I'd then suggest installing ejabberd using another method, checking that it starts, and then slowly configuring it as you wish and copying your database and spool files, restarting each time. If the problem appears, you will know which step introduced it, and we can investigate with at least a clue.

badlop avatar Jun 12 '22 16:06 badlop

Thank you for all the good ideas!!! Well I put everything to a default state, including the .../var directory, even upgraded to version 22.05 of ejabberd, default ejabberd.yml, result stays the same. I cannot reinstall the whole operating system. This is frustrating... I cannot track down the culprit.

M-Stenzel avatar Jun 12 '22 20:06 M-Stenzel

In that case, you can download the ejabberd 22.05 installer, which will install everything in /opt, or in the path you specify, and will not interfere with your existing installation.

If that one works, then you at least know that the machine isn't cursed: something left over from your rpm installation is blocking the startup process.

If that one still fails, I would check whether there's something in $HOME, maybe .erlang.cookie or .ejabberd-modules, that could still be interfering with ejabberd startup.

BTW, there are also container images that you can run with docker or podman.

badlop avatar Jun 12 '22 21:06 badlop

Thank you again for your help in this topic, this is very much appreciated,

as a first step I removed .erlang.cookie, now I do receive error messages:

2022-06-13 11:04:46.374654+03:00 [error] <0.153.0> External eimp process (pid=2373) has terminated unexpectedly, restarting in a few seconds
2022-06-13 11:04:46.375047+03:00 [error] <0.157.0> External eimp process (pid=2376) has terminated unexpectedly, restarting in a few seconds
2022-06-13 11:04:46.375528+03:00 [error] <0.163.0> External eimp process (pid=2387) has terminated unexpectedly, restarting in a few seconds
2022-06-13 11:04:46.375642+03:00 [error] <0.167.0> External eimp process (pid=2390) has terminated unexpectedly, restarting in a few seconds
2022-06-13 11:04:46.375794+03:00 [error] <0.165.0> External eimp process (pid=2389) has terminated unexpectedly, restarting in a few seconds
2022-06-13 11:04:46.375970+03:00 [error] <0.155.0> External eimp process (pid=2374) has terminated unexpectedly, restarting in a few seconds
2022-06-13 11:04:46.376119+03:00 [error] <0.161.0> External eimp process (pid=2385) has terminated unexpectedly, restarting in a few seconds
2022-06-13 11:04:46.376166+03:00 [error] <0.159.0> External eimp process (pid=2377) has terminated unexpectedly, restarting in a few seconds

eimp is provided by the rpm package erlang-eimp-1.0.22.

Is there any way I can debug eimp? (I know there are some more issues with eimp to be found on github).

Besides I will try the two other options (/opt & docker)

Martin.

M-Stenzel avatar Jun 13 '22 08:06 M-Stenzel

I tried the rpm from ProcessOne (not from SUSE); this did not help.

Next I set up a vanilla installation of ejabberd on a different server, checked that it is running, no problems.

Next thing was to transfer the mariadb database to the new server, no problem.

However, I cannot export the mnesia database! When I do "sudo -u ejabberd ejabberdctl start" everything is fine.

When I do the final step "sudo -u ejabberd ejabberdctl backup /tmp/mnesia_backup" again the memory consumption goes up until all the physical and virtual RAM gets exhausted.

Now I have a real problem: I cannot even export the mnesia database, which is essential for a new setup on a new server.

Now I am completely lost, how can I recover/get hold of all of my data???

Is there any other way to export the mnesia database? Maybe there is a corruption in it? How can I check for that?

Martin.

M-Stenzel avatar Jun 14 '22 09:06 M-Stenzel

If I understand correctly, you have the mnesia spool files, most tables are rather small as you use mariadb for most modules, and we suspect there may be some problematic content in one of those mnesia tables. When you try to start ejabberd, it stops at "Loading configuration", it doesn't complete server start, and it doesn't complete mnesia loading.

On another testing machine you were able to install ejabberd and set up mariadb, and it starts correctly; the only remaining step is to copy the mnesia spool files. And of course, once everything works on that testing machine, get all this installed on the definitive server machine you want to use.

If that's the case, some ideas:

A) Is there any information worth keeping in the old mnesia database? If you use mariadb for auth, mod_mam, mod_muc, mod_offline, ... then probably mnesia content is either obsolete, or cache content.

B) The mnesia spool files can be copied as they are from one path or machine to any other. The only thing to take care of is the Erlang node name: it must be the same on the new machine as on the old one, as it's hard-coded in the mnesia files. Chances are it's just ejabberd@localhost on both the old and the new machine, so you don't need to make any change, simply copy the files. If the Erlang node names were different on the old and new machines, there are some solutions like modifying the mnesia files... but it's easier to just force the old node name using ejabberdctl.cfg.

C) You can manually take a look at that mnesia database without using ejabberd:

$ erl -name ejabberd@localhost -mnesia dir \"/home/badlop/ejabberd/database\" -s mnesia

Erlang/OTP 24 [erts-12.3.2] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [jit]
Eshell V12.3.2  (abort with ^G)
(ejabberd@localhost)1> mnesia:info().   
---> Processes holding locks <--- 
---> Processes waiting for locks <--- 
---> Participant transactions <--- 
---> Coordinator transactions <---
---> Uncertain transactions <--- 
---> Active tables <--- 
schema         : with 43       records occupying 5732     words of mem
roster_version : with 0        records occupying 5464     bytes on disc
pubsub_orphan  : with 0        records occupying 305      words of mem
session        : with 0        records occupying 305      words of mem
privacy        : with 1        records occupying 6808     bytes on disc
passwd         : with 1        records occupying 5912     bytes on disc
...

Check keys of passwd mnesia table (I have only one account registered):

mnesia:dirty_all_keys(passwd).
[{<<"admin">>,<<"localhost">>}]

If there's a graphical interface and the required Erlang libraries are installed, there's a GUI to view and edit mnesia table content: observer:start(). Otherwise, there may be some way to export that mnesia database to a text file so you can edit it and import it into a new installation.

badlop avatar Jun 14 '22 15:06 badlop

This is an update to the topic,

I really tried everything, from a "virgin" installation down to using my adjusted config yml file. Tried to move away databases, tried to start by hand, cost me a few days.

Unfortunately I cannot get ejabberd to run on my original server, nor as a new setup on a different server, using mnesia and/or mysql.

Too bad now I do not have access to my postings (only the raw data, e. g. images).

Never had this before but will stay away from ejabberd.

Anyway, thanks again,

Martin.

M-Stenzel avatar Jul 05 '22 18:07 M-Stenzel

This is the final update to the topic.

My last comment left me frustrated, and I did not enjoy writing it. Because of that, in one final effort, I installed ejabberd version 21.12 (official repos) on an Ubuntu server (initially 20.04, upgraded without any issues to 22.04). I transferred all mnesia data and mariadb data, double-checked all the yml parameters, and voilà, I had a running system again. What a success! I really cannot tell what the problem on the original server was, maybe something with kernel parameters, maybe the kernel itself? Wherever the cause of the problem is rooted (I am too much of an end user), I had the chance to make my ejabberd installation work again, and this is great!

P. S. What I came across was this:

(https://askubuntu.com/questions/1411679/ubuntu-22-04-ejabberd-apparmour-profile-broken)

so I can confirm. I disabled apparmor for ejabberd. Now I have a fully running system with 100 % XMPP compliance. Thanks for your support!

M-Stenzel avatar Jul 09 '22 13:07 M-Stenzel

Sorry for the necropost, but I fell into the same issue: ejabberd not starting, not logging, just eating as much RAM as it can and then shut down by oom_killer. I did a radical cleaning of everything, just trying to start the "vanilla" service with no customization at all, no data, nothing. Same story. The only change I made to the machine was the /etc/hosts file: I filled it with a very long blacklist of "banned" domains (like this)

0.0.0.0 example.of.banned.outgoing.domain.com

and ejabberd got crazy reading the content. My fault, of course. Restoring the original /etc/hosts file made the curse disappear :-D
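To check whether a hosts file has grown into this kind of blacklist, one could count how many names point at each address. This is a rough sketch with a made-up helper name (`count_hosts_targets`); it only assumes the usual hosts format of an address followed by one or more names, with `#` starting a comment.

```python
from collections import Counter

def count_hosts_targets(lines):
    """Count hostname entries per target address in hosts-file lines."""
    counts = Counter()
    for line in lines:
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            # First field is the address, the remaining fields are names.
            counts[fields[0]] += len(fields) - 1
    return counts

if __name__ == "__main__":
    sample = [
        "127.0.0.1 localhost",
        "0.0.0.0 example.of.banned.outgoing.domain.com",
        "# a comment line",
    ]
    for address, n in count_hosts_targets(sample).most_common():
        print(f"{n:>8}  {address}")
```

On a real machine the function could be fed `open("/etc/hosts")` instead of the sample list; thousands of names behind a single address would match the situation described here.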

@M-Stenzel, was apparmor the cause of your problem on the "original" server?

udanieli avatar May 14 '23 06:05 udanieli

@udanieli was it about the size, or a certain entry?

licaon-kter avatar May 14 '23 06:05 licaon-kter

The size, but only for lines of these kinds:

  • 0.0.0.0 crappysite.com
  • 127.0.0.1 crappysite.com

The blacklist contains ~180k unique entries in the 0.0.0.0 domainname format. I made a bisection test: 5000 entries are enough to see very high memory consumption:

  • 3.4GB for 127.0.0.1 crappysite.com entries
  • 6.7GB of RAM for 0.0.0.0 crappysite.com entries

The more lines you put, the more RAM it will eat.
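As a back-of-the-envelope reading of those figures, the average cost per hosts entry can be worked out as below. This assumes 1 GB = 1000 MB and attributes the whole reported memory use to the 5000 entries; the thread does not say whether growth is strictly linear, so treat the result as an order of magnitude only.

```python
def per_entry_mb(total_gb, entries):
    """Average memory per hosts entry in MB, taking 1 GB as 1000 MB."""
    return total_gb * 1000 / entries

if __name__ == "__main__":
    # Figures reported above for a 5000-entry /etc/hosts file.
    for address, total_gb in (("127.0.0.1", 3.4), ("0.0.0.0", 6.7)):
        print(f"{address}: about {per_entry_mb(total_gb, 5000):.2f} MB per entry")
```

That comes to roughly 0.68 MB per 127.0.0.1 entry and 1.34 MB per 0.0.0.0 entry, which makes it plausible that a list of ~180k entries exhausts RAM and swap on an ordinary server.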

If you put real IP addresses, no memory leak at all. I made a test putting in /etc/hosts 180k random IPs and domains generated with a lib. So the problem should arise when the software defines (and stores somewhere in the memory) its own local addresses.

udanieli avatar May 14 '23 07:05 udanieli

@M-Stenzel, was apparmor the cause of your problem on the "original" server?

No, I did not use "apparmor", but, at some time I used /etc/hosts entries to block IP addresses. So this means it could have caused problems at my site. With the new knowledge I might test whether the problem got solved by using a lean /etc/hosts file...

Thank you for digging in!

M-Stenzel avatar May 14 '23 08:05 M-Stenzel

Now you've paged me, so, I've migrated back to the original server, with everything, including the database - and - voila - there are no more problems! What made you think that the oversized /etc/hosts file could be the problem? Genius, or luck?

M-Stenzel avatar May 14 '23 15:05 M-Stenzel

Surely I am a lucky man because I love this job, but I am not a genius: I spent a couple of days swearing at the terminal with a lot of frustration.

I compared that "cursed" machine with a "sane" machine on which I got ejabberd up and running and I thought about the /etc/hosts file. It is crucial for a server and should not be mangled with those fake localhost records. It's better to set up a local DNS service and put the blacklist there.
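One possible way to move such a blacklist out of /etc/hosts is sketched below. It assumes dnsmasq as the local DNS service (the comment above does not name one) and uses dnsmasq's `address=/domain/ip` option; the function name `hosts_to_dnsmasq` is made up for this example.

```python
def hosts_to_dnsmasq(lines, sinks=("0.0.0.0", "127.0.0.1")):
    """Turn blacklist-style hosts lines into dnsmasq `address=` rules."""
    rules = []
    for line in lines:
        fields = line.split("#", 1)[0].split()
        # Only lines that point names at a sink address are blacklist entries.
        if len(fields) < 2 or fields[0] not in sinks:
            continue
        for name in fields[1:]:
            if name in ("localhost", "localhost.localdomain"):
                continue  # genuine loopback names stay in /etc/hosts
            rules.append(f"address=/{name}/0.0.0.0")
    return rules

if __name__ == "__main__":
    sample = ["127.0.0.1 localhost", "0.0.0.0 crappysite.com"]
    print("\n".join(hosts_to_dnsmasq(sample)))
```

Note that a dnsmasq `address=` rule also matches subdomains of the listed name, so the result blocks slightly more than the original hosts entries did.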

udanieli avatar May 15 '23 09:05 udanieli

Mhhh... there is a saying: "Never change a running system." And as you put it, this file has a great importance. Anyway, I am lucky that the root of the problem was found, and hopefully by changing the code, or by the "bug" reporting in the github others will be more lucky than the two of us (regarding our time and nerves...).

Martin.

M-Stenzel avatar May 15 '23 09:05 M-Stenzel