Upgrading to 24.12 caused 5x memory consumption
Environment
- ejabberd version: 24.12
- Erlang version: 14.2.1 (Erlang/OTP 26)
- OS: Linux (Debian)
- Installed from: docker:ecs
Bug description
After upgrading ejabberd from v23.01 to v24.12, we are seeing a major increase in memory consumption, which makes the system unstable unless we add hardware.
System setup:
- Cluster: 3 nodes initially, expanded to 6 nodes to mitigate the issue
- Concurrent users: ~3600 users total
- Previous version: 23.01
- Current version: 24.12
- Hardware: each node has sufficient CPU and RAM capacity
- Database: AWS RDS (MySQL)
Observed Behavior:
| State | Number of nodes | Total memory used |
|---|---|---|
| v23.01 (before upgrade) | 3 | ~1.6 GB |
| v24.12 (immediately after upgrade) | 3 | ~2 GB |
| v24.12 (after 4 hours) | 3 | ~5 GB |
| v24.12 (after scaling to 6 nodes) | 6 | ~9.4 GB |
Steps taken to mitigate:
- Scaled the cluster from 3 nodes → 6 nodes
- Performed rolling restarts to balance users across the 6 nodes
- The current state is somewhat stable, but total memory use has increased nearly 6x compared to the previous version (9.4 GB vs. 1.6 GB).
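The ratios quoted above can be recomputed from the table. A minimal sketch, assuming the figures are cluster-wide totals in GB (as the 9.4 GB vs. 1.6 GB comparison implies) and taking the first row as the pre-upgrade baseline; memory_growth is a hypothetical helper name, not an ejabberd tool:

```shell
# Hypothetical helper: reads "label|nodes|total_gb" rows on stdin and prints,
# for each row, memory per node and growth relative to the first row.
memory_growth() {
  awk -F'|' '
    NR == 1 { base = $3 }    # first row is the pre-upgrade baseline
    {
      printf "%s: nodes=%d total=%.1fGB per_node=%.2fGB ratio=%.1fx\n",
        $1, $2, $3, $3 / $2, $3 / base
    }'
}

# Figures from the table above (cluster-wide totals, in GB).
memory_growth <<'EOF'
23.01 before upgrade|3|1.6
24.12 immediately after upgrade|3|2
24.12 after 4 hours|3|5
24.12 after scaling to 6 nodes|6|9.4
EOF
```

With these inputs, memory per node comes out at about 0.53 GB before the upgrade and about 1.57 GB after scaling to 6 nodes (overall ratio 5.9x), so doubling the node count did not bring per-node usage back down.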
Notes:
- No significant change in the number of concurrent users or in traffic patterns.
- No configuration changes except for the version upgrade.
We need guidance: is this a known issue, or is a configuration change required in v24.12? Maybe some core module has new behavior that we need to consider?
Here are some images showing the issue. The change happened at 12:30.
Right before the change:
After a few hours:
After we added 3 more machines and reset the existing ones:
Cleaned up config? SQL used? Details?
Regarding SQL: as I mentioned, the database is AWS RDS (MySQL). Please let me know which specific details you need.
The previous ejabberd version was 23.01 (not 22.10); I updated the original content.
Config:
###
### ejabberd configuration file
###
### The parameters used in this configuration file are explained at
###
### https://docs.ejabberd.im/admin/configuration
###
### The configuration file is written in YAML.
### *******************************************************
### ******* !!! WARNING !!! *******
### ******* YAML IS INDENTATION SENSITIVE *******
### ******* MAKE SURE YOU INDENT SECTIONS CORRECTLY *******
### *******************************************************
### Refer to http://en.wikipedia.org/wiki/YAML for the brief description.
###
language: "en"
hosts:
  - obarash.com
loglevel: info
log_rotate_size: 1048576000
log_rotate_count: 7
certfiles:
  - "/home/ejabberd/conf/domain.pem"
ca_file: "/home/ejabberd/conf/cacert.pem"
sql_type: mysql
sql_server: "mysql"
sql_database: "ejabberd"
sql_username: "ejabberd"
sql_password: "passwsord"
sql_port: 3306
sql_pool_size: 20
sql_keepalive_interval: 1
sql_start_interval: 5
sql_ssl: false
sql_ssl_verify: false
sql_ssl_cafile: "/tmp/cacert.crt"
new_sql_schema: false
update_sql_schema: false
default_db: sql
default_ram_db: mnesia
auth_method: sql
allow_contrib_modules: true
cache_size: 20000
max_fsm_queue: 30000
listen:
  - port: 5222
    ip: "::"
    module: ejabberd_c2s
    protocol_options:
      - "no_sslv2"
      - "no_sslv3"
      - "no_tlsv1"
      - "no_tlsv1_1"
    ciphers: "ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES256-SHA:ECDHE-ECDSA-DES-CBC3-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:DES-CBC3-SHA:!DSS"
    starttls: true
    starttls_required: false
    tls_compression: false
    max_stanza_size: 262144
    shaper: c2s_shaper
    access: c2s
    use_proxy_protocol: false
  - port: 5269
    ip: "::"
    module: ejabberd_s2s_in
    max_stanza_size: 524288
  - port: 5443
    ip: "::"
    module: ejabberd_http
    request_handlers:
      /upload: mod_upload
      /ws: ejabberd_http_ws
      /bosh: mod_bosh
    captcha: false
    use_proxy_protocol: false
    tls: false
  - port: 5280
    ip: "::"
    module: ejabberd_http
    request_handlers:
      /admin: ejabberd_web_admin
      /api/v0: mod_http_api
      /api: mod_http_api
    captcha: false
    use_proxy_protocol: false
    tls: false
  - port: 1111
    ip: "::"
    module: ejabberd_http
    request_handlers:
      /metrics: mod_obarash_custom-1
    captcha: false
    use_proxy_protocol: false
    tls: false
s2s_use_starttls: optional
acl:
  local:
    user_regexp: ""
  loopback:
    ip:
      - 127.0.0.0/8
      - ::1/128
  admin:
    user:
      - "[email protected]"
      - "[email protected]"
access_rules:
  local:
    allow: local
  c2s:
    deny: blocked
    allow: all
  announce:
    allow: admin
  configure:
    allow: admin
  muc_create:
    allow: local
  pubsub_createnode:
    allow: local
  register:
    allow: local
  trusted_network:
    allow: loopback
api_permissions:
  "console commands":
    from:
      - ejabberd_ctl
    who: all
    what: "*"
  "admin access":
    who:
      # - ip: "172.19.170.0/24"
      # - ip: "172.19.140.0/24"
      # - ip: "172.18.64.0/23"
      - ip: "127.0.0.0/8"
      - access:
          - allow:
              - acl: loopback
              - acl: admin
    what:
      - "*"
      - "!stop"
      - "!start"
  "public commands":
    who:
      ip: 127.0.0.1/8
    what:
      - status
      - connected_users_number
shaper:
  normal:
    rate: 3000
    burst_size: 20000
  fast: 100000
shaper_rules:
  max_user_sessions: 10
  max_user_offline_messages:
    5000: admin
    500: all
  c2s_shaper:
    none: admin
    normal: all
  s2s_shaper: fast
acme:
  auto: false
  contact: "mailto:[email protected]"
  ca_url: "https://acme-v02.api.letsencrypt.org/directory"
modules:
  mod_adhoc: {}
  # mod_admin_update_sql: {}
  mod_admin_extra: {}
  mod_announce:
    access: announce
  mod_avatar: {}
  mod_blocking: {}
  mod_bosh: {}
  mod_caps:
    use_cache: true
  mod_carboncopy: {}
  mod_client_state: {}
  mod_configure: {}
  mod_disco: {}
  # mod_fail2ban: {}
  mod_http_api:
    default_version: 0
  # mod_http_upload: {}
  mod_last: {}
  mod_mam:
    ## Mnesia is limited to 2GB, better to use an SQL backend
    ## For small servers SQLite is a good fit and is very easy
    ## to configure. Uncomment this when you have SQL configured:
    db_type: sql
    assume_mam_usage: true
    default: always
    user_mucsub_from_muc_archive: true
  # mod_mqtt: {}
  mod_muc:
    access:
      - allow
    access_admin:
      - allow: admin
    access_create:
      - allow: admin
    access_persistent: muc_create
    access_mam:
      - allow
    default_room_options:
      allow_user_invites: false
      allow_subscription: true
      allow_change_subj: false
      allow_query_users: true
      allowpm: anyone
      mam: true
      members_by_default: true
      members_only: false
      logging: true
      persistent: true
      anonymous: false
      public: false
      presence_broadcast:
        - visitor
    history_size: 0
    max_users: 5000
    max_user_conferences: 5000
    preload_rooms: false
  mod_muc_admin: {
    subscribe_room_many_max_users: 4000
  }
  mod_offline:
    access_max_user_messages: max_user_offline_messages
  mod_ping:
    send_pings: true
    ping_interval: 60
    ping_ack_timeout: 30
    timeout_action: kill
  mod_privacy: {}
  # mod_private: {}
  mod_proxy65:
    access: local
    max_connections: 5
  mod_pubsub:
    access_createnode: pubsub_createnode
    ## reduces resource comsumption, but XEP incompliant
    ignore_pep_from_offline: false
    ## XEP compliant, but increases resource comsumption
    ## ignore_pep_from_offline: false
    last_item_cache: false
    max_items_node: 1000
    plugins:
      - flat
      - pep
    force_node_config:
      ## Change from "whitelist" to "open" to enable OMEMO support
      ## See https://github.com/processone/ejabberd/issues/2425
      "eu.siacs.conversations.axolotl.*":
        access_model: whitelist
      ## Avoid buggy clients to make their bookmarks public
      storage:bookmarks:
        access_model: whitelist
      obarash:roster:x:
        access_model: presence
        notification_type: normal
  mod_push: {}
  mod_push_keepalive: {}
  mod_register:
    ## Only accept registration requests from the "trusted"
    ## network (see access_rules section above).
    ## Think twice before enabling registration from any
    ## address. See the Jabber SPAM Manifesto for details:
    ## https://github.com/ge0rg/jabber-spam-fighting-manifesto
    ip_access: trusted_network
  mod_roster:
    versioning: true
    store_current_id: true
    db_type: sql
    cache_size: 80000
  mod_s2s_dialback: {}
  mod_shared_roster: {}
  mod_stream_mgmt:
    resend_on_timeout: true
    resume_timeout: 30
    max_ack_queue: 10000
  # mod_stun_disco: {}
  mod_vcard: {}
  mod_vcard_xupdate: {}
  mod_version:
    show_os: false
  mod_obarash_custom-1: {}
  mod_obarash_custom-2: {}
  mod_obarash_custom-3: {}
  mod_obarash_custom-4: {}
  mod_obarash_custom-5: {}
  mod_obarash_custom-6: {}
  mod_obarash_custom-7: {}
  mod_obarash_custom-8: {}
  mod_obarash_custom-9: {}
  mod_obarash_custom-10: {}
  mod_obarash_custom-11: {}
  mod_obarash_custom-12: {}
  mod_obarash_custom-13: {}
### Local Variables:
### mode: yaml
### End:
### vim: set filetype=yaml tabstop=8
Check for spam-bot activity. Maybe a spammer is using your server to send out spam.
@member7me We have ruled out the spam-bot option: we checked all connected users and approved each one of them. In addition, this behavior started right after upgrading the server, which indicates it's not related to any external unwanted activity. Our passwords are GUIDs and are changed frequently (actually, every time a user logs in).
Did you follow the upgrade notes of every version, including the intermediate ones? That is, have you read https://docs.ejabberd.im/admin/upgrade/#specific-version-upgrade-notes?
23.01 -> 24.12
> We need Guidance, is this a known issue or configuration change required in v24.12 ?
I don't remember any report similar to this. Make sure you followed all the upgrade notes.
> No configuration changes except for the version upgrade.
Ok, so you made the minimal configuration changes required to upgrade ejabberd, and did not enable new modules or options.
> Maybe some core module have new behavior that we need to consider ?
Yes, that's probably the case. Taking a quick look at the roadmap, there were many improvements and changes that take effect automatically.
> No significant change in number of concurrent users or traffic pattern.
Are your users humans performing typical tasks (change presence every few minutes, send message every few seconds, chatting in a few reasonable-size chatrooms, ...) or are they programs that may send big amount of presence changes, messages per second, or may be in large chatrooms (more than 100)?
Are your users using well-known XMPP clients/libraries, or are using less-known or custom-made clients/libraries that may trigger some edge-case in ejabberd?
Check the ejabberd log files: do they show unusual behaviour, like reconnections, error messages, or warnings?
It may be that whatever the problem is, it's already solved in a recent ejabberd release, but I imagine you cannot set up a temporary server with just 1 node running 25.04 to test for a few minutes whether it behaves as in 23.01 or still consumes memory as in 24.12...
Let's assume some change in ejabberd drives the clients crazy, or the clients now trigger some edge case in ejabberd. And let's assume the consumption is concentrated in just 1 feature, 1 process, or 1 process type...
There are several ways to view the Erlang processes (and their consumption) that live in an Erlang node:
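To check the logs for that kind of unusual behaviour, one option is to count warnings and errors per hour and see whether they jump after the upgrade. A rough sketch, assuming each log line starts with a "YYYY-MM-DD HH:MM:SS" timestamp and contains the level in square brackets (for example [warning]); log_level_histogram is a hypothetical helper, and the log path in the usage line below is an assumption based on the logs directory that appears later in this thread:

```shell
# Hypothetical helper: counts [warning], [error] and [critical] lines per
# hour. Reads the files given as arguments, or stdin when there are none.
log_level_histogram() {
  awk '
    match($0, /\[(warning|error|critical)\]/) {
      hour  = substr($0, 1, 13)                    # "YYYY-MM-DD HH"
      level = substr($0, RSTART + 1, RLENGTH - 2)  # level without brackets
      count[hour " " level]++
    }
    END { for (key in count) print key, count[key] }
  ' "$@" | LC_ALL=C sort
}
```

For example: log_level_histogram /home/ejabberd/logs/ejabberd.log, adding rotated log files as extra arguments to compare the hours before and after the upgrade.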
- etop:
  ejabberdctl etop
- observer_cli:
  ejabberdctl module_install ejabberd_observer_cli
  ejabberdctl debug
  ejabberd_observer_cli:start().
  Then navigate the console: press H + Enter, etc.
  Sort the Erlang processes by their memory usage, or by "reductions".
Glossary:
- reductions: the reduction counter is normally incremented by one for each function and BIF call.
- BIFs: Built-In Functions (BIFs) are implemented in C code in the runtime system. BIFs do things that are difficult or impossible to implement in Erlang.
@badlop thanks for your response.
We are using the ejabberd/ecs Docker image, and both ejabberdctl etop and ejabberdctl module_install ejabberd_observer_cli are unavailable.
On the etop command we get this response:
Error! Failed to load module 'etop' because it cannot be found. Make sure that the module name is correct and that its .beam file is in the code path.
And for ejabberdctl module_install ejabberd_observer_cli we get: Error: not_available.
Regarding user behavior: it didn't change from the previous version. We have a few thousand mobile users using the Smack client library for Android; they move from place to place, so sometimes they disconnect from the server due to network unavailability. We have some groups with more than 100 members, but most of them have 10-20 members. Most importantly, this behavior was the same before we upgraded the server.
> Error! Failed to load module 'etop'
The ecs container image does not include the observer library, nor its etop module; both come from Erlang/OTP.
Solution: you could switch to the ejabberd container image, as that one includes observer, and consequently also etop (if interested, you can check the image differences).
But wait, etop provides very little information anyway, so try the next idea, which works correctly with the ecs container image too:
> And for ejabberdctl module_install ejabberd_observer_cli we get: Error: not_available.
The ecs container image does not include the ejabberd-contrib git repository, and does not include git or mix required to download the dependencies.
This is a step-by-step solution:
- Tell ejabberd to download the ejabberd-contrib git repository:
$ podman exec ejabberd-ejabberd ejabberdctl modules_update_specs
- ejabberd_observer_cli depends on other libraries that ejabberd will attempt to download using git or mix. Let's install git in the container image:
$ podman exec --user root ejabberd-ejabberd apk add git
fetch https://dl-cdn.alpinelinux.org/alpine/v3.19/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.19/community/x86_64/APKINDEX.tar.gz
(1/9) Installing ca-certificates (20241121-r1)
(2/9) Installing c-ares (1.27.0-r0)
(3/9) Installing libunistring (1.1-r2)
(4/9) Installing libidn2 (2.3.4-r4)
(5/9) Installing nghttp2-libs (1.58.0-r0)
(6/9) Installing libpsl (0.21.5-r0)
(7/9) Installing libcurl (8.12.1-r0)
(8/9) Installing pcre2 (10.42-r2)
(9/9) Installing git (2.43.6-r0)
Executing busybox-1.36.1-r19.trigger
Executing ca-certificates-20241121-r1.trigger
OK: 48 MiB in 72 packages
- Let's download dependencies, compile everything and install it:
$ podman exec ejabberd-ejabberd ejabberdctl module_install ejabberd_observer_cli
I'll download "observer_cli" using git because I can't use Mix to fetch from hex.pm:
Runtime terminating during boot ('cannot expand $RELEASE_LIB in bootfile')
Crash dump is being written to: /home/ejabberd/logs/erl_crash_20250619-215914.dump...done
I'll download "recon" using git because I can't use Mix to fetch from hex.pm:
Runtime terminating during boot ('cannot expand $RELEASE_LIB in bootfile')
Crash dump is being written to: /home/ejabberd/logs/erl_crash_20250619-215914.dump...done
Fetching dependency observer_cli: Cloning into 'observer_cli'...
Fetching dependency os_stats: Cloning into 'os_stats'...
Fetching dependency recon: Cloning into 'recon'...
Inlining: inline_size=24 inline_effort=150
Old inliner: threshold=0 functions=[{insert,2},{merge,2}]
Module ejabberd_observer_cli has been installed.
Now you can configure it in your ejabberd.yml
I'll download "observer_cli" using git because I can't use Mix to fetch from hex.pm:
Runtime terminating during boot ('cannot expand $RELEASE_LIB in bootfile')
- It shows a few error messages, but in fact all 29 files are installed correctly:
$ podman exec ejabberd-ejabberd ls .ejabberd-modules/ejabberd_observer_cli/ebin | wc -l
29
- Let's start an Erlang shell attached to the running ejabberd node, and then start ejabberd_observer_cli:
$ podman exec -it ejabberd-ejabberd ejabberdctl debug
Erlang/OTP 26 [erts-14.2.1] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [jit]
Eshell V14.2.1 (press Ctrl+G to abort, type help(). for help)
(ejabberd@localhost)1> ejabberd_observer_cli:start().
- This clears the shell window and displays a few ejabberd statistics. Now press H then Enter to view the main Erlang statistics, and hopefully you will get some clue about what is consuming so much memory in your server.
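The installation steps above can be wrapped in a small function, which helps when repeating them on every node of the cluster. A sketch, assuming the same default container name as in the example; ENGINE lets you use docker instead of podman (the reporter runs the ecs image with Docker), and DRY_RUN only prints the commands. The function name and both variables are hypothetical, not part of ejabberd:

```shell
# Hypothetical wrapper around the manual steps shown above.
# ENGINE: container CLI to use (default podman); DRY_RUN: print, don't run.
install_observer_cli() {
  engine="${ENGINE:-podman}"
  container="${1:-ejabberd-ejabberd}"

  run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
  }

  # Fetch the ejabberd-contrib specs so module_install can find the module.
  run "$engine" exec "$container" ejabberdctl modules_update_specs || return 1
  # The ecs image has no git, which is needed to fetch the dependencies.
  run "$engine" exec --user root "$container" apk add git || return 1
  # Download dependencies, compile and install. It may print errors even
  # when the install succeeds, so its exit status is not checked here.
  run "$engine" exec "$container" ejabberdctl module_install ejabberd_observer_cli
  # Sanity check: list the installed .beam files.
  run "$engine" exec "$container" ls .ejabberd-modules/ejabberd_observer_cli/ebin
}
```

For example, with Docker: ENGINE=docker install_observer_cli your-container-name, then attach with docker exec -it your-container-name ejabberdctl debug and run ejabberd_observer_cli:start(). as in the last step.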