archive_msg.DAT grows without more data

Open luke-jr opened this issue 3 years ago • 8 comments

Environment

  • ejabberd version: 20.04
  • Erlang version: Erlang (SMP,ASYNC_THREADS) (BEAM) emulator version 12.2
  • OS: Linux (Gentoo)
  • Installed from: Gentoo package (source)

Configuration

hosts:
...
loglevel: info
log_rotate_size: 10485760
log_rotate_count: 5
certfiles:
  - /etc/ssl/ejabberd/server.pem
listen:
  -
    port: 5222
    ip: "::"
    module: ejabberd_c2s
    max_stanza_size: 262144
    shaper: c2s_shaper
    access: c2s
    starttls_required: true
  -
    port: 5269
    ip: "::"
    module: ejabberd_s2s_in
    max_stanza_size: 524288
  -
    port: 5443
    ip: "::"
    module: ejabberd_http
    tls: true
    request_handlers:
      /admin: ejabberd_web_admin
      /api: mod_http_api
      /bosh: mod_bosh
      /captcha: ejabberd_captcha
      /upload: mod_http_upload
      /ws: ejabberd_http_ws
  -
    port: 5280
    ip: "::"
    module: ejabberd_http
    request_handlers:
      /admin: ejabberd_web_admin
      /.well-known/acme-challenge: ejabberd_acme
  -
    port: 3478
    transport: udp
    module: ejabberd_stun
    use_turn: true
  -
    port: 1883
    ip: "::"
    module: mod_mqtt
    backlog: 1000
s2s_use_starttls: optional
acl:
  admin:
...
  local:
    user_regexp: ""
  loopback:
    ip:
      - 127.0.0.0/8
      - ::1/128
access_rules:
  local:
    allow: local
  c2s:
    deny: blocked
    allow: all
  announce:
    allow: admin
  configure:
    allow: admin
  muc_create:
    allow: local
  pubsub_createnode:
    allow: local
  trusted_network:
    allow: loopback
api_permissions:
  "console commands":
    from:
      - ejabberd_ctl
    who: all
    what: "*"
  "admin access":
    who:
      access:
        allow:
          acl: loopback
          acl: admin
      oauth:
        scope: "ejabberd:admin"
        access:
          allow:
            acl: loopback
            acl: admin
    what:
      - "*"
      - "!stop"
      - "!start"
  "public commands":
    who:
      ip: 127.0.0.1/8
    what:
      - status
      - connected_users_number
shaper:
  normal: 1000
  fast: 50000
shaper_rules:
  max_user_sessions: 10
  max_user_offline_messages:
    5000: admin
    100: all
  c2s_shaper:
    none: admin
    normal: all
  s2s_shaper: fast
modules:
  mod_adhoc: {}
  mod_admin_extra: {}
  mod_announce:
    access: announce
  mod_avatar: {}
  mod_blocking: {}
  mod_bosh: {}
  mod_caps: {}
  mod_carboncopy: {}
  mod_client_state: {}
  mod_configure: {}
  mod_disco: {}
  mod_fail2ban: {}
  mod_http_api: {}
  mod_http_upload:
    put_url: https://@HOST@:5443/upload
  mod_last: {}
  mod_mam:
    assume_mam_usage: true
    default: always
  mod_mqtt: {}
  mod_muc:
    access:
      - allow
    access_admin:
      - allow: admin
    access_create: muc_create
    access_persistent: muc_create
    access_mam:
      - allow
    default_room_options:
      mam: true
  mod_muc_admin: {}
  mod_offline:
    access_max_user_messages: max_user_offline_messages
  mod_ping: {}
  mod_privacy: {}
  mod_private: {}
  mod_proxy65:
    access: local
    max_connections: 5
  mod_pubsub:
    access_createnode: pubsub_createnode
    plugins:
      - flat
      - pep
    force_node_config:
      storage:bookmarks:
        access_model: whitelist
  mod_push: {}
  mod_push_keepalive: {}
  mod_register:
    ip_access: trusted_network
    welcome_message:
      subject: "Welcome!"
      body: |-
...
    registration_watchers:
...
    access_remove: none
  mod_roster:
    versioning: true
  mod_s2s_dialback: {}
  mod_shared_roster: {}
  mod_stream_mgmt:
    resend_on_timeout: if_offline
  mod_stun_disco: {}
  mod_vcard: {}
  mod_vcard_xupdate: {}
  mod_version:
    show_os: false
  mod_muc_log:
    file_permissions:
      mode: 600
    outdir: /var/log/jabber/muc
    timezone: universal
auth_method: internal
host_config:
...

Bug description

In the past day or so, archive_msg.DAT has suddenly grown by about 1 GB. Initially, I figured it was normal, and I would just have to migrate to a SQL db, but when I run ejabberdctl --no-timeout export2sql gramaton.org /tmp/dump.sql, it produces a 2.6 MB file. So it seems there's something bloating the file that doesn't show up in exported data...
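One way to put numbers on that mismatch is to compare the two files directly. The following is a minimal Python sketch, not part of ejabberd; the function name `size_report` is an assumption made for illustration, and it only reads file sizes from disk.

```python
import os


def size_report(table_path, export_path):
    """Return both file sizes in bytes and how many times larger the table file is.

    table_path:  path to the Mnesia table file (e.g. archive_msg.DAT)
    export_path: path to the SQL export (e.g. the file written by export2sql)
    """
    table_bytes = os.path.getsize(table_path)
    export_bytes = os.path.getsize(export_path)
    # Avoid dividing by zero if the export file is empty.
    ratio = table_bytes / export_bytes if export_bytes else float("inf")
    return {
        "table_bytes": table_bytes,
        "export_bytes": export_bytes,
        "ratio": ratio,
    }
```

It could be called as `size_report("/path/to/mnesia/archive_msg.DAT", "/tmp/dump.sql")`, where the first path is a placeholder for wherever the Mnesia directory lives on the system.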

luke-jr • Aug 01 '22 02:08

In the ejabberd Web Admin -> Nodes -> your node -> Database, you can view how many elements are currently stored in each mnesia table and how much memory each one uses (measured in whatever unit mnesia:info(). returns). You can also view the elements.

For a similar purpose, use the dump_table command.

One wild idea to explain that: MAM content was generated and inserted into the mnesia table... later those messages were removed, or their accounts were removed (which removes their MAM messages), but they remain in cache until garbage collection is run.

In that case, you can force garbage collection.

badlop • Aug 02 '22 10:08

> In the ejabberd Web Admin -> Nodes -> your node -> Database, you can view how many elements are currently stored in each mnesia table and how much memory each one uses (measured in whatever unit mnesia:info(). returns). You can also view the elements.

Elements = 271,093 Memory = 243,021,880

I don't see a way to view elements.
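As a rough sanity check, the two figures above can be divided to get an average size per element, and the same can be done for the 2.6 MB export. This is a sketch under stated assumptions: that the Memory figure is in bytes (the Mnesia documentation says the `memory` item is reported in bytes for disc_only_copies tables, but nothing in this thread confirms what the Web Admin displays), that "2.6 MB" means 2.6 × 10^6 bytes, and that the element count was about the same when the export was run.

```python
# Figures reported in this thread.
ELEMENTS = 271_093          # "Elements" from the Web Admin
MEMORY = 243_021_880        # "Memory" from the Web Admin; assumed to be bytes
EXPORT_BYTES = 2_600_000    # size of /tmp/dump.sql; assumed 2.6 MB = 2.6e6 bytes


def average_bytes(total_bytes, count):
    """Average number of bytes per element."""
    return total_bytes / count


table_avg = average_bytes(MEMORY, ELEMENTS)
export_avg = average_bytes(EXPORT_BYTES, ELEMENTS)

print(f"table:  about {table_avg:.0f} bytes per element")
print(f"export: about {export_avg:.1f} bytes per element")
```

If those assumptions hold, the table averages just under 900 bytes per element while the export averages under 10, which is too little to hold a message. That would suggest the export covers only a small part of what the table contains.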

> One wild idea to explain that: MAM content was generated and inserted into the mnesia table... later those messages were removed, or their accounts were removed (which removes their MAM messages), but they remain in cache until garbage collection is run.

Unlikely - none of it should have been deleted ever, and it's growing rapidly. There have been no new or deleted accounts either.

The GC API seems more complex than simply changing the web admin URI. :/

luke-jr • Aug 03 '22 01:08

> The GC API seems more complex than simply changing the web admin URI. :/

Well, I just proposed calling ejabberdctl gc, which may clean obsolete content in the database.

Is this problem still present?

badlop • Sep 21 '22 10:09

> Well, I just proposed calling ejabberdctl gc, which may clean obsolete content in the database.

No change to archive_msg.DAT file size from that.

> Is this problem still present?

It's still 1.3 GB, no idea why it started and stopped growing.

luke-jr • Sep 22 '22 01:09

> I don't see a way to view elements.

Ah right. You are using ejabberd 20.04, but that feature was added in 21.07 https://github.com/processone/ejabberd/commit/16af8a47396e3889e9a0d63e503a36abe1daaa56

Alternatively, if the table contains almost useless data (or you have a table backup), you could try to delete its content manually, or even delete the table completely (ejabberd creates it again after restart if it doesn't exist).

badlop • Nov 15 '22 12:11

Actually, I'm on 22.05 now :)

But clicking the number of elements caused ejabberd to crash:

2022-11-15 20:34:50.244358+00:00 [error] <0.5582.2>@proc_lib:crash_report/4:539 CRASH REPORT:
  crasher:
    initial call: ejabberd_http:init/3
    pid: <0.5582.2>
    registered_name: []
    exception error: no function clause matching lists:sort({badrpc,timeout}) (lists.erl, line 512)
      in function  ejabberd_web_admin:get_table_content/5 (src/ejabberd_web_admin.erl, line 1870)
      in call from ejabberd_web_admin:make_table_elements_view/4 (src/ejabberd_web_admin.erl, line 1839)
      in call from ejabberd_web_admin:process_admin/3 (src/ejabberd_web_admin.erl, line 536)
      in call from ejabberd_http:process/2 (src/ejabberd_http.erl, line 373)
      in call from ejabberd_http:process_request/1 (src/ejabberd_http.erl, line 496)
      in call from ejabberd_http:process_header/2 (src/ejabberd_http.erl, line 293)
      in call from ejabberd_http:parse_headers/1 (src/ejabberd_http.erl, line 218)
    ancestors: [ejabberd_http_sup,ejabberd_sup,<0.121.0>]
    message_queue_len: 0
    messages: []
    links: [<0.797.0>,#Port<0.23421>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 6772
    stack_size: 28
    reductions: 40986
  neighbours:

2022-11-15 20:34:53.472837+00:00 [error] <0.797.0>@supervisor:do_restart/3:751 SUPERVISOR REPORT:
    supervisor: {local,ejabberd_http_sup}
    errorContext: child_terminated
    reason: {function_clause,
                [{lists,sort,
                     [{badrpc,timeout}],
                     [{file,"lists.erl"},{line,512}]},
                 {ejabberd_web_admin,get_table_content,5,
                     [{file,"src/ejabberd_web_admin.erl"},{line,1870}]},
                 {ejabberd_web_admin,make_table_elements_view,4,
                     [{file,"src/ejabberd_web_admin.erl"},{line,1839}]},
                 {ejabberd_web_admin,process_admin,3,
                     [{file,"src/ejabberd_web_admin.erl"},{line,536}]},
                 {ejabberd_http,process,2,
                     [{file,"src/ejabberd_http.erl"},{line,373}]},
                 {ejabberd_http,process_request,1,
                     [{file,"src/ejabberd_http.erl"},{line,496}]},
                 {ejabberd_http,process_header,2,
                     [{file,"src/ejabberd_http.erl"},{line,293}]},
                 {ejabberd_http,parse_headers,1,
                     [{file,"src/ejabberd_http.erl"},{line,218}]}]}
    offender: [{pid,<0.5582.2>},
               {id,undefined},
               {mfargs,{ejabberd_http,start_link,undefined}},
               {restart_type,temporary},
               {significant,false},
               {shutdown,5000},
               {child_type,worker}]

luke-jr • Nov 15 '22 20:11

Timeout when getting all the keys in a table... probably it really has many elements, and this webadmin page has no pagination or any other clever method to handle big tables.

The question remains: what are the table contents, how did it grow, and why does export2sql produce only a 2.6 MB file?

> I don't see a way to view elements.

As WebAdmin can't help here, you can try to dump the mnesia table to a text file.
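A minimal sketch of how that dump could be scripted, assuming the dump_table command takes the output file first and the table name second, and reusing the --no-timeout flag from the export2sql call earlier in this thread. The helper name `dump_table_argv` and the default output path are hypothetical; the function only builds the command line and does not run it.

```python
import shlex


def dump_table_argv(out_file="/tmp/archive_msg.txt", table="archive_msg"):
    """Build the ejabberdctl command line that dumps one mnesia table to a text file.

    out_file is a hypothetical path; pick a location with enough free space,
    since the table file in this thread is over 1 GB.
    """
    return ["ejabberdctl", "--no-timeout", "dump_table", out_file, table]


# Show the command as it would be typed in a shell.
command_line = shlex.join(dump_table_argv())
print(command_line)
```

The resulting list can be passed to `subprocess.run(..., check=True)` on the server, after which the text file can be inspected with ordinary tools to see what kind of records dominate.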

badlop • Nov 16 '22 12:11

Any new information about this problem?

badlop • Jan 10 '23 09:01