
ejabberd does not start after update from 23.10 to 24.02

Open · giostark opened this issue 10 months ago · 5 comments

Hi all :-) This is a problem similar to https://github.com/processone/ejabberd/issues/3150, which unfortunately was never resolved (in that case it was memsup, in mine it is epam). If I understood correctly, the database is upgraded automatically; I'm using the included database. After the mentioned update, ejabberd no longer starts and logs the errors below. I don't know what to do... really... Please ask me for any information needed to investigate this issue (and tell me how to gather it). I tried removing the firewall entirely, but the problem persists. The server was working fine; I ran the new 24.02 installer, rebooted, and then hit this issue. I'm an Arch user... Thanks to any good soul who finds the time to help me ;-)

2024-04-13 13:47:50.742784+02:00 [info] <0.190.0>@ejabberd_config:load/1:82 Loading configuration from /opt/ejabberd/conf/ejabberd.yml
2024-04-13 13:47:51.785172+02:00 [info] <0.190.0>@ejabberd_config:load/1:89 Configuration loaded successfully
2024-04-13 13:52:49.244505+02:00 [notice] <0.70.0> SIGTERM received - shutting down


2024-04-13 14:01:26.057330+02:00 [error] <0.214.0> External eimp process (pid=3826) has terminated unexpectedly, restarting in a few seconds
2024-04-13 14:01:26.058274+02:00 [error] <0.216.0> External eimp process (pid=3827) has terminated unexpectedly, restarting in a few seconds
2024-04-13 14:01:26.058452+02:00 [error] <0.212.0> External eimp process (pid=3825) has terminated unexpectedly, restarting in a few seconds
2024-04-13 14:01:26.058905+02:00 [error] <0.210.0> External eimp process (pid=3824) has terminated unexpectedly, restarting in a few seconds
2024-04-13 14:01:26.064626+02:00 [error] <0.222.0>@gen_server:error_info/8:1391 ** Generic server epam terminating 
** Last message in was {#Port<0.10>,{exit_status,143}}
** When Server state == {state,#Port<0.10>}
** Reason for termination ==
** port_died

2024-04-13 14:01:26.065731+02:00 [error] <0.222.0>@proc_lib:crash_report/4:584 CRASH REPORT:
  crasher:
    initial call: epam:init/1
    pid: <0.222.0>
    registered_name: epam
    exception exit: port_died
      in function  gen_server:handle_common_reply/8 (gen_server.erl, line 1226)
    ancestors: [epam_sup,<0.220.0>]
    message_queue_len: 0
    messages: []
    links: [<0.221.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 4185
    stack_size: 28
    reductions: 7829
  neighbours:

2024-04-13 14:01:26.080804+02:00 [error] <0.221.0>@supervisor:do_restart/3:744 SUPERVISOR REPORT:
    supervisor: {local,epam_sup}
    errorContext: child_terminated
    reason: port_died
    offender: [{pid,<0.222.0>},
               {id,epam},
               {mfargs,{epam,start_link,[]}},
               {restart_type,permanent},
               {significant,false},
               {shutdown,brutal_kill},
               {child_type,worker}]

giostark commented Apr 13 '24 12:04

This is the status of the service. It seems to stay stuck while activating, waiting for the startup to complete. I don't know how to investigate the "epam" process that may be causing this hang, and it does not seem related to the other "epam port died" reports I found on the net. (A few commands that might help narrow down where startup hangs are sketched after the status output below.)

[arch-nas@arch-nas ~]$ sudo systemctl status ejabberd
● ejabberd.service - XMPP Server
     Loaded: loaded (/etc/systemd/system/ejabberd.service; enabled; preset: disabled)
     Active: activating (start) since Sat 2024-04-13 20:48:18 CEST; 4min 24s ago
   Main PID: 2353 (ejabberdctl)
      Tasks: 34 (limit: 9378)
     Memory: 739.5M (peak: 907.0M)
        CPU: 5min 2.304s
     CGroup: /system.slice/ejabberd.service
             ├─2353 /bin/sh /opt/ejabberd-24.02/bin/ejabberdctl foreground
             ├─2366 /opt/ejabberd-24.02/erts-14.2.2/bin/beam.smp -K true -P 250000 -- -root /opt/ejabberd-24.02 -bindir /opt/ejabberd-24.02/erts-14.2.2/bin -progname erl -- -h>
             ├─2378 /opt/ejabberd-24.02/erts-14.2.2/bin/epmd -daemon
             ├─2382 erl_child_setup 65536
             ├─2407 /opt/ejabberd-24.02/lib/eimp-1.0.22/priv/bin/eimp
             ├─2408 /opt/ejabberd-24.02/lib/eimp-1.0.22/priv/bin/eimp
             ├─2409 /opt/ejabberd-24.02/lib/eimp-1.0.22/priv/bin/eimp
             ├─2410 /opt/ejabberd-24.02/lib/eimp-1.0.22/priv/bin/eimp
             ├─2411 /opt/ejabberd-24.02/lib/epam-1.0.14/priv/bin/epam
             ├─2412 /opt/ejabberd-24.02/erts-14.2.2/bin/inet_gethost 4
             └─2413 /opt/ejabberd-24.02/erts-14.2.2/bin/inet_gethost 4

Apr 13 20:48:21 arch-nas ejabberdctl[2366]:               {child_type,supervisor}]
Apr 13 20:48:21 arch-nas ejabberdctl[2366]: 2024-04-13 20:48:21.216010+02:00 [debug] PROGRESS REPORT:
Apr 13 20:48:21 arch-nas ejabberdctl[2366]:     supervisor: {local,kernel_safe_sup}
Apr 13 20:48:21 arch-nas ejabberdctl[2366]:     started: [{pid,<0.404.0>},
Apr 13 20:48:21 arch-nas ejabberdctl[2366]:               {id,disk_log_server},
Apr 13 20:48:21 arch-nas ejabberdctl[2366]:               {mfargs,{disk_log_server,start_link,[]}},
Apr 13 20:48:21 arch-nas ejabberdctl[2366]:               {restart_type,permanent},
Apr 13 20:48:21 arch-nas ejabberdctl[2366]:               {significant,false},
Apr 13 20:48:21 arch-nas ejabberdctl[2366]:               {shutdown,2000},
Apr 13 20:48:21 arch-nas ejabberdctl[2366]:               {child_type,worker}]
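
A few generic checks that might show where the startup hangs; this is an editor's hedged sketch, not from the original report, assuming the 24.02 installer layout under /opt/ejabberd-24.02 shown in the status output above:

# ask the Erlang node whether it is up and answering at all
sudo /opt/ejabberd-24.02/bin/ejabberdctl status
# list the Erlang nodes currently registered with epmd on this host
/opt/ejabberd-24.02/erts-14.2.2/bin/epmd -names
# follow the unit's journal live while it sits in "activating"
sudo journalctl -fu ejabberd.service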

giostark commented Apr 13 '24 18:04

At "sudo journalctl -xeu ejabberd.service" this is the output:

[arch-nas@arch-nas ~]$ sudo journalctl -xeu ejabberd.service
[sudo] password for arch-nas: 
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {significant,false},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {shutdown,3000},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {child_type,worker}]
Apr 13 20:58:34 arch-nas ejabberdctl[2870]: 2024-04-13 20:58:34.647473+02:00 [debug] PROGRESS REPORT:
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:     supervisor: {local,mnesia_kernel_sup}
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:     started: [{pid,<0.396.0>},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {id,mnesia_subscr},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {mfargs,{mnesia_subscr,start,[]}},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {restart_type,permanent},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {significant,false},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {shutdown,3000},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {child_type,worker}]
Apr 13 20:58:34 arch-nas ejabberdctl[2870]: 2024-04-13 20:58:34.661363+02:00 [debug] PROGRESS REPORT:
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:     supervisor: {local,mnesia_kernel_sup}
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:     started: [{pid,<0.397.0>},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {id,mnesia_locker},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {mfargs,{mnesia_locker,start,[]}},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {restart_type,permanent},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {significant,false},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {shutdown,3000},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {child_type,worker}]
Apr 13 20:58:34 arch-nas ejabberdctl[2870]: 2024-04-13 20:58:34.670528+02:00 [debug] PROGRESS REPORT:
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:     supervisor: {local,mnesia_kernel_sup}
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:     started: [{pid,<0.398.0>},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {id,mnesia_recover},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {mfargs,{mnesia_recover,start,[]}},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {restart_type,permanent},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {significant,false},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {shutdown,180000},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {child_type,worker}]
Apr 13 20:58:34 arch-nas ejabberdctl[2870]: 2024-04-13 20:58:34.752914+02:00 [debug] PROGRESS REPORT:
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:     supervisor: {local,kernel_safe_sup}
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:     started: [{pid,<0.403.0>},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {id,disk_log_sup},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {mfargs,{disk_log_sup,start_link,[]}},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {restart_type,permanent},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {significant,false},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {shutdown,1000},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {child_type,supervisor}]
Apr 13 20:58:34 arch-nas ejabberdctl[2870]: 2024-04-13 20:58:34.753550+02:00 [debug] PROGRESS REPORT:
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:     supervisor: {local,kernel_safe_sup}
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:     started: [{pid,<0.404.0>},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {id,disk_log_server},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {mfargs,{disk_log_server,start_link,[]}},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {restart_type,permanent},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {significant,false},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {shutdown,2000},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {child_type,worker}]

giostark commented Apr 13 '24 19:04

I tried substituting "/opt/ejabberd-24.02/lib/epam-1.0.14/priv/bin/epam" with the old 1.0.12 binary from the 23.10 installation, but of course that did not work either ...
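
One hedged suggestion, not taken from the thread: with PAM authentication the epam helper normally needs elevated privileges (the ejabberd PAM documentation describes making it setuid root, owned by root and by the group ejabberd runs as), and a PAM service file matching the pam_service configured in ejabberd.yml must exist. Something along these lines, with the group name and paths adjusted to this installation:

# inspect ownership and mode of the helper shipped with 24.02
ls -l /opt/ejabberd-24.02/lib/epam-1.0.14/priv/bin/epam
# per the PAM setup instructions it is usually root-owned and setuid, e.g.:
#   sudo chown root:ejabberd /opt/ejabberd-24.02/lib/epam-1.0.14/priv/bin/epam
#   sudo chmod 4750 /opt/ejabberd-24.02/lib/epam-1.0.14/priv/bin/epam
# and a PAM service definition must exist for the configured pam_service name
cat /etc/pam.d/ejabberd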

giostark commented Apr 13 '24 19:04

As you use the internal (mnesia) database, I guess you made a backup of its files before updating the program.

What happens if you uninstall ejabberd 24.02, restore the old mnesia directory, install ejabberd 23.10 and start it: does it start correctly?
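
For reference, a rough sketch of that test, assuming the ProcessOne .run installer layout seen above (config under /opt/ejabberd/conf, Mnesia spool under /opt/ejabberd/database, versioned code under /opt/ejabberd-XX.YY); the installer and backup file names below are only placeholders:

sudo systemctl stop ejabberd
# keep a copy of the current Mnesia spool before touching anything
sudo cp -a /opt/ejabberd/database /opt/ejabberd/database.bak-24.02
# remove the 24.02 code tree installed by the .run installer
sudo rm -rf /opt/ejabberd-24.02
# reinstall the previous release (placeholder file name)
sudo ./ejabberd-23.10-linux-x64.run
# put back the Mnesia directory that was backed up before the upgrade (placeholder path)
#   sudo cp -a /path/to/pre-upgrade-database-backup /opt/ejabberd/database
sudo systemctl start ejabberd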

badlop commented Apr 15 '24 15:04

Hi Lop! Thanks for the patience to read all this :-) Let me explain how I operate with my hardware, maybe it is useful. I have an Asustor NAS on which I installed Arch (instead of using the stock software), with two MX500 1 TB disks (plus two other disks that are not relevant here). The main disk is the only active one; the second is passive, with no auto-cloning software or anything similar. From time to time I update the system, reboot the machine to check that everything works, and then update ejabberd using the .run file, which I install as root and which places all the files in /opt. If the ejabberd update finishes fine and it runs without errors of any kind, I power off the machine and, booting from an Arch USB installation stick, clone the entire first disk to the second one with dd. That way I have a working backup that is not subject to this kind of error. Right now I'm running that backup, made nearly one month ago, with 23.10 installed. On the first disk I also tried to install and configure Prosody; after a few hours of learning curve (I'm a noob) it ended up working fine, though with its limitations due to some defective modules.

I tried what you suggested, and the errors now look like this:


2024-04-17 22:46:15.105809+02:00 [error] <0.457.0>@proc_lib:crash_report/4:584 CRASH REPORT:
  crasher:
    initial call: ejabberd_commands:init/1
    pid: <0.457.0>
    registered_name: []
    exception exit: {aborted,{node_not_running,ejabberd@localhost}}
      in function  mnesia:abort/1 (mnesia.erl, line 360)
      in call from ejabberd_commands:init/1 (src/ejabberd_commands.erl, line 111)
      in call from gen_server:init_it/2 (gen_server.erl, line 962)
      in call from gen_server:init_it/6 (gen_server.erl, line 917)
    ancestors: [ejabberd_sup,<0.179.0>]
    message_queue_len: 0
    messages: []
    links: [<0.384.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 610
    stack_size: 28
    reductions: 206
  neighbours:

2024-04-17 22:46:15.107700+02:00 [error] <0.384.0>@supervisor:start_children/2:398 SUPERVISOR REPORT:
    supervisor: {local,ejabberd_sup}
    errorContext: start_error
    reason: {aborted,{node_not_running,ejabberd@localhost}}
    offender: [{pid,undefined},
               {id,ejabberd_commands},
               {mfargs,{ejabberd_commands,start_link,[]}},
               {restart_type,permanent},
               {significant,false},
               {shutdown,60000},
               {child_type,worker}]

2024-04-17 22:46:15.120800+02:00 [debug] <0.385.0>@ejabberd_systemd:terminate/2:133 Terminating ejabberd_systemd (shutdown)
2024-04-17 22:46:15.121057+02:00 [debug] <0.385.0>@ejabberd_systemd:cancel_timer/1:177 Cancelling watchdog timer
2024-04-17 22:46:15.121569+02:00 [critical] <0.179.0>@ejabberd_app:start/2:68 Failed to start ejabberd application: {error,
                                       {shutdown,
                                        {failed_to_start_child,
                                         ejabberd_commands,
                                         {aborted,
                                          {node_not_running,
                                           ejabberd@localhost}}}}}

I also tried replacing the entire /opt tree, but it still fails to start. I don't know whether something got corrupted during the last update and damaged some libraries; everything appeared flawless and the system seems to work fine (I switched back to the first disk for all the tests and I'm writing from it). I also tried reinstalling all the applications and libraries mentioned in ejabberd's dependencies, as if I were going to build it from source, and I tried installing it from source as well, but it fails again. I'm afraid to touch the disk that works, but I can proceed as follows: use dd to back up the old working disk again, so that despite the regression I still have two working disks, and then redo the whole update, this time dumping a copy of the database first (to a MySQL backend, for example). I'll proceed like this and report the result :-)
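
For the database copy step, a hedged sketch of the built-in ejabberdctl commands usually used for this; they must be run against the running node (i.e. on the working 23.10 install before upgrading), the paths and the virtual host below are placeholders, and export2sql additionally needs an SQL host configured in ejabberd.yml:

# binary dump of the whole Mnesia database (restorable later with "ejabberdctl restore")
sudo /opt/ejabberd-23.10/bin/ejabberdctl backup /tmp/ejabberd-mnesia.backup
# plain-text dump, handy for inspection
sudo /opt/ejabberd-23.10/bin/ejabberdctl dump /tmp/ejabberd-mnesia.dump
# export the Mnesia tables of one virtual host to an SQL file (placeholder vhost)
sudo /opt/ejabberd-23.10/bin/ejabberdctl export2sql example.org /tmp/ejabberd.sql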

giostark commented Apr 17 '24 21:04

Hey. I know this may come in late, but I experienced a somewhat similar issue with ejabberd v24. The systemctl service would hang indefinitely when I started it, and there was no helpful message in /var/log/ejabberd/ejabberd.log.

I resorted to downgrading the package back to 23.10-1. When I tried to install the older version, I was met with an error about a dependency, erlang-unixodbc. I was able to get it working by doing the following (a pacman sketch of these steps follows the list):

  1. Uninstalling the latest version of ejabberd.
  2. Installing the most recent (before running pacman -Syu) erlang-unixodbc, v26.2.3-1, from the pacman cache folder.
  3. Installing the most recent (before running pacman -Syu) version of ejabberd, 23.10-1, from the pacman cache folder.

I hope this helps in some way. Here are a few links that helped me solve the problem:
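
A hedged sketch of that downgrade on Arch, assuming the old packages are still in the default pacman cache; the exact package file names are placeholders and will differ per architecture and packaging release:

# remove the current ejabberd package
sudo pacman -R ejabberd
# reinstall the cached pre-upgrade packages from /var/cache/pacman/pkg
sudo pacman -U /var/cache/pacman/pkg/erlang-unixodbc-26.2.3-1-x86_64.pkg.tar.zst
sudo pacman -U /var/cache/pacman/pkg/ejabberd-23.10-1-x86_64.pkg.tar.zst
# optionally hold the downgraded packages by adding them to IgnorePkg in /etc/pacman.conf
#   IgnorePkg = ejabberd erlang-unixodbc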

japodhidev commented Jul 11 '24 08:07

Should we close this? It does not seem to be an ejabberd / Erlang issue, but rather an issue with the build environment.

mremond commented Jul 15 '24 13:07

I added commit https://github.com/processone/ejabberd/commit/7c63cd1000f4f0b8c507baab34fb0d38512ad16e, which should at least report an error message when one of the applications we list as a dependency (as I guess happens here with odbc) prevents ejabberd from starting, so we no longer abort silently when that happens.

prefiks commented Jul 15 '24 13:07