Some rooms crash but their PIDs remain in Mnesia
Environment
- ejabberd version: 20.04, 20.07
- Erlang version: OTP 21.2.7 - 23.0.3
- OS: Linux (CentOS 8)
- Installed from: source
Errors from error.log/crash.log
2020-08-03 02:23:24 =SUPERVISOR REPORT====
Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
Context: child_terminated
Reason: killed
Offender: [{pid,<0.1148.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]
2020-08-03 02:27:25 =SUPERVISOR REPORT====
Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
Context: child_terminated
Reason: killed
Offender: [{pid,<0.656.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]
2020-08-03 02:28:55 =SUPERVISOR REPORT====
Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
Context: child_terminated
Reason: killed
Offender: [{pid,<0.658.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]
2020-08-03 02:32:26 =SUPERVISOR REPORT====
Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
Context: child_terminated
Reason: killed
Offender: [{pid,<0.659.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]
2020-08-03 02:40:58 =SUPERVISOR REPORT====
Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
Context: child_terminated
Reason: killed
Offender: [{pid,<0.959.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]
2020-08-03 02:42:29 =SUPERVISOR REPORT====
Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
Context: child_terminated
Reason: killed
Offender: [{pid,<0.709.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]
2020-08-03 02:54:02 =SUPERVISOR REPORT====
Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
Context: child_terminated
Reason: killed
Offender: [{pid,<0.667.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]
2020-08-03 03:27:40 =SUPERVISOR REPORT====
Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
Context: child_terminated
Reason: killed
Offender: [{pid,<0.1238.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]
2020-08-03 04:05:48 =SUPERVISOR REPORT====
Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
Context: child_terminated
Reason: killed
Offender: [{pid,<0.1428.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]
2020-08-03 04:14:20 =SUPERVISOR REPORT====
Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
Context: child_terminated
Reason: killed
Offender: [{pid,<0.1459.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]
2020-08-03 06:04:49 =SUPERVISOR REPORT====
Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
Context: child_terminated
Reason: killed
Offender: [{pid,<0.1077.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]
2020-08-03 06:04:49 =SUPERVISOR REPORT====
Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
Context: child_terminated
Reason: killed
Offender: [{pid,<0.1306.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]
2020-08-03 06:35:28 =SUPERVISOR REPORT====
Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
Context: child_terminated
Reason: killed
Offender: [{pid,<0.679.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]
2020-08-03 06:50:32 =SUPERVISOR REPORT====
Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
Context: child_terminated
Reason: killed
Offender: [{pid,<0.663.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]
2020-08-03 07:05:36 =SUPERVISOR REPORT====
Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
Context: child_terminated
Reason: killed
Offender: [{pid,<0.666.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]
2020-08-03 07:13:09 =SUPERVISOR REPORT====
Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
Context: child_terminated
Reason: killed
Offender: [{pid,<0.705.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]
Bug description
Hello. When I try to get the config of a room, it returns {error,not_found}, but the room PID is still in Mnesia. The problem occurs when flooders make the room crash. I have been struggling with this for a week but have not found a solution. I have tried OTP 21.2.7 and OTP 23.0.3. Now most of the rooms have disappeared from the room list but are still in the MySQL database. I have to restart the whole server for the rooms to show up again, but after some time some rooms crash again.
The problem is easy to reproduce:
- Create a new room, for example joining room1
- In Erlang shell, execute something like:
exit(element(2, mod_muc:find_online_room(<<"room1">>, <<"conference.localhost">>)), kill).
- It shows an error message like what you mentioned...
- but nothing else! The mod_muc service is not aware of the room crash, the muc_online_room Mnesia table still mentions the room, and the occupants were not informed about the room crash.
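To confirm that a given entry is stale, the result of the lookup used above can be checked against the process it points to. This is only a sketch for the Erlang shell: `RoomStatus` is a hypothetical helper, not part of ejabberd, and it assumes `mod_muc:find_online_room/2` returns `{ok, Pid}` or `error`, as the `element(2, ...)` call above suggests.

```erlang
%% Hypothetical shell helper: classify the result of a room lookup.
%% is_process_alive/1 only accepts local pids, so remote pids are
%% reported separately instead of being checked.
RoomStatus = fun
    ({ok, Pid}) when node(Pid) =:= node() ->
        case is_process_alive(Pid) of
            true  -> {alive, Pid};
            false -> {stale, Pid}   %% still registered, but the process is gone
        end;
    ({ok, Pid}) ->
        {remote, Pid};
    (error) ->
        not_registered
end.
```

After killing the room as in the steps above, `RoomStatus(mod_muc:find_online_room(<<"room1">>, <<"conference.localhost">>)).` would be expected to return `{stale, Pid}` if the entry was left behind.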
The shell returns just "true" and nothing else. There is no crash log.
I have determined what causes the room crash. When the room captcha is enabled, some users use tools to request too many captchas. After a few minutes the room crashes permanently, until the server is restarted.
Did you find in the logs any specific error lines related to those captcha crashes? Or are the lines you already copied in the ticket description the only thing found in the logs?
No, only the lines copied in the ticket description.
@badlop I think we should rate-limit captcha generation per user to mitigate this issue.
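As a rough illustration of what per-user rate limiting could look like, here is a minimal sliding-window sketch. The module `captcha_limit_sketch` and its functions are hypothetical and not ejabberd code. It only shows the bookkeeping; where such a check would be wired into the captcha request path is left open.

```erlang
-module(captcha_limit_sketch).
-export([new/0, allow/5]).

%% State maps a requester key (for example a bare JID binary) to the
%% timestamps, in milliseconds, of that requester's recent requests.
new() -> #{}.

%% allow(Key, NowMs, Max, WindowMs, State) -> {Allowed, NewState}
%% Allows a request only if fewer than Max requests from Key were
%% accepted during the last WindowMs milliseconds.
allow(Key, NowMs, Max, WindowMs, State) ->
    %% Drop timestamps that have fallen out of the window.
    Recent = [T || T <- maps:get(Key, State, []), NowMs - T < WindowMs],
    case length(Recent) < Max of
        true  -> {true,  State#{Key => [NowMs | Recent]}};
        false -> {false, State#{Key => Recent}}
    end.
```

With `Max = 2` and `WindowMs = 60000`, a third request from the same key inside one minute is rejected, and requests are accepted again once the earlier ones are older than the window.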