
MUC rooms are not replicated on a cluster

Open joudinet opened this issue 7 years ago • 6 comments

What version of ejabberd are you using? 17.04

What operating system (version) are you using? Buildroot 2017.08

How did you install ejabberd (source, package, distribution)? package

What did not work as expected? Are there error messages in the log? What was the unexpected behavior? What was the expected result?

I have a cluster of three ejabberd nodes (on three identical machines). I join/create a permanent MUC room from node 1. Note: I configure default_room_options to make rooms permanent by default. I can see/join the room from all three nodes. I shut down node 2. The room is still visible (i.e., not empty) from both node 1 and node 3. If I shut down node 1 (where I created the room), the room is empty when I join later from node 3.

I am trying to reduce single points of failure in my system, so I'd like the (permanent) rooms to be replicated on every node, such that if I shut down any of them, the room is still there. Any idea how to do that?
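For illustration, a minimal sketch of the kind of per-node ejabberd.yml this describes (the MUC host "conference.@HOST@" and the option layout below are assumptions based on the standard mod_muc options, not the actual file from this report):

```yaml
modules:
  mod_muc:
    host: "conference.@HOST@"    # MUC service host (assumed default)
    default_room_options:
      persistent: true           # rooms are stored in the backend and survive being emptied
```

Each of the three nodes would carry the same mod_muc section; the live room process itself still runs only on the node where the room was created, which is what this issue is about.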

joudinet avatar Feb 07 '18 16:02 joudinet

This is a well-known issue: room replication is not implemented, and I'm not sure it ever will be in the community edition.

zinid avatar Feb 07 '18 16:02 zinid

@zinid: Thanks for the quick reply and for confirming it is a known issue. I'll leave the issue open.

joudinet avatar Feb 07 '18 17:02 joudinet

I'm not sure if this is the same issue, but I have a problem where, of my three clients, only one can successfully use MUC chatrooms on our cluster of 4 ejabberd nodes. The other two [Conversations on Android] always get "you are no longer part of this group chat" or "resource constraint". This happens both when the other clients are connected to the same node as the primary client I use and when they happen to connect to different nodes. The variability comes from the round-robin configuration I have [see below], but, then again, the issue happens either way.

The way I have this setup is:

ae00.domain.xyz [MUC configured as xc.domain.xyz in this node's /etc/ejabberd/ejabberd.yml]
ae01.domain.xyz [MUC configured as xc.domain.xyz in this node's /etc/ejabberd/ejabberd.yml]
ae02.domain.xyz [MUC configured as xc.domain.xyz in this node's /etc/ejabberd/ejabberd.yml]
ae03.domain.xyz [MUC configured as xc.domain.xyz in this node's /etc/ejabberd/ejabberd.yml]
ae.domain.xyz -> DNS roundrobin to all 4 nodes via 4 A and 4 AAAA records
xc.domain.xyz -> DNS roundrobin to all 4 nodes via 4 A and 4 AAAA records

This is different from all other services: mod_echo, mod_irc, mod_mix, mod_pubsub, and mod_uploads, which are all configured node-specifically:

ae00.domain.xyz [e.g.: mod_echo configured as xe00.domain.xyz in this node's /etc/ejabberd/ejabberd.yml]
ae01.domain.xyz [e.g.: mod_echo configured as xe01.domain.xyz in this node's /etc/ejabberd/ejabberd.yml]
ae02.domain.xyz [e.g.: mod_echo configured as xe02.domain.xyz in this node's /etc/ejabberd/ejabberd.yml]
ae03.domain.xyz [e.g.: mod_echo configured as xe03.domain.xyz in this node's /etc/ejabberd/ejabberd.yml]
xe00.domain.xyz -> CNAME to ae00.domain.xyz
xe01.domain.xyz -> CNAME to ae01.domain.xyz
xe02.domain.xyz -> CNAME to ae02.domain.xyz
xe03.domain.xyz -> CNAME to ae03.domain.xyz

This is because a chatroom must have the same JID globally [e.g.: room@xc.domain.xyz] and cannot be fragmented into node-specific MUC IDs [like: room@xc00.domain.xyz]. The fact is that this is pure trial-and-error on my part and I have no idea whether this is actually how to clusterize ejabberd, but it's the only logical conclusion for me, and the complete lack of documentation on the cluster KB page doesn't help. Thanks to all in advance.
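For reference, a minimal sketch of the per-node ejabberd.yml layout described above, using ae00.domain.xyz as the example node (only the standard `host` option of each module is shown; everything else is omitted or assumed):

```yaml
modules:
  mod_muc:
    host: "xc.domain.xyz"      # same MUC host on every node, matching the shared round-robin DNS name
  mod_echo:
    host: "xe00.domain.xyz"    # node-specific host; ae01..ae03 would use xe01..xe03 instead
```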

nordurljosahvida avatar Sep 01 '18 17:09 nordurljosahvida

I'm not sure if this is the same issue, but I have an issue where of my three clients only one can successfully use MUC chatrooms on our cluster of 4 ejabberd nodes.

This is not related. I think your problem comes from a wrong design / architecture / configuration of your MUC service.

mremond avatar Apr 15 '21 14:04 mremond

Thanks for the quick reply and for confirming it is a known issue. I'll leave the issue open.

Well, this would be a different MUC implementation for a different purpose. It would require not only a clustered MUC service, but making each room work across the cluster. It would also have some drawbacks regarding scalability: if you can fit 100k rooms in memory on a 3-node cluster, for example, you would now basically have to fit 300k rooms, since each room would be replicated on every node.

This is a tradeoff for different use cases.

I feel that the default behaviour hits the right balance for the community version. To be honest, clustered / fault-tolerant MUC seems more like a feature for large service providers.

mremond avatar Apr 15 '21 14:04 mremond

The state data of a MUC room is stored in local memory only, so it does not support clustering.

Our simple solution is to load the room onto the current node when we find that the room does not exist on any node.

To re-join the room, you need to route the presence packets of the room members to that room: mod_muc_room:route(Pid, #presence{type = available, from = <<"member@xxx">>, to = <<"room@xxx">>})

dingdongnigetou avatar Jul 19 '22 08:07 dingdongnigetou