ejabberd icon indicating copy to clipboard operation
ejabberd copied to clipboard

occupant-id not stable in some cases

Open lovetox opened this issue 1 year ago • 13 comments

Environment

  • ejabberd version: 24.02

Bug description

I have 2 different users in Gajim that reported a problem with occupant-id in 2 different MUCs on different servers.

When we investigated we found that the occupant-id in the presence is different from the occupant-id the server adds to their messages.

<message xmlns="jabber:client" to="[email protected]" type="groupchat" id="8dcee6be-1769-4d7d-acbf-1e545b7a950f">
  <body>test</body>
  <origin-id xmlns="urn:xmpp:sid:0" id="8dcee6be-1769-4d7d-acbf-1e545b7a950f" />
</message>

<message xmlns="jabber:client" xml:lang="en" to="[email protected]/gajim.J39W82BR" from="[email protected]/myname" type="groupchat" id="8dcee6be-1769-4d7d-acbf-1e545b7a950f">
  <archived xmlns="urn:xmpp:mam:tmp" by="[email protected]" id="1719035916432908" />
  <stanza-id xmlns="urn:xmpp:sid:0" by="[email protected]" id="1719035916432908" />
  <origin-id xmlns="urn:xmpp:sid:0" id="8dcee6be-1769-4d7d-acbf-1e545b7a950f" />
  <occupant-id xmlns="urn:xmpp:occupant-id:0" id="g20AAAAgAozSRwPfZaleTLJ0hNM73x5BBMGnB2HfuA12Mm9MpNU=" />
  <body>test</body>
</message>

<presence xmlns="jabber:client" xml:lang="en" to="[email protected]/gajim.J39W82BR" from="[email protected]/myname" id="2aadd6b4-8c2a-4471-9b4f-3a098416a765">
  ...
  <occupant-id xmlns="urn:xmpp:occupant-id:0" id="g20AAAAgSp/6G+pFGi1tumup3r8yinT6Kk4yPZzoPGdFIzLrteI=" />
  <x xmlns="http://jabber.org/protocol/muc#user">
    <item jid="[email protected]/gajim.J39W82BR" role="participant" affiliation="none" />
    <status code="110" />
</x>
  <show>xa</show>
</presence>

Multiple device where joined in the MUC, if that could be a issue. We have no reproducible case for now, but it happens.

Could you please check the code, under which conditions this could happen?

lovetox avatar Jun 23 '24 07:06 lovetox

I played trying to reproduce the problem, but did not get it.

Does the problem appear from time to time in your server, even if you are not able to reproduce it voluntarily on demand?

In that case, you could apply a small patch to the modue that generates an ID that we can later decrypt, to determine what is the difference between the two ones

diff --git a/src/mod_muc_occupantid.erl b/src/mod_muc_occupantid.erl
index fabd69e28..42449e563 100644
--- a/src/mod_muc_occupantid.erl
+++ b/src/mod_muc_occupantid.erl
@@ -74,8 +74,8 @@ add_occupantid_packet(Packet, RoomJid) ->
     xmpp:set_subtag(Packet, OccupantElement).
 
 calculate_occupantid(From, RoomJid) ->
-    Term = {jid:remove_resource(From), get_salt(RoomJid)},
-    misc:term_to_base64(crypto:hash(sha256, io_lib:format("~p", [Term]))).
+    Term = {jid:remove_resource(From), RoomJid, get_salt(RoomJid)},
+    misc:term_to_base64(io_lib:format("~p", [Term])).
 
 %%%
 %%% Table storing rooms' salt

Then, the ID is a very long string like:

  <occupant-id id='g2wAAAABbAAAAANhe2wAAAAJbAAAAANhe2wAAAANawADamlkYSxsAAAABWE8YTxrAAcidXNlcjIiYT5hPmphLGwAAAAFYTxhPGsACyJsb2NhbGhvc3QiYT5hPmphLGsABDw8Pj5hLGwAAAAFYTxhPGsAByJ1c2VyMiJhPmE+amEsbAAAAAVhPGE8awALImxvY2FsaG9zdCJhPmE+amEsawAEPDw+PmphfWphLGEKawABIGwAAAADYXtsAAAAD2sAA2ppZGEsbAAAAAVhPGE8awAHInNhbGExImE+YT5qYSxsAAAABWE8YTxrABYiY29uZmVyZW5jZS5sb2NhbGhvc3QiYT5hPmphLGsABDw8Pj5hLGwAAAAFYTxhPGsAByJzYWxhMSJhPmE+amEsYQpsAAAAAmwAAAAEYSBrAAIgIGEgYSBqYSBqbAAAAAVhPGE8awAWImNvbmZlcmVuY2UubG9jYWxob3N0ImE+YT5qYSxrAAQ8PD4+amF9amEsYQprAAEgbAAAAAVhPGE8awAWIjE1OTU4NTQzNDU1MjgyNzcwNjEyImE+YT5qamF9amo='
	xmlns='urn:xmpp:occupant-id:0'/>

That can be easily decrypted to know what user, what room and what room salt were used:

misc:base64_to_term("g2wAAAABbAAAAANhe2wAAAAJbAAAAANhe2wAAAANawADamlkYSxsAAAABWE8YTxrAAcidXNlcjIiYT5hPmphLGwAAAAFYTxhPGsACyJsb2NhbGhvc3QiYT5hPmphLGsABDw8Pj5hLGwAAAAFYTxhPGsAByJ1c2VyMiJhPmE+amEsbAAAAAVhPGE8awALImxvY2FsaG9zdCJhPmE+amEsawAEPDw+PmphfWphLGEKawABIGwAAAADYXtsAAAAD2sAA2ppZGEsbAAAAAVhPGE8awAHInNhbGExImE+YT5qYSxsAAAABWE8YTxrABYiY29uZmVyZW5jZS5sb2NhbGhvc3QiYT5hPmphLGsABDw8Pj5hLGwAAAAFYTxhPGsAByJzYWxhMSJhPmE+amEsYQpsAAAAAmwAAAAEYSBrAAIgIGEgYSBqYSBqbAAAAAVhPGE8awAWImNvbmZlcmVuY2UubG9jYWxob3N0ImE+YT5qYSxrAAQ8PD4+amF9amEsYQprAAEgbAAAAAVhPGE8awAWIjE1OTU4NTQzNDU1MjgyNzcwNjEyImE+YT5qamF9amo=").

{term,[[123,
        [[123,
          ["jid",44,
           [60,60,"\"user2\"",62,62],
           44,
           [60,60,"\"localhost\"",62,62],
           44,"<<>>",44,
           [60,60,"\"user2\"",62,62],
           44,
           [60,60,"\"localhost\"",62,62],
           44,"<<>>"],
          125],
         44,10," ",
         [123,
          ["jid",44,
           [60,60,"\"sala1\"",62,62],
           44,
           [60,60,"\"conference.localhost\"",62,62],
           44,"<<>>",44,
           [60,60,"\"sala1\"",62,62],
           44,10,
           [[32,"  ",32,32],32],
           [60,60,"\"conference.localhost\"",62|...],
           44,"<<>>"],
          125],
         44,10," ",
         [60,60,"\"15958543455282770612\"",62,62]],
        125]]}

If you compile ejabberd from source code, simply apply the patch, recompile, install the file and restart the module or restart the whole ejabberd.

If you cannot compile ejabberd yourself, ping me and I'll generate an updated module *.beam file so you can install it.

badlop avatar Jun 24 '24 11:06 badlop

It could be that room was recreated between those two stanzas, that would change salt (which i think is expected). Only thing that's worth checking is if room hibernation doesn't trigger salt regeneration - it that case we don't want to do that.

Also Badlop was there any reason why you didn't want to put salt in muc_room state but used external mnesia table?

prefiks avatar Jun 24 '24 11:06 prefiks

Also Badlop was there any reason why you didn't want to put salt in muc_room state but used external mnesia table?

Yes, the reasoning was something like this:

  • The occupant id must be stable (the same id for the same bare user + room) https://xmpp.org/extensions/xep-0421.html#id-generation
  • room salt is a random string (cannot be reproduced, so it must be stored persistently as long as the room exists)
  • in mod_muc, the room state is lost when mod_muc stops, so we must use a persistent storage
  • the first design was to store the room salt in the mod_muc_room #config record. But this salt is used only by mod_muc_occupantid... keeping the data in a specific table managed by mod_muc_occupantid seems reasonable.

It could be that room was recreated between those two stanzas, that would change salt (which i think is expected).

Right, that could be the case here, and it makes sense that the occupantid changes, as the room is different (may be from a different creator, owner, ... both rooms just have the same roomname).

If that's what happened in this case, it will be revealed when there are new logs that have occupantid we can unencrypt to determine their source information.

Only thing that's worth checking is if room hibernation doesn't trigger salt regeneration - it that case we don't want to do that.

Aha, that could be the case here too, and in that case the module should be improved.

Looking at the code, a room salt is only removed when the room is removed (and the remove_room hook is called):

  • close room if temporary and empty
  • change config: remove persistence option when mnesia db
  • destroy room
  • restoring hibernated non-persistent room when mnesia db

Maybe the room is non-persistent, using mnesia storage, got hibernated and the module deleted its salt when got unhibernated. I'll check and update this comment.

badlop avatar Jun 24 '24 14:06 badlop

Another thing related to hibernation even if hibernation doesn't reset salt (and i think we call remove_room hook only when room is truly removed, and not just hibernated), could be restoring room on different node, i don't think that mnesia table is synced between nodes, so if room was restored on different node, which didn't have it, it will make new one.

But i think it most probably room that is not set to persistent, and it just get deleted when everyone leaves and then get recreated.

prefiks avatar Jun 24 '24 15:06 prefiks

A temporary (that is, not-persistent) room is deleted when the last occupant leaves and it has no subscribers.

I was able to reproduce the problem:

  • A temporary room looses the last occupant but still has some subscriber: it doesn't get destroyed.
  • After an inactivity as long as hibernation_time, it gets hibernated (erlang process gets destroyed, removed from muc_online_room, but room config is kept in muc_room).
  • Later when a user joins the room, the room gets restored from hibernation, an erlang process is started, but there is code in mod_muc that calls forget_room, which means that the room salt will be newly created.

I wonder what was the requirement to forget_room when restoring a temporary room.

Removing it solves this particular problem:

diff --git a/src/mod_muc.erl b/src/mod_muc.erl
index e297d866f..3184fe90a 100644
--- a/src/mod_muc.erl
+++ b/src/mod_muc.erl
@@ -903,14 +903,7 @@ load_room(RMod, Host, ServerHost, Room, ResetHibernationTime) ->
 		    Res2;
 		_ ->
 		    ?DEBUG("Restore hibernated non-persistent room: ~ts", [Room]),
-		    Res = start_room(RMod, Host, ServerHost, Room, Opts0),
-		    case erlang:function_exported(Mod, get_subscribed_rooms, 3) of
-			true ->
-			    ok;
-			_ ->
-			    forget_room(ServerHost, Host, Room)
-		    end,
-		    Res
+		    start_room(RMod, Host, ServerHost, Room, Opts0)
 	    end
     end.
 

badlop avatar Jun 24 '24 15:06 badlop

Hi,

i think its a fair assumption that a salt is regenerated after a room was destroyed.

the room in question here is a group chat with >60 people, and its configured as persistent (at least today), i think it unlikely that it was destroyed.

A bit more info, we could see the problem happening consistently, as in messages always had a different id than the presence. It was not a case of send a message on one day, and check the presence on the next day.

so both occupant-ids where present at the same time, not a case of occupant-id changed because of an event and from that moment on it was different everywhere.

I try to get more logs from the user, maybe we can see what the occupant-id was over a longer period of time.

lovetox avatar Jun 24 '24 16:06 lovetox

@badlop Ah, missed that forget room call. That call was there to delete entry from db, as we will not delete this otherwise, we probably should add function that just delete room from db, and have other that remove from db and calls that hook.

@lovetox Do you know if server where this is deployed use clustering or is that just single computer?

prefiks avatar Jun 24 '24 17:06 prefiks

its 404.city, we try to reach the admin, but until now no success

lovetox avatar Jun 28 '24 18:06 lovetox

EDIT: Sorry got this wrong the first time.

Seems like @prefiks found the reason

we are storing those in mnesia table, not sure if it's set to be stored on disk yes, table that is used by it only keeps that in memory yeah, i will see if we can change that

lovetox avatar Jun 28 '24 22:06 lovetox

This still happening, could it be a problem with mutli devices joined in the same MUC from the same user?

lovetox avatar Oct 18 '24 17:10 lovetox

the room in question here is a group chat with >60 people, and its configured as persistent (at least today), i think it unlikely that it was destroyed.

I've had this issue occur occasionally for a while and I do have two devices I connect with. But I just met someone else who has the same issue and apparently they only have one device that is connecting to the account they have the issue with.

UntrimmedParasail avatar Oct 21 '24 12:10 UntrimmedParasail

For reference, this is what it looks like on client side:

91a8a04f-2f86-4487-8d1a-b4aaf35302cb_df14a6ef4e629d45f42cc44125d5340957035440

UntrimmedParasail avatar Oct 21 '24 12:10 UntrimmedParasail

Looks like we can reproduce as well (this was in the DivestOS Mobile chat - xmpp:[email protected]?join):

This happened when sending messages after reconnecting after the server restarted (we were connected before server restarted)

1dd5c65e-dbf6-4b9f-8fa4-e03348208d2f_b4009d82e4a4ee71322ab6661810d6b04e43255d

The person I spoke with also experienced the same issue as well after sending a message, so probably not to do with multiple clients but just the server restarting.

In case this information is helpful, the server the account I am connecting is from is conversations.im

UntrimmedParasail avatar Oct 21 '24 12:10 UntrimmedParasail