occupant-id not stable in some cases
Environment
- ejabberd version: 24.02
Bug description
I have 2 different users in Gajim that reported a problem with occupant-id in 2 different MUCs on different servers.
When we investigated we found that the occupant-id in the presence is different from the occupant-id the server adds to their messages.
<message xmlns="jabber:client" to="[email protected]" type="groupchat" id="8dcee6be-1769-4d7d-acbf-1e545b7a950f">
<body>test</body>
<origin-id xmlns="urn:xmpp:sid:0" id="8dcee6be-1769-4d7d-acbf-1e545b7a950f" />
</message>
<message xmlns="jabber:client" xml:lang="en" to="[email protected]/gajim.J39W82BR" from="[email protected]/myname" type="groupchat" id="8dcee6be-1769-4d7d-acbf-1e545b7a950f">
<archived xmlns="urn:xmpp:mam:tmp" by="[email protected]" id="1719035916432908" />
<stanza-id xmlns="urn:xmpp:sid:0" by="[email protected]" id="1719035916432908" />
<origin-id xmlns="urn:xmpp:sid:0" id="8dcee6be-1769-4d7d-acbf-1e545b7a950f" />
<occupant-id xmlns="urn:xmpp:occupant-id:0" id="g20AAAAgAozSRwPfZaleTLJ0hNM73x5BBMGnB2HfuA12Mm9MpNU=" />
<body>test</body>
</message>
<presence xmlns="jabber:client" xml:lang="en" to="[email protected]/gajim.J39W82BR" from="[email protected]/myname" id="2aadd6b4-8c2a-4471-9b4f-3a098416a765">
...
<occupant-id xmlns="urn:xmpp:occupant-id:0" id="g20AAAAgSp/6G+pFGi1tumup3r8yinT6Kk4yPZzoPGdFIzLrteI=" />
<x xmlns="http://jabber.org/protocol/muc#user">
<item jid="[email protected]/gajim.J39W82BR" role="participant" affiliation="none" />
<status code="110" />
</x>
<show>xa</show>
</presence>
Multiple devices were joined in the MUC, in case that could be an issue. We have no reproducible case for now, but it happens.
Could you please check the code to see under which conditions this could happen?
I tried to reproduce the problem, but could not.
Does the problem appear from time to time on your server, even if you are not able to reproduce it on demand?
In that case, you could apply a small patch to the module so that it generates an ID we can later decode, to determine how the two IDs differ:
diff --git a/src/mod_muc_occupantid.erl b/src/mod_muc_occupantid.erl
index fabd69e28..42449e563 100644
--- a/src/mod_muc_occupantid.erl
+++ b/src/mod_muc_occupantid.erl
@@ -74,8 +74,8 @@ add_occupantid_packet(Packet, RoomJid) ->
xmpp:set_subtag(Packet, OccupantElement).
calculate_occupantid(From, RoomJid) ->
- Term = {jid:remove_resource(From), get_salt(RoomJid)},
- misc:term_to_base64(crypto:hash(sha256, io_lib:format("~p", [Term]))).
+ Term = {jid:remove_resource(From), RoomJid, get_salt(RoomJid)},
+ misc:term_to_base64(io_lib:format("~p", [Term])).
%%%
%%% Table storing rooms' salt
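In Python terms, the debug variant just skips the hash so the encoded value can be decoded back. This is a rough analogy using Python tuples and base64; the real module encodes an Erlang term via `io_lib:format("~p", [Term])` and `misc:term_to_base64`, and all names below are illustrative:

```python
import base64


def debug_occupant_id(bare_jid: str, room_jid: str, salt: str) -> str:
    # Reversible debug id: base64 of a printable term, with the sha256
    # hashing step removed (so the inputs can be recovered later)
    term = repr((bare_jid, room_jid, salt))
    return base64.b64encode(term.encode()).decode()


def decode_debug_id(occupant_id: str) -> str:
    # Inverse operation, analogous to misc:base64_to_term/1
    return base64.b64decode(occupant_id).decode()
```

Decoding such an ID recovers the bare JID, room JID, and salt that produced it, which is exactly what the debug patch is meant to reveal.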
Then, the ID is a very long string like:
<occupant-id id='g2wAAAABbAAAAANhe2wAAAAJbAAAAANhe2wAAAANawADamlkYSxsAAAABWE8YTxrAAcidXNlcjIiYT5hPmphLGwAAAAFYTxhPGsACyJsb2NhbGhvc3QiYT5hPmphLGsABDw8Pj5hLGwAAAAFYTxhPGsAByJ1c2VyMiJhPmE+amEsbAAAAAVhPGE8awALImxvY2FsaG9zdCJhPmE+amEsawAEPDw+PmphfWphLGEKawABIGwAAAADYXtsAAAAD2sAA2ppZGEsbAAAAAVhPGE8awAHInNhbGExImE+YT5qYSxsAAAABWE8YTxrABYiY29uZmVyZW5jZS5sb2NhbGhvc3QiYT5hPmphLGsABDw8Pj5hLGwAAAAFYTxhPGsAByJzYWxhMSJhPmE+amEsYQpsAAAAAmwAAAAEYSBrAAIgIGEgYSBqYSBqbAAAAAVhPGE8awAWImNvbmZlcmVuY2UubG9jYWxob3N0ImE+YT5qYSxrAAQ8PD4+amF9amEsYQprAAEgbAAAAAVhPGE8awAWIjE1OTU4NTQzNDU1MjgyNzcwNjEyImE+YT5qamF9amo='
xmlns='urn:xmpp:occupant-id:0'/>
That can be easily decoded to see which user, which room, and which room salt were used:
misc:base64_to_term("g2wAAAABbAAAAANhe2wAAAAJbAAAAANhe2wAAAANawADamlkYSxsAAAABWE8YTxrAAcidXNlcjIiYT5hPmphLGwAAAAFYTxhPGsACyJsb2NhbGhvc3QiYT5hPmphLGsABDw8Pj5hLGwAAAAFYTxhPGsAByJ1c2VyMiJhPmE+amEsbAAAAAVhPGE8awALImxvY2FsaG9zdCJhPmE+amEsawAEPDw+PmphfWphLGEKawABIGwAAAADYXtsAAAAD2sAA2ppZGEsbAAAAAVhPGE8awAHInNhbGExImE+YT5qYSxsAAAABWE8YTxrABYiY29uZmVyZW5jZS5sb2NhbGhvc3QiYT5hPmphLGsABDw8Pj5hLGwAAAAFYTxhPGsAByJzYWxhMSJhPmE+amEsYQpsAAAAAmwAAAAEYSBrAAIgIGEgYSBqYSBqbAAAAAVhPGE8awAWImNvbmZlcmVuY2UubG9jYWxob3N0ImE+YT5qYSxrAAQ8PD4+amF9amEsYQprAAEgbAAAAAVhPGE8awAWIjE1OTU4NTQzNDU1MjgyNzcwNjEyImE+YT5qamF9amo=").
{term,[[123,
[[123,
["jid",44,
[60,60,"\"user2\"",62,62],
44,
[60,60,"\"localhost\"",62,62],
44,"<<>>",44,
[60,60,"\"user2\"",62,62],
44,
[60,60,"\"localhost\"",62,62],
44,"<<>>"],
125],
44,10," ",
[123,
["jid",44,
[60,60,"\"sala1\"",62,62],
44,
[60,60,"\"conference.localhost\"",62,62],
44,"<<>>",44,
[60,60,"\"sala1\"",62,62],
44,10,
[[32," ",32,32],32],
[60,60,"\"conference.localhost\"",62|...],
44,"<<>>"],
125],
44,10," ",
[60,60,"\"15958543455282770612\"",62,62]],
125]]}
If you compile ejabberd from source code, simply apply the patch, recompile, install the file and restart the module or restart the whole ejabberd.
If you cannot compile ejabberd yourself, ping me and I'll generate an updated module *.beam file so you can install it.
It could be that the room was recreated between those two stanzas; that would change the salt (which I think is expected). The only thing worth checking is whether room hibernation triggers salt regeneration; if it does, we don't want that.
Also, @badlop, was there any reason why you didn't want to put the salt in the muc_room state, but used an external mnesia table instead?
Yes, the reasoning was something like this:
- The occupant id must be stable (the same id for the same bare user + room) https://xmpp.org/extensions/xep-0421.html#id-generation
- room salt is a random string (cannot be reproduced, so it must be stored persistently as long as the room exists)
- in mod_muc, the room state is lost when mod_muc stops, so we must use a persistent storage
- the first design was to store the room salt in the mod_muc_room #config record. But this salt is used only by mod_muc_occupantid... keeping the data in a specific table managed by mod_muc_occupantid seems reasonable.
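The scheme described above can be sketched as follows. This is an illustrative Python model of the XEP-0421 requirement (hash of bare JID plus a persistent per-room salt), not the actual Erlang implementation; the function and inputs are hypothetical:

```python
import base64
import hashlib


def occupant_id(bare_jid: str, room_salt: str) -> str:
    """Stable, non-reversible id: same bare JID + same salt -> same id."""
    digest = hashlib.sha256(f"{bare_jid}\x00{room_salt}".encode()).digest()
    return base64.b64encode(digest).decode()


# Stable while the salt survives:
a = occupant_id("user@example.org", "salt-1")
b = occupant_id("user@example.org", "salt-1")
assert a == b

# A regenerated salt silently changes every occupant-id in the room:
c = occupant_id("user@example.org", "salt-2")
assert a != c
```

This is why the salt must outlive the room process: any event that regenerates it breaks the stability guarantee for every occupant.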
It could be that the room was recreated between those two stanzas; that would change the salt (which I think is expected).
Right, that could be the case here, and it makes sense that the occupant-id changes, as the room is different (maybe from a different creator, owner, ...; both rooms just have the same room name).
If that's what happened in this case, it will be revealed when there are new logs with occupant-ids we can decode to determine their source information.
The only thing worth checking is whether room hibernation triggers salt regeneration; if it does, we don't want that.
Aha, that could be the case here too, and in that case the module should be improved.
Looking at the code, a room salt is only removed when the room is removed (and the remove_room hook is called):
- closing a room that is temporary and empty
- changing the config: removing the persistence option when using the mnesia db
- destroying the room
- restoring a hibernated non-persistent room when using the mnesia db
Maybe the room is non-persistent, uses mnesia storage, got hibernated, and the module deleted its salt when the room got unhibernated. I'll check and update this comment.
Another thing related to hibernation: even if hibernation doesn't reset the salt (and I think we call the remove_room hook only when the room is truly removed, not just hibernated), restoring the room on a different node could be a cause. I don't think that mnesia table is synced between nodes, so if the room was restored on a different node that didn't have the salt, it would create a new one.
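The cluster scenario can be sketched like this (a hypothetical model in which each node holds its own unsynced, in-memory salt table; class and function names are illustrative):

```python
import base64
import hashlib
import secrets


class NodeSaltTable:
    """One in-memory salt table per cluster node (not replicated)."""

    def __init__(self) -> None:
        self._salts: dict[str, str] = {}

    def get_salt(self, room: str) -> str:
        # A node that has never seen this room creates a fresh random salt
        if room not in self._salts:
            self._salts[room] = secrets.token_hex(16)
        return self._salts[room]


def occupant_id(table: NodeSaltTable, bare_jid: str, room: str) -> str:
    digest = hashlib.sha256(
        f"{bare_jid}\x00{table.get_salt(room)}".encode()
    ).digest()
    return base64.b64encode(digest).decode()


node_a, node_b = NodeSaltTable(), NodeSaltTable()
# Same user, same room, but the room was restored on a node without the salt:
id_a = occupant_id(node_a, "user@example.org", "room@muc.example.org")
id_b = occupant_id(node_b, "user@example.org", "room@muc.example.org")
assert id_a != id_b  # per-node salts differ, so the ids diverge
```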
But I think it is most probably a room that is not set to persistent: it just gets deleted when everyone leaves and then gets recreated.
A temporary (that is, not-persistent) room is deleted when the last occupant leaves and it has no subscribers.
I was able to reproduce the problem:
- A temporary room loses its last occupant but still has some subscribers: it doesn't get destroyed.
- After being inactive for hibernation_time, it gets hibernated (the erlang process is terminated and the room is removed from muc_online_room, but the room config is kept in muc_room).
- Later, when a user joins the room, it is restored from hibernation and an erlang process is started, but there is code in mod_muc that calls forget_room, which means the room salt gets created anew.
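The failure sequence above can be modeled in a few lines (illustrative Python, with a toy SaltStore standing in for the mnesia salt table; all names are hypothetical):

```python
import base64
import hashlib
import secrets


class SaltStore:
    """Toy model of the per-room salt table kept by mod_muc_occupantid."""

    def __init__(self) -> None:
        self._salts: dict[str, str] = {}

    def get_salt(self, room: str) -> str:
        # A missing salt is silently regenerated
        if room not in self._salts:
            self._salts[room] = secrets.token_hex(16)
        return self._salts[room]

    def forget_room(self, room: str) -> None:
        # Models the remove_room hook deleting the salt
        self._salts.pop(room, None)


def occupant_id(store: SaltStore, bare_jid: str, room: str) -> str:
    digest = hashlib.sha256(
        f"{bare_jid}\x00{store.get_salt(room)}".encode()
    ).digest()
    return base64.b64encode(digest).decode()


store = SaltStore()
before = occupant_id(store, "user@example.org", "room@muc.example.org")
# Restoring the hibernated non-persistent room calls forget_room,
# dropping the salt...
store.forget_room("room@muc.example.org")
after = occupant_id(store, "user@example.org", "room@muc.example.org")
# ...so the same user gets a different occupant-id after the restore.
assert before != after
```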
I wonder what the requirement was to call forget_room when restoring a temporary room.
Removing it solves this particular problem:
diff --git a/src/mod_muc.erl b/src/mod_muc.erl
index e297d866f..3184fe90a 100644
--- a/src/mod_muc.erl
+++ b/src/mod_muc.erl
@@ -903,14 +903,7 @@ load_room(RMod, Host, ServerHost, Room, ResetHibernationTime) ->
Res2;
_ ->
?DEBUG("Restore hibernated non-persistent room: ~ts", [Room]),
- Res = start_room(RMod, Host, ServerHost, Room, Opts0),
- case erlang:function_exported(Mod, get_subscribed_rooms, 3) of
- true ->
- ok;
- _ ->
- forget_room(ServerHost, Host, Room)
- end,
- Res
+ start_room(RMod, Host, ServerHost, Room, Opts0)
end
end.
Hi,
I think it's a fair assumption that the salt is regenerated after a room is destroyed.
The room in question here is a group chat with >60 people, and it's configured as persistent (at least today); I think it unlikely that it was destroyed.
A bit more info: we could see the problem happening consistently, i.e. messages always had a different id than the presence. It was not a case of sending a message on one day and checking the presence on the next day.
So both occupant-ids were present at the same time; it's not that the occupant-id changed because of an event and from that moment on was different everywhere.
I'll try to get more logs from the user; maybe we can see what the occupant-id was over a longer period of time.
@badlop Ah, I missed that forget_room call. That call was there to delete the entry from the db, as we would not delete it otherwise. We probably should add a function that just deletes the room from the db, and another that removes it from the db and also calls that hook.
@lovetox Do you know whether the server where this is deployed uses clustering, or is it just a single machine?
It's 404.city; we are trying to reach the admin, but so far without success.
EDIT: Sorry got this wrong the first time.
Seems like @prefiks found the reason
We are storing those in a mnesia table; I'm not sure if it's set to be stored on disk. Yes, the table it uses only keeps the data in memory. I will see if we can change that.
This is still happening. Could it be a problem with multiple devices joined in the same MUC from the same user?
the room in question here is a group chat with >60 people, and its configured as persistent (at least today), i think it unlikely that it was destroyed.
I've had this issue occur occasionally for a while, and I do have two devices I connect with. But I just met someone else who has the same issue, and apparently they only have one device connecting to the account they have the issue with.
For reference, this is what it looks like on client side:
Looks like we can reproduce as well (this was in the DivestOS Mobile chat - xmpp:[email protected]?join):
This happened when sending messages after reconnecting once the server had restarted (we were connected before the restart).
The person I spoke with also experienced the same issue after sending a message, so it's probably not related to multiple clients but just to the server restarting.
In case this information is helpful: the server of the account I am connecting from is conversations.im.