
Matrix Gateway: ejabberd sends "join" while already joined

Open erebion opened this issue 7 months ago • 22 comments

Before creating a ticket, please consider if this should fit the discussion forum better.

Environment

  • ejabberd version: 25.04
  • Erlang version: Erlang (SMP,ASYNC_THREADS) (BEAM) emulator version 15.2.6
  • OS: Linux (Debian)
  • Installed from: official deb

Configuration (only if needed): grep -Ev '^$|^\s*#' ejabberd.yml

loglevel: 4
...

Not sure this relates to any part of the config

Errors from error.log/crash.log

No errors

Bug description

Please, give us a precise description (what does not work, what is expected, etc.)

Ejabberd occasionally sends a join to rooms which are already joined:

{
  "content": {
    "body": "",
    "msgtype": "m.text",
    "net.process-one.xmpp-id": "02b22094-24b5-4f36-a9e7-a3e5ba8a4431"
  },
  "event_id": "$Ev9YPSJ4HVnj08uhJs5RN3PsF3PV4-BLmljY-IAEK50",
  "origin_server_ts": 1747674408209,
  "sender": "@erebion:ejabberd.erebion.eu",
  "type": "m.room.message",
  "unsigned": {
    "membership": "join"
  }
}

Matrix clients such as Element (at least on Android, not sure for the other versions) then show something like:

@user:example.com has made no changes
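
A minimal sketch of how a client could recognise such no-op membership events, assuming the previous state is delivered in the optional `unsigned.prev_content` field of the Matrix client-server API. The helper name `is_noop_membership` and the sample events are hypothetical and not taken from any client:

```python
def is_noop_membership(event: dict) -> bool:
    """Return True if a membership event repeats the previous content unchanged."""
    if event.get("type") != "m.room.member":
        return False
    prev = event.get("unsigned", {}).get("prev_content")
    # Without prev_content we cannot tell, so we do not flag the event.
    return prev is not None and prev == event.get("content")


# Hypothetical sample events for illustration only.
redundant_join = {
    "type": "m.room.member",
    "state_key": "@user:example.com",
    "content": {"membership": "join"},
    "unsigned": {"prev_content": {"membership": "join"}},
}
real_join = {
    "type": "m.room.member",
    "state_key": "@user:example.com",
    "content": {"membership": "join"},
    "unsigned": {"prev_content": {"membership": "leave"}},
}
message = {
    "type": "m.room.message",
    "content": {"body": "", "msgtype": "m.text"},
    "unsigned": {"membership": "join"},
}

for name, ev in [("redundant_join", redundant_join),
                 ("real_join", real_join),
                 ("message", message)]:
    print(name, is_noop_membership(ev))
```

Only the first sample is flagged. An event of type `m.room.message`, like the JSON quoted above, is never treated as a membership change by this check, even when it carries `unsigned.membership`.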

This annoys people in the room and they start to complain.

Therefore this is both an issue on a protocol level as well as a social issue.

I'm not sure whether Element should even show "has made no changes" when no change was made or whether the issue is with ejabberd.

Please let me know if I should instead open an issue with Element, I'm not sure what the protocol allows. 🤔

erebion avatar May 19 '25 18:05 erebion

This might also be a bug in Element Android: https://github.com/element-hq/element-android/issues/1430

But even though Element Android is wrong in showing those events, as well as being inconsistent with Element Web/Desktop, which do not show them by default (unless "Show Hidden Events in Timeline" is enabled), ejabberd should probably not send redundant join events.

Maybe it could store details of joined rooms and not send a join event that is not necessary, but I guess ejabberd devs will have better ideas, as I don't know the details of the Matrix protocol in this regard.

erebion avatar May 20 '25 10:05 erebion

As ejabberd doesn't persist Matrix rooms, it sends joins again if a user enters a room after a restart. It's allowed in the protocol to send joins for already joined users.

If ejabberd sends repeated joins not after a restart, then it's probably a bug. Can you please check if it is connected to ejabberd restarts?
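
The idea of remembering joined rooms across restarts, so that a join is only sent when it is actually needed, could be sketched roughly as follows. This is an illustration in Python, not ejabberd code: the class `JoinTracker`, its method names and the JSON file used for storage are all assumptions made for the example.

```python
import json
from pathlib import Path


class JoinTracker:
    """Remembers (user, room) pairs that already received a join event."""

    def __init__(self, path: Path):
        self.path = path
        self.joined = set()
        if path.exists():
            # Reload the state written before the last restart.
            self.joined = {tuple(pair) for pair in json.loads(path.read_text())}

    def _save(self) -> None:
        # JSON has no tuples, so pairs are stored as two-element lists.
        self.path.write_text(json.dumps(sorted(self.joined)))

    def should_send_join(self, user: str, room: str) -> bool:
        """Return True only the first time a user enters a room."""
        key = (user, room)
        if key in self.joined:
            return False
        self.joined.add(key)
        self._save()
        return True

    def record_leave(self, user: str, room: str) -> None:
        """Forget the pair so that the next entry sends a join again."""
        self.joined.discard((user, room))
        self._save()
```

A second `JoinTracker` created with the same path after a simulated restart returns `False` from `should_send_join` for a pair that was already recorded, which is the behaviour the suggestion above asks for.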

alexeyshch avatar May 20 '25 11:05 alexeyshch

If ejabberd sends repeated joins not after a restart, then it's probably a bug. Can you please check if it is connected to ejabberd restarts?

Closing and re-opening Gajim also leads to leave/join, every time.

EDIT:

For #mobian:matrix.org, but for other rooms it does not occur. Here's a screenshot from the Mobian room (I've had a look at it using Fractal):

Image

I have this setup:

  • ejabberd.erebion.eu (ejabberd) <- Matrix GW at matrix.ejabberd.erebion.eu
  • erebion.eu (ejabberd)
  • erebion.eu (Synapse)

I cannot join any Matrix rooms with an address that has the host part "erebion.eu", for some reason. Not sure whether that is relevant for this issue. All other rooms (with the right room versions 9, 10, 11, of course) can be joined just fine.

EDIT 2: The Mobian room has room version 9, the room for which this does not occur has room version 10.

erebion avatar May 20 '25 12:05 erebion

Leave/join is by design, as from the XMPP side a user is leaving and entering a MUC room when a client restarts.

A short address works only when there is no XMPP server on that host, but you can use a long one in that case, like #roomname%[email protected].

alexeyshch avatar May 20 '25 14:05 alexeyshch

A short address works only when there is no XMPP server on that host, but you can use a long one in that case, like #roomname%[email protected].

That's what I'm doing, but somehow it does not work for erebion.eu (Matrix/Synapse) on the same host as erebion.eu (XMPP/ejabberd).

Should that work? If so, I'll open a separate issue.

erebion avatar May 20 '25 19:05 erebion

Leave/join is by design, as from the XMPP side a user is leaving and entering a MUC room when a client restarts.

Wait, it's by design that I leave a room every time I close my client?

Could I make the server keep me in a room?

Should a leave/join be sent to the Matrix room each time? Cause that is not what Matrix users expect and they quickly wonder why someone joins, leaves, joins, leaves and so on.

erebion avatar May 20 '25 19:05 erebion

If ejabberd sends repeated joins not after a restart, then it's probably a bug. Can you please check if it is connected to ejabberd restarts?

I've now also noticed ejabberd sends join events without restarting it or doing anything with it.

464 in one room between 01:24 and 12:19 today.

Sure, this might be spec compliant, but still seems a bit excessive.

EDIT: It seems to have started constantly sending join events every couple of minutes. I've neither joined nor left a room, did not close or open a client and the server has had no change to its config or a restart.

EDIT 2: It's not the same amount per room. One has 464 events, while another has 10, in the same time.

EDIT 3: Larger rooms (more users) seem to have many more events than smaller rooms.

EDIT 4: Scrap that theory from EDIT 3, I was wrong. The room with 464 events is far smaller than that with 23 events.

EDIT 5: I've now removed the Matrix GW account from all rooms until I know more, as Element Android happens to show all those events due to a bug.
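
For scale, the figure of 464 join events between 01:24 and 12:19 can be turned into an average rate. A small Python calculation, assuming both timestamps fall on the same day as stated:

```python
from datetime import datetime

# Times and event count as reported for the busiest room.
start = datetime.strptime("01:24", "%H:%M")
end = datetime.strptime("12:19", "%H:%M")
events = 464

window_seconds = (end - start).total_seconds()
window_minutes = window_seconds / 60
seconds_per_event = window_seconds / events

print(f"window: {window_minutes:.0f} minutes")
print(f"average gap: {seconds_per_event:.1f} seconds per join event")
```

That is a window of 655 minutes and, on average, one join event roughly every 85 seconds in that room.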

erebion avatar May 21 '25 10:05 erebion

Wait, it's by design that I leave a room every time I close my client?

Could I make the server keep me in a room?

Should a leave/join be sent to the Matrix room each time? Cause that is not what Matrix users expect and they quickly wonder why someone joins, leaves, joins, leaves and so on.

Unfortunately, MUC rooms and Matrix rooms have different behaviours, so that's a compromise. If the server ignores unavailable presences, then users will be stuck forever in rooms without a way to leave them. Maybe they can be given a timeout, like 24h of inactivity before a leave is sent?
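
The timeout idea can be sketched as a small bookkeeping structure: an unavailable presence only schedules a leave, and a return within the timeout cancels it so that no Matrix event is needed. This is a Python illustration under stated assumptions, not the ejabberd implementation; `DelayedLeave` and its method names are hypothetical.

```python
class DelayedLeave:
    """Schedules Matrix leaves and cancels them if the user returns in time."""

    def __init__(self, leave_timeout: float):
        self.leave_timeout = leave_timeout  # seconds of absence before a leave
        self.pending = {}  # (user, room) -> deadline in seconds

    def on_unavailable(self, user: str, room: str, now: float) -> None:
        """The XMPP client left the MUC: schedule a leave instead of sending it."""
        self.pending[(user, room)] = now + self.leave_timeout

    def on_available(self, user: str, room: str) -> bool:
        """The client is back. True means a pending leave was cancelled,
        so the user is still joined on the Matrix side."""
        return self.pending.pop((user, room), None) is not None

    def due_leaves(self, now: float) -> list:
        """Return and forget every (user, room) whose deadline has passed."""
        due = [key for key, deadline in self.pending.items() if deadline <= now]
        for key in due:
            del self.pending[key]
        return due


tracker = DelayedLeave(leave_timeout=24 * 3600)
tracker.on_unavailable("@user:example.com", "!room:example.org", now=0)
print(tracker.due_leaves(now=3600))   # one hour later: nothing is due yet
print(tracker.on_available("@user:example.com", "!room:example.org"))
```

With a 24h timeout, closing and reopening a client within an hour produces neither a leave nor a join on the Matrix side in this sketch.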

As for repeated joins, can you please make a dump of XML packets that the client sends when it happens? And check logs for errors.

alexeyshch avatar May 21 '25 11:05 alexeyshch

Maybe they can be given a timeout, like 24h of inactivity before a leave is sent?

It happens quite often that I close the client for the weekend.

How about one of those component forms? Like the ones that allow seeing uptime, adding users, removing users, sending a ping and so on. I don't know the right term. This could allow users to leave a room and/or to configure a timeout per room.

erebion avatar May 21 '25 13:05 erebion

As for repeated joins, can you please make a dump of XML packets that the client sends when it happens? And check logs for errors.

How can I get such a dump? Do I use a specific tool or can I get that from ejabberd itself?

erebion avatar May 21 '25 17:05 erebion

There seems to be some uncertainty here so I'll attempt to clarify somewhat.

Once a Matrix user joins a room they stay in it until they explicitly leave (or close their homeserver account), even if their client and/or homeserver are offline indefinitely. Matrix users expect not to see any IRC-style join/leave spam. Abandoned accounts simply accumulate unless they're kicked (or the room is upgraded, which replaces the room with a linked new one).

Membership events are state events, which means they're expensive (relative to chat messages) for all homeservers in the room (Matrix federation isn't really a chat protocol, it's a "full mesh" multi-master room state resolution protocol). Spamming unnecessary state events worsens the scale problems this post https://www.process-one.net/blog/matrix-protocol-added-to-ejabberd/ complains about ;)

I became aware of this issue because as a room moderator I was pinged about it affecting our room.

bones-was-here avatar Jul 02 '25 17:07 bones-was-here

Maybe they can be given a timeout, like 24h of inactivity before a leave is sent?

I've thought about this a bit more, I think adhoc commands might be good for that, optionally, as well as timeouts, also optionally. This gives flexibility.

There might be use-cases where people want an account in a room, but might not open a client for months, so a timeout would not work well. Just as adhoc commands might not work in every scenario.

I wonder how Biboumi does that, they surely have a similar issue and need to know when to no longer tell IRC when the user is online on the server.

erebion avatar Jul 13 '25 21:07 erebion

Maybe they can be given a timeout, like 24h of inactivity before a leave is sent?

Another thought: I'd prefer having an explicit leave. This way I can be in a room, have ejabberd collect the message log and later join, using MAM to read up on what happened recently. If the Matrix chat has too much going on, it might be nice to leave it on your own side (ejabberd), without leaving it completely on the other end (Matrix).

The more I think about it, the less a timeout seems to be a good option, except for maybe some special use-cases.

erebion avatar Jul 15 '25 01:07 erebion

I've added a leave_timeout option; for now it's better than nothing.

alexeyshch avatar Aug 15 '25 02:08 alexeyshch

I've added a leave_timeout option; for now it's better than nothing.

Can that be set to indefinitely? Would work mostly fine for my specific use-case. :)

erebion avatar Aug 15 '25 08:08 erebion

Set it to something like 4000000000, it's over 100 years, should be enough for practical purposes :)
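
A quick check of that figure in Python, assuming the value is interpreted as seconds. The unit is not stated in this thread, but seconds is the reading consistent with "over 100 years":

```python
# Assumption: leave_timeout is given in seconds.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

leave_timeout = 4_000_000_000
years = leave_timeout / SECONDS_PER_YEAR
print(f"{leave_timeout} s is about {years:.1f} years")

# For comparison, a timeout of roughly one month (30 days) in seconds.
thirty_days = 30 * 24 * 60 * 60
print(f"30 days is {thirty_days} s")
```

Under that assumption the suggested value comes to a little under 127 years, and a month-long timeout would be 2592000.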

alexeyshch avatar Aug 18 '25 10:08 alexeyshch

I've tried current master at 644d468b4ffc17d95c89e0262923953f12b5e15d and those messages still show up.

I've configured a timeout of about a month, so those events should not show up, as that commit is not even a month old. :D

erebion avatar Aug 21 '25 16:08 erebion

I'm currently on 3a36a722c50fd99d3d56ad44b995b32d8d424061, got another complaint in #mainline:postmarketos.org.

Does not seem to be fixed yet.

erebion avatar Sep 17 '25 14:09 erebion

got another complaint in #mainline:postmarketos.org.

The user uses Cinny, I've opened another issue there: https://github.com/cinnyapp/cinny/issues/2488

erebion avatar Sep 17 '25 14:09 erebion

Set it to something like 4000000000, it's over 100 years, should be enough for practical purposes :)

Apparently ejabberd is still sending a lot of unnecessary join events. I just had someone complain.

erebion avatar Nov 09 '25 02:11 erebion

Possibly related to an ejabberd restart that causes the gateway to lose state?

poVoq avatar Nov 09 '25 19:11 poVoq

Possibly related to an ejabberd restart that causes the gateway to lose state?

ejabberd hasn't been restarted in a while.

erebion avatar Nov 10 '25 11:11 erebion