Lost messages for no apparent reason (regression)
Environment
- ejabberd version: 21.04
- Erlang version: Erlang (SMP,ASYNC_THREADS) (BEAM) emulator version 12.2
- OS: Linux (Gentoo)
- Installed from: ebuild/source
Errors from error.log/crash.log
No errors
Bug description
After upgrading from 20.04 to 21.04, messages randomly get lost. When it happens, messages keep getting lost at least until the client reconnects (though not immediately upon reconnection!). There is no indication to the sender that the recipient didn't get the messages.
Config? What are the config differences between the old and new versions?
20.04 config:
###
###              ejabberd configuration file
###
### The parameters used in this configuration file are explained at
###
###       https://docs.ejabberd.im/admin/configuration
###
### The configuration file is written in YAML.
### *******************************************************
### *******           !!! WARNING !!!               *******
### *******     YAML IS INDENTATION SENSITIVE       *******
### ******* MAKE SURE YOU INDENT SECTIONS CORRECTLY *******
### *******************************************************
### Refer to http://en.wikipedia.org/wiki/YAML for the brief description.
###

hosts:
  # - "localhost"
  - "dashjr.org"
  - "anonymous.dashjr.org"
  - "friends.dashjr.org"

loglevel: info
log_rotate_size: 10485760
log_rotate_count: 5

## If you already have certificates, list them here
certfiles:
  - /etc/ssl/ejabberd/server.pem

listen:
  -
    port: 5222
    ip: "::"
    module: ejabberd_c2s
    max_stanza_size: 262144
    shaper: c2s_shaper
    access: c2s
    starttls_required: true
  -
    port: 5269
    ip: "::"
    module: ejabberd_s2s_in
    max_stanza_size: 524288
  -
    port: 5443
    ip: "::"
    module: ejabberd_http
    tls: true
    request_handlers:
      /admin: ejabberd_web_admin
      /api: mod_http_api
      /bosh: mod_bosh
      /captcha: ejabberd_captcha
      /upload: mod_http_upload
      /ws: ejabberd_http_ws
  -
    port: 5280
    ip: "::"
    module: ejabberd_http
    request_handlers:
      /admin: ejabberd_web_admin
      /.well-known/acme-challenge: ejabberd_acme
  -
    port: 3478
    transport: udp
    module: ejabberd_stun
    use_turn: true
    ## The server's public IPv4 address:
    # turn_ip: 203.0.113.3
  -
    port: 1883
    ip: "::"
    module: mod_mqtt
    backlog: 1000

s2s_use_starttls: optional

acl:
  admin:
    user: [email protected]
  local:
    user_regexp: ""
  loopback:
    ip:
      - 127.0.0.0/8
      - ::1/128

access_rules:
  local:
    allow: local
  c2s:
    deny: blocked
    allow: all
  announce:
    allow: admin
  configure:
    allow: admin
  muc_create:
    allow: local
  pubsub_createnode:
    allow: local
  trusted_network:
    allow: loopback

api_permissions:
  "console commands":
    from:
      - ejabberd_ctl
    who: all
    what: "*"
  "admin access":
    who:
      access:
        allow:
          acl: loopback
          acl: admin
      oauth:
        scope: "ejabberd:admin"
        access:
          allow:
            acl: loopback
            acl: admin
    what:
      - "*"
      - "!stop"
      - "!start"
  "public commands":
    who:
      ip: 127.0.0.1/8
    what:
      - status
      - connected_users_number

shaper:
  normal: 1000
  fast: 50000

shaper_rules:
  max_user_sessions: 10
  max_user_offline_messages:
    5000: admin
    100: all
  c2s_shaper:
    none: admin
    normal: all
  s2s_shaper: fast

modules:
  mod_adhoc: {}
  mod_admin_extra: {}
  mod_announce:
    access: announce
  mod_avatar: {}
  mod_blocking: {}
  mod_bosh: {}
  mod_caps: {}
  mod_carboncopy: {}
  mod_client_state: {}
  mod_configure: {}
  mod_disco: {}
  mod_fail2ban: {}
  mod_http_api: {}
  mod_http_upload:
    put_url: https://@HOST@:5443/upload
  mod_last: {}
  mod_mam:
    ## Mnesia is limited to 2GB, better to use an SQL backend
    ## For small servers SQLite is a good fit and is very easy
    ## to configure. Uncomment this when you have SQL configured:
    ## db_type: sql
    assume_mam_usage: true
    default: always
  mod_mqtt: {}
  mod_muc:
    access:
      - allow
    access_admin:
      - allow: admin
    access_create: muc_create
    access_persistent: muc_create
    access_mam:
      - allow
    default_room_options:
      mam: true
  mod_muc_admin: {}
  mod_offline:
    access_max_user_messages: max_user_offline_messages
  mod_ping: {}
  mod_privacy: {}
  mod_private: {}
  mod_proxy65:
    access: local
    max_connections: 5
  mod_pubsub:
    access_createnode: pubsub_createnode
    plugins:
      - flat
      - pep
    force_node_config:
      ## Avoid buggy clients to make their bookmarks public
      storage:bookmarks:
        access_model: whitelist
  mod_push: {}
  mod_push_keepalive: {}
  mod_register:
    ## Only accept registration requests from the "trusted"
    ## network (see access_rules section above).
    ## Think twice before enabling registration from any
    ## address. See the Jabber SPAM Manifesto for details:
    ## https://github.com/ge0rg/jabber-spam-fighting-manifesto
    ip_access: trusted_network
    welcome_message:
      subject: "Welcome!"
      body: |-
        Welcome to the Dashjr IM server.
    registration_watchers:
      - "[email protected]"
    # don't allow deleting own account
    access_remove: none
  mod_roster:
    versioning: true
  mod_s2s_dialback: {}
  mod_shared_roster: {}
  mod_stream_mgmt:
    resend_on_timeout: if_offline
  mod_stun_disco: {}
  mod_vcard: {}
  mod_vcard_xupdate: {}
  mod_version:
    show_os: false
  mod_muc_log:
    file_permissions:
      mode: 600
    outdir: /var/log/jabber/muc
    timezone: universal

auth_method: internal

host_config:
  anonymous.dashjr.org:
    auth_method: [anonymous]
    anonymous_protocol: sasl_anon

### Local Variables:
### mode: yaml
### End:
### vim: set filetype=yaml tabstop=8
21.04 config:
###
###              ejabberd configuration file
###
### The parameters used in this configuration file are explained at
###
###       https://docs.ejabberd.im/admin/configuration
###
### The configuration file is written in YAML.
### *******************************************************
### *******           !!! WARNING !!!               *******
### *******     YAML IS INDENTATION SENSITIVE       *******
### ******* MAKE SURE YOU INDENT SECTIONS CORRECTLY *******
### *******************************************************
### Refer to http://en.wikipedia.org/wiki/YAML for the brief description.
###

hosts:
  # - "localhost"
  - "dashjr.org"
  - "anonymous.dashjr.org"
  - "friends.dashjr.org"

loglevel: info

## If you already have certificates, list them here
certfiles:
  - /etc/ssl/ejabberd/server.pem

listen:
  -
    port: 5222
    ip: "::"
    module: ejabberd_c2s
    max_stanza_size: 262144
    shaper: c2s_shaper
    access: c2s
    starttls_required: true
  -
    port: 5223
    ip: "::"
    tls: true
    module: ejabberd_c2s
    max_stanza_size: 262144
    shaper: c2s_shaper
    access: c2s
    starttls_required: true
  -
    port: 5269
    ip: "::"
    module: ejabberd_s2s_in
    max_stanza_size: 524288
  -
    port: 5443
    ip: "::"
    module: ejabberd_http
    tls: true
    request_handlers:
      /admin: ejabberd_web_admin
      /api: mod_http_api
      /bosh: mod_bosh
      /captcha: ejabberd_captcha
      /upload: mod_http_upload
      /ws: ejabberd_http_ws
  -
    port: 5280
    ip: "::"
    module: ejabberd_http
    request_handlers:
      /admin: ejabberd_web_admin
      /.well-known/acme-challenge: ejabberd_acme
  -
    port: 3478
    ip: "::"
    transport: udp
    module: ejabberd_stun
    use_turn: true
    ## The server's public IPv4 address:
    # turn_ipv4_address: "203.0.113.3"
    ## The server's public IPv6 address:
    # turn_ipv6_address: "2001:db8::3"
  -
    port: 1883
    ip: "::"
    module: mod_mqtt
    backlog: 1000

s2s_use_starttls: optional

acl:
  admin:
    user: [email protected]
  local:
    user_regexp: ""
  loopback:
    ip:
      - 127.0.0.0/8
      - ::1/128

access_rules:
  local:
    allow: local
  c2s:
    deny: blocked
    allow: all
  announce:
    allow: admin
  configure:
    allow: admin
  muc_create:
    allow: local
  pubsub_createnode:
    allow: local
  trusted_network:
    allow: loopback

api_permissions:
  "console commands":
    from:
      - ejabberd_ctl
    who: all
    what: "*"
  "admin access":
    who:
      access:
        allow:
          - acl: loopback
          - acl: admin
      oauth:
        scope: "ejabberd:admin"
        access:
          allow:
            - acl: loopback
            - acl: admin
    what:
      - "*"
      - "!stop"
      - "!start"
  "public commands":
    who:
      ip: 127.0.0.1/8
    what:
      - status
      - connected_users_number

shaper:
  normal:
    rate: 3000
    burst_size: 20000
  fast: 100000

shaper_rules:
  max_user_sessions: 10
  max_user_offline_messages:
    5000: admin
    100: all
  c2s_shaper:
    none: admin
    normal: all
  s2s_shaper: fast

modules:
  mod_adhoc: {}
  mod_admin_extra: {}
  mod_announce:
    access: announce
  mod_avatar: {}
  mod_blocking: {}
  mod_bosh: {}
  mod_caps: {}
  mod_carboncopy: {}
  mod_client_state: {}
  mod_configure: {}
  mod_disco: {}
  mod_fail2ban: {}
  mod_http_api: {}
  mod_http_upload:
    put_url: https://@HOST@:5443/upload
    custom_headers:
      "Access-Control-Allow-Origin": "https://@HOST@"
      "Access-Control-Allow-Methods": "GET,HEAD,PUT,OPTIONS"
      "Access-Control-Allow-Headers": "Content-Type"
  mod_last: {}
  mod_mam:
    ## Mnesia is limited to 2GB, better to use an SQL backend
    ## For small servers SQLite is a good fit and is very easy
    ## to configure. Uncomment this when you have SQL configured:
    ## db_type: sql
    assume_mam_usage: true
    default: always
  mod_mqtt: {}
  mod_muc:
    access:
      - allow
    access_admin:
      - allow: admin
    access_create: muc_create
    access_persistent: muc_create
    access_mam:
      - allow
    default_room_options:
      mam: true
  mod_muc_admin: {}
  mod_offline:
    access_max_user_messages: max_user_offline_messages
  mod_ping: {}
  mod_privacy: {}
  mod_private: {}
  mod_proxy65:
    access: local
    max_connections: 5
  mod_pubsub:
    access_createnode: pubsub_createnode
    plugins:
      - flat
      - pep
    force_node_config:
      ## Avoid buggy clients to make their bookmarks public
      storage:bookmarks:
        access_model: whitelist
  mod_push: {}
  mod_push_keepalive: {}
  mod_register:
    ## Only accept registration requests from the "trusted"
    ## network (see access_rules section above).
    ## Think twice before enabling registration from any
    ## address. See the Jabber SPAM Manifesto for details:
    ## https://github.com/ge0rg/jabber-spam-fighting-manifesto
    ip_access: trusted_network
    welcome_message:
      subject: "Welcome!"
      body: |-
        Welcome to the Dashjr IM server.
    registration_watchers:
      - "[email protected]"
    # don't allow deleting own account
    access_remove: none
  mod_roster:
    versioning: true
  mod_s2s_dialback: {}
  mod_shared_roster: {}
  mod_stream_mgmt:
    resend_on_timeout: if_offline
  mod_stun_disco: {}
  mod_vcard: {}
  mod_vcard_xupdate: {}
  mod_version:
    show_os: false
  mod_muc_log:
    file_permissions:
      mode: 600
    outdir: /var/log/jabber/muc
    timezone: universal

auth_method: internal

host_config:
  anonymous.dashjr.org:
    auth_method: [anonymous]
    anonymous_protocol: sasl_anon

### Local Variables:
### mode: yaml
### End:
### vim: set filetype=yaml tabstop=8
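Reading the two files side by side (a condensed, unofficial summary of the configs quoted above, containing nothing beyond what they already show), the functional differences in the 21.04 config come down to the following; everything not listed here is identical:

## 21.04 relative to 20.04:

## removed:
# log_rotate_size: 10485760
# log_rotate_count: 5

listen:
  -
    port: 5223                  # new direct-TLS c2s listener
    ip: "::"                    # (remaining options mirror the 5222 listener)
    tls: true
    module: ejabberd_c2s
  -
    port: 3478                  # STUN/TURN listener gains an explicit ip;
    ip: "::"                    # the commented turn_ip option is replaced by
                                # turn_ipv4_address / turn_ipv6_address comments

shaper:
  normal:                       # was: normal: 1000
    rate: 3000
    burst_size: 20000
  fast: 100000                  # was: fast: 50000

api_permissions:
  "admin access":
    who:
      access:
        allow:
          - acl: loopback       # acl entries are now proper YAML list items
          - acl: admin          # (same change under oauth -> access -> allow)

modules:
  mod_http_upload:
    custom_headers:             # new CORS headers
      "Access-Control-Allow-Origin": "https://@HOST@"
      "Access-Control-Allow-Methods": "GET,HEAD,PUT,OPTIONS"
      "Access-Control-Allow-Headers": "Content-Type"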
How do you know you're missing messages?
Which clients are used on this account?
I physically find my child and he shows me he didn't get any of my recent messages. The messages were sent from Psi+ to Conversations.
Can you try with Dino or Gajim, or another Conversations instance, instead of Psi+?
What type of messages? Were the users online at the time, or offline?
Can't risk lost messages this weekend, so it'll have to wait :/
Just plain text "chat" type messages. Both users were online (and are 24/7).
Both users also have multiple connections (Psi+ and Conversations). I didn't notice if the recipient's Psi+ got the messages or not.
Which Psi+ version do you use, and on what OS? With or without OMEMO?
Sender: Psi+ 1.5.1484 on Gentoo.
(Recipient's other client: Psi 1.3-5build1 from/on Ubuntu 20.04)
I have the Psi+ OMEMO plugin on the sender, but I don't know if it was enabled or not.
@luke-jr: Can you update Psi+ to the latest build? Some OMEMO problems have been solved; maybe it is related.
Confirmed that the recipient's Psi+ did receive the messages that prompted this initial report.
I've also been having issues with 20.04, with another child running two Psi+ instances yet only receiving messages on one or the other...
Can ejabberd add an option to simply forward all messages to all resources, regardless of whether they're directed to a specific one or what the priorities of each connection are? :/
Can ejabberd add an option to simply forward all messages to all resources, regardless of whether they're directed to a specific one or what the priorities of each connection are? :/
Those behaviours go against the RFC...
Anyway, in your specific case, maybe a dirty patch is easier than explaining things to your users or tweaking client configuration. This small patch customizes it for your specific case. Note that it's a proof of concept; it may break other parts, for instance MUC rooms.
diff --git a/src/ejabberd_sm.erl b/src/ejabberd_sm.erl
index 231e4351e..e6086019b 100644
--- a/src/ejabberd_sm.erl
+++ b/src/ejabberd_sm.erl
@@ -699,7 +699,7 @@ do_route(#presence{to = #jid{lresource = <<"">>} = To} = Packet) ->
       fun({_, R}) ->
               do_route(Packet#presence{to = jid:replace_resource(To, R)})
       end, get_user_present_resources(LUser, LServer));
-do_route(#message{to = #jid{lresource = <<"">>} = To, type = T} = Packet) ->
+do_route(#message{to = To, type = T} = Packet) ->
     ?DEBUG("Processing message to bare JID:~n~ts", [xmpp:pp(Packet)]),
     if T == chat; T == headline; T == normal ->
            route_message(Packet);
@@ -762,7 +762,7 @@ route_message(#message{to = To, type = Type} = Packet) ->
     case catch lists:max(PrioRes) of
       {MaxPrio, MaxRes}
           when is_integer(MaxPrio), MaxPrio >= 0 ->
-          lists:foreach(fun ({P, R}) when P == MaxPrio;
+          lists:foreach(fun ({P, R}) when true;
                              (P >= 0) and (Type == headline) ->
                LResource = jid:resourceprep(R),
                Mod = get_sm_backend(LServer),
Can you update it to Psi+ 1.5.1605 or newer?
Better to use Psi+ than Psi; the latter's latest version is 1.5.
Latest for Ubuntu focal (which that system runs) is Psi 1.3-5build1 or Psi+ 1.4.554-4 (which it is now running)
@tehnick, @Ri0n: Can you reply here about old Psi and Psi+?
https://launchpad.net/~psi-plus/+archive/ubuntu/ppa
It's also interesting whether stream management is enabled in the account settings in Psi. Unfortunately, not all the XEPs related to reliability are implemented in Psi, so lost messages are possible on a bad connection.
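One detail from the configs above that may be relevant here (just an observation from the quoted configs and the documented option values, not something anyone in this thread has confirmed): both versions enable mod_stream_mgmt with resend_on_timeout: if_offline, which, if the option behaves as documented, resends unacknowledged stanzas only when the user has no other connected resource. With accounts that keep a second client online 24/7, a stanza lost on a dead connection would then not be resent:

modules:
  mod_stream_mgmt:
    ## setting used in both configs above: on session timeout, queued
    ## stanzas are resent only if the user has no other online resource
    resend_on_timeout: if_offline
    ## possible alternative to test (assumes the documented
    ## true/false/if_offline values): always resend unacknowledged
    ## stanzas, at the cost of occasional duplicates on flaky connections
    # resend_on_timeout: true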
Anyway, in your specific case, maybe a dirty patch is easier than explaining things to your users or tweaking client configuration. This small patch customizes it for your specific case. Note that it's a proof of concept; it may break other parts, for instance MUC rooms.
Got around to trying this. The patch causes Psi+ to see duplicates of everything sent. >_<
The duplicates appear to be carbons. Is there an easy way to avoid affecting internally generated stuff?
Maybe a better solution would be to force all c2s connections to the same priority, and strip /resource specifiers on c2s packets?
Maybe a better solution would be to force all c2s connections to the same priority, and strip /resource specifiers on c2s packets?
The core XMPP RFCs specify how to address individual devices vs. accounts. This is used for various use cases. Breaking such essential rules in order to work around client issues is not a sensible solution, no.
Trying another client is not an option? If only to verify there's no actual server-side issue (in which case we could close this issue)?
The core XMPP RFCs specify how to address individual devices vs. accounts. This is used for various use cases. Breaking such essential rules in order to work around client issues is not a sensible solution, no.
That's why I'm suggesting doing it at a border layer, rather than deep in the internals.
Do you have a better solution?
Trying another client is not an option? If only to verify there's no actual server-side issue
If it were easily or at least predictably reproducible, perhaps, but random occurrences don't really make that a viable option.
Besides, there aren't really any better desktop/Qt clients AFAIK?
Do you have a better solution?
I think it's better to fix client issues on the client side.
there aren't really any better desktop/Qt clients AFAIK?
I'd try Gajim or (on Linux) Dino, for example. (Not Qt, but I'd hope the UI toolkit in use isn't relevant here?) If all else fails, Converse.js might be another option (you could use a public instance with a test account for tracking down this issue).
Hey guys. I didn't quite follow all the conversation, but are you talking about carbons not always working in Psi? IIRC carbons have some issues in Psi when used together with OMEMO. I probably won't be able to fix it in Psi for lack of spare time for the project, but I'll gladly accept patches.
I think it's better to fix client issues on the client side.
Since this is a regression that appeared when the server was upgraded, there's no reason to think it's a client issue.
there's no reason to think it's a client issue.
In that case it will be easy to reproduce with other clients, right? Could you do that to double-check?
Another thing:
I have the Psi+ OMEMO plugin on the sender, but I don't know if it was enabled or not.
Please always double-check such issues can be reproduced with OMEMO disabled on all involved clients.
It's not easy to reproduce even with the same client; I have no idea how to reproduce it reliably. It's an apparently random occurrence which can happen at the most inopportune moments, when I actually need the recipient to get the message immediately.
That being said, a few days ago I upgraded to 22.05, so over the next few months I'll find out whether it's still an issue.
Please always double-check such issues can be reproduced with OMEMO disabled on all involved clients.
I don't think it's possible to disable OMEMO in Conversations. cc @iNPUTmice
I'll close this issue for the moment then. If you run into it again and manage to reproduce the problem with OMEMO disabled, feel free to open a new one with any additional info you were able to gather.
@luke-jr: Settings → OMEMO → default on; then per chat you can set it on or off.
@luke-jr: Have you tested with two Psi+ clients without OMEMO plugin?