
Lost messages for no apparent reason (regression)

Open luke-jr opened this issue 2 years ago • 15 comments

Environment

  • ejabberd version: 21.04
  • Erlang version: Erlang (SMP,ASYNC_THREADS) (BEAM) emulator version 12.2
  • OS: Linux (Gentoo)
  • Installed from: ebuild/source

Errors from error.log/crash.log

No errors

Bug description

After upgrading from 20.04 to 21.04, messages are randomly lost. When it happens, the server continues to lose messages until at least the client reconnects (but not right away upon reconnection!). The sender gets no indication that the recipient didn't receive the messages.

luke-jr avatar Mar 09 '22 21:03 luke-jr

Config? Config differences old vs new?

licaon-kter avatar Mar 10 '22 04:03 licaon-kter

20.04 config:

###
###              ejabberd configuration file
###
### The parameters used in this configuration file are explained at
###
###       https://docs.ejabberd.im/admin/configuration
###
### The configuration file is written in YAML.
### *******************************************************
### *******           !!! WARNING !!!               *******
### *******     YAML IS INDENTATION SENSITIVE       *******
### ******* MAKE SURE YOU INDENT SECTIONS CORRECTLY *******
### *******************************************************
### Refer to http://en.wikipedia.org/wiki/YAML for the brief description.
###
hosts:
#  - "localhost"
  - "dashjr.org"
  - "anonymous.dashjr.org"
  - "friends.dashjr.org"

loglevel: info
log_rotate_size: 10485760
log_rotate_count: 5

## If you already have certificates, list them here
certfiles:
  - /etc/ssl/ejabberd/server.pem

listen:
  -
    port: 5222
    ip: "::"
    module: ejabberd_c2s
    max_stanza_size: 262144
    shaper: c2s_shaper
    access: c2s
    starttls_required: true
  -
    port: 5269
    ip: "::"
    module: ejabberd_s2s_in
    max_stanza_size: 524288
  -
    port: 5443
    ip: "::"
    module: ejabberd_http
    tls: true
    request_handlers:
      /admin: ejabberd_web_admin
      /api: mod_http_api
      /bosh: mod_bosh
      /captcha: ejabberd_captcha
      /upload: mod_http_upload
      /ws: ejabberd_http_ws
  -
    port: 5280
    ip: "::"
    module: ejabberd_http
    request_handlers:
      /admin: ejabberd_web_admin
      /.well-known/acme-challenge: ejabberd_acme
  -
    port: 3478
    transport: udp
    module: ejabberd_stun
    use_turn: true
    ## The server's public IPv4 address:
    # turn_ip: 203.0.113.3
  -
    port: 1883
    ip: "::"
    module: mod_mqtt
    backlog: 1000

s2s_use_starttls: optional

acl:
  admin:
    user: [email protected]
  local:
    user_regexp: ""
  loopback:
    ip:
      - 127.0.0.0/8
      - ::1/128

access_rules:
  local:
    allow: local
  c2s:
    deny: blocked
    allow: all
  announce:
    allow: admin
  configure:
    allow: admin
  muc_create:
    allow: local
  pubsub_createnode:
    allow: local
  trusted_network:
    allow: loopback

api_permissions:
  "console commands":
    from:
      - ejabberd_ctl
    who: all
    what: "*"
  "admin access":
    who:
      access:
        allow:
          acl: loopback
          acl: admin
      oauth:
        scope: "ejabberd:admin"
        access:
          allow:
            acl: loopback
            acl: admin
    what:
      - "*"
      - "!stop"
      - "!start"
  "public commands":
    who:
      ip: 127.0.0.1/8
    what:
      - status
      - connected_users_number

shaper:
  normal: 1000
  fast: 50000

shaper_rules:
  max_user_sessions: 10
  max_user_offline_messages:
    5000: admin
    100: all
  c2s_shaper:
    none: admin
    normal: all
  s2s_shaper: fast

modules:
  mod_adhoc: {}
  mod_admin_extra: {}
  mod_announce:
    access: announce
  mod_avatar: {}
  mod_blocking: {}
  mod_bosh: {}
  mod_caps: {}
  mod_carboncopy: {}
  mod_client_state: {}
  mod_configure: {}
  mod_disco: {}
  mod_fail2ban: {}
  mod_http_api: {}
  mod_http_upload:
    put_url: https://@HOST@:5443/upload
  mod_last: {}
  mod_mam:
    ## Mnesia is limited to 2GB, better to use an SQL backend
    ## For small servers SQLite is a good fit and is very easy
    ## to configure. Uncomment this when you have SQL configured:
    ## db_type: sql
    assume_mam_usage: true
    default: always
  mod_mqtt: {}
  mod_muc:
    access:
      - allow
    access_admin:
      - allow: admin
    access_create: muc_create
    access_persistent: muc_create
    access_mam:
      - allow
    default_room_options:
      mam: true
  mod_muc_admin: {}
  mod_offline:
    access_max_user_messages: max_user_offline_messages
  mod_ping: {}
  mod_privacy: {}
  mod_private: {}
  mod_proxy65:
    access: local
    max_connections: 5
  mod_pubsub:
    access_createnode: pubsub_createnode
    plugins:
      - flat
      - pep
    force_node_config:
      ## Avoid buggy clients to make their bookmarks public
      storage:bookmarks:
        access_model: whitelist
  mod_push: {}
  mod_push_keepalive: {}
  mod_register:
    ## Only accept registration requests from the "trusted"
    ## network (see access_rules section above).
    ## Think twice before enabling registration from any
    ## address. See the Jabber SPAM Manifesto for details:
    ## https://github.com/ge0rg/jabber-spam-fighting-manifesto
    ip_access: trusted_network
    welcome_message:
      subject: "Welcome!"
      body: |-
        Welcome to the Dashjr IM server.
    registration_watchers:
      - "[email protected]"
    # don't allow deleting own account
    access_remove: none
  mod_roster:
    versioning: true
  mod_s2s_dialback: {}
  mod_shared_roster: {}
  mod_stream_mgmt:
    resend_on_timeout: if_offline
  mod_stun_disco: {}
  mod_vcard: {}
  mod_vcard_xupdate: {}
  mod_version:
    show_os: false
  mod_muc_log:
    file_permissions:
      mode: 600
    outdir: /var/log/jabber/muc
    timezone: universal

auth_method: internal

host_config:
  anonymous.dashjr.org:
    auth_method: [anonymous]
    anonymous_protocol: sasl_anon

### Local Variables:
### mode: yaml
### End:
### vim: set filetype=yaml tabstop=8

21.04 config:

###
###              ejabberd configuration file
###
### The parameters used in this configuration file are explained at
###
###       https://docs.ejabberd.im/admin/configuration
###
### The configuration file is written in YAML.
### *******************************************************
### *******           !!! WARNING !!!               *******
### *******     YAML IS INDENTATION SENSITIVE       *******
### ******* MAKE SURE YOU INDENT SECTIONS CORRECTLY *******
### *******************************************************
### Refer to http://en.wikipedia.org/wiki/YAML for the brief description.
###

hosts:
#  - "localhost"
  - "dashjr.org"
  - "anonymous.dashjr.org"
  - "friends.dashjr.org"

loglevel: info

## If you already have certificates, list them here
certfiles:
  - /etc/ssl/ejabberd/server.pem

listen:
  -
    port: 5222
    ip: "::"
    module: ejabberd_c2s
    max_stanza_size: 262144
    shaper: c2s_shaper
    access: c2s
    starttls_required: true
  -
    port: 5223
    ip: "::"
    tls: true
    module: ejabberd_c2s
    max_stanza_size: 262144
    shaper: c2s_shaper
    access: c2s
    starttls_required: true
  -
    port: 5269
    ip: "::"
    module: ejabberd_s2s_in
    max_stanza_size: 524288
  -
    port: 5443
    ip: "::"
    module: ejabberd_http
    tls: true
    request_handlers:
      /admin: ejabberd_web_admin
      /api: mod_http_api
      /bosh: mod_bosh
      /captcha: ejabberd_captcha
      /upload: mod_http_upload
      /ws: ejabberd_http_ws
  -
    port: 5280
    ip: "::"
    module: ejabberd_http
    request_handlers:
      /admin: ejabberd_web_admin
      /.well-known/acme-challenge: ejabberd_acme
  -
    port: 3478
    ip: "::"
    transport: udp
    module: ejabberd_stun
    use_turn: true
    ## The server's public IPv4 address:
    # turn_ipv4_address: "203.0.113.3"
    ## The server's public IPv6 address:
    # turn_ipv6_address: "2001:db8::3"
  -
    port: 1883
    ip: "::"
    module: mod_mqtt
    backlog: 1000

s2s_use_starttls: optional

acl:
  admin:
    user: [email protected]
  local:
    user_regexp: ""
  loopback:
    ip:
      - 127.0.0.0/8
      - ::1/128

access_rules:
  local:
    allow: local
  c2s:
    deny: blocked
    allow: all
  announce:
    allow: admin
  configure:
    allow: admin
  muc_create:
    allow: local
  pubsub_createnode:
    allow: local
  trusted_network:
    allow: loopback

api_permissions:
  "console commands":
    from:
      - ejabberd_ctl
    who: all
    what: "*"
  "admin access":
    who:
      access:
        allow:
          - acl: loopback
          - acl: admin
      oauth:
        scope: "ejabberd:admin"
        access:
          allow:
            - acl: loopback
            - acl: admin
    what:
      - "*"
      - "!stop"
      - "!start"
  "public commands":
    who:
      ip: 127.0.0.1/8
    what:
      - status
      - connected_users_number

shaper:
  normal:
    rate: 3000
    burst_size: 20000
  fast: 100000

shaper_rules:
  max_user_sessions: 10
  max_user_offline_messages:
    5000: admin
    100: all
  c2s_shaper:
    none: admin
    normal: all
  s2s_shaper: fast

modules:
  mod_adhoc: {}
  mod_admin_extra: {}
  mod_announce:
    access: announce
  mod_avatar: {}
  mod_blocking: {}
  mod_bosh: {}
  mod_caps: {}
  mod_carboncopy: {}
  mod_client_state: {}
  mod_configure: {}
  mod_disco: {}
  mod_fail2ban: {}
  mod_http_api: {}
  mod_http_upload:
    put_url: https://@HOST@:5443/upload
    custom_headers:
      "Access-Control-Allow-Origin": "https://@HOST@"
      "Access-Control-Allow-Methods": "GET,HEAD,PUT,OPTIONS"
      "Access-Control-Allow-Headers": "Content-Type"
  mod_last: {}
  mod_mam:
    ## Mnesia is limited to 2GB, better to use an SQL backend
    ## For small servers SQLite is a good fit and is very easy
    ## to configure. Uncomment this when you have SQL configured:
    ## db_type: sql
    assume_mam_usage: true
    default: always
  mod_mqtt: {}
  mod_muc:
    access:
      - allow
    access_admin:
      - allow: admin
    access_create: muc_create
    access_persistent: muc_create
    access_mam:
      - allow
    default_room_options:
      mam: true
  mod_muc_admin: {}
  mod_offline:
    access_max_user_messages: max_user_offline_messages
  mod_ping: {}
  mod_privacy: {}
  mod_private: {}
  mod_proxy65:
    access: local
    max_connections: 5
  mod_pubsub:
    access_createnode: pubsub_createnode
    plugins:
      - flat
      - pep
    force_node_config:
      ## Avoid buggy clients to make their bookmarks public
      storage:bookmarks:
        access_model: whitelist
  mod_push: {}
  mod_push_keepalive: {}
  mod_register:
    ## Only accept registration requests from the "trusted"
    ## network (see access_rules section above).
    ## Think twice before enabling registration from any
    ## address. See the Jabber SPAM Manifesto for details:
    ## https://github.com/ge0rg/jabber-spam-fighting-manifesto
    ip_access: trusted_network
    welcome_message:
      subject: "Welcome!"
      body: |-
        Welcome to the Dashjr IM server.
    registration_watchers:
      - "[email protected]"
    # don't allow deleting own account
    access_remove: none
  mod_roster:
    versioning: true
  mod_s2s_dialback: {}
  mod_shared_roster: {}
  mod_stream_mgmt:
    resend_on_timeout: if_offline
  mod_stun_disco: {}
  mod_vcard: {}
  mod_vcard_xupdate: {}
  mod_version:
    show_os: false
  mod_muc_log:
    file_permissions:
      mode: 600
    outdir: /var/log/jabber/muc
    timezone: universal

auth_method: internal

host_config:
  anonymous.dashjr.org:
    auth_method: [anonymous]
    anonymous_protocol: sasl_anon

### Local Variables:
### mode: yaml
### End:
### vim: set filetype=yaml tabstop=8
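
To answer the "config differences old vs new" question, the two files can be compared mechanically. The following is a minimal sketch using Python's standard `difflib`; the two strings are the `shaper` sections copied from the configs posted in this thread, while the file labels `ejabberd-20.04.yml` and `ejabberd-21.04.yml` and the helper name `config_diff` are hypothetical and only used for illustration.

```python
import difflib

# "shaper" section from the 20.04 config posted in this thread.
OLD_SHAPER = """\
shaper:
  normal: 1000
  fast: 50000
"""

# "shaper" section from the 21.04 config posted in this thread.
NEW_SHAPER = """\
shaper:
  normal:
    rate: 3000
    burst_size: 20000
  fast: 100000
"""


def config_diff(old, new):
    """Return the unified-diff lines between two config texts.

    The file labels below are hypothetical; they only decorate the diff header.
    """
    return list(difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile="ejabberd-20.04.yml",
        tofile="ejabberd-21.04.yml",
        lineterm="",
    ))


if __name__ == "__main__":
    print("\n".join(config_diff(OLD_SHAPER, NEW_SHAPER)))
```

Run against the full files, the same approach would also surface the other changes visible above, such as the added 5223 listener, the removed `log_rotate_*` options, and the new `custom_headers` under `mod_http_upload`.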

luke-jr avatar Mar 10 '22 04:03 luke-jr

How do you know you're missing messages?

Which clients are used on this account?

licaon-kter avatar Mar 10 '22 08:03 licaon-kter

I physically find my child and he shows me he didn't get any of my recent messages. From Psi+ to Conversations.

luke-jr avatar Mar 10 '22 15:03 luke-jr

Can you try with Dino or Gajim or another Conversations instead of Psi+?

What type of messages? They were online at the time or offline?

licaon-kter avatar Mar 10 '22 15:03 licaon-kter

Can't risk lost messages this weekend, so it'll have to wait :/

Just plain text "chat" type messages. Both users were online (and are 24/7).

Both users also have multiple connections (Psi+ and Conversations). I didn't notice if the recipient's Psi+ got the messages or not.

luke-jr avatar Mar 10 '22 20:03 luke-jr

Which Psi+ version do you use, and on what OS? With or without OMEMO?

Neustradamus avatar Mar 11 '22 02:03 Neustradamus

Sender: Psi+ 1.5.1484 on Gentoo.

(Recipient's other client: Psi 1.3-5build1 from/on Ubuntu 20.04)

I have the Psi+ OMEMO plugin on the sender, but I don't know if it was enabled or not.

luke-jr avatar Mar 11 '22 02:03 luke-jr

@luke-jr: Can you update Psi+ to the latest build? Some OMEMO problems have been solved; maybe this is related.

Neustradamus avatar Apr 30 '22 21:04 Neustradamus

Confirmed that the recipient's Psi+ did receive the messages that prompted this initial report.

Also been having issues with 20.04, with another child having two Psi+s yet only receiving messages at one or the other...

Can ejabberd add an option to simply forward all messages to all resources, disregarding if they're directed to a specific one or what the priorities of each connection are? :/

luke-jr avatar May 07 '22 20:05 luke-jr

Can ejabberd add an option to simply forward all messages to all resources, disregarding if they're directed to a specific one or what the priorities of each connection are? :/

Those behaviours go against the RFC...

Anyway, in your specific case, maybe a dirty patch is easier than explaining this to your users or tweaking client configuration. This small patch customizes routing for your specific case. Note that it's a proof of concept and may break other parts, for instance MUC rooms.

diff --git a/src/ejabberd_sm.erl b/src/ejabberd_sm.erl
index 231e4351e..e6086019b 100644
--- a/src/ejabberd_sm.erl
+++ b/src/ejabberd_sm.erl
@@ -699,7 +699,7 @@ do_route(#presence{to = #jid{lresource = <<"">>} = To} = Packet) ->
       fun({_, R}) ->
 	      do_route(Packet#presence{to = jid:replace_resource(To, R)})
       end, get_user_present_resources(LUser, LServer));
-do_route(#message{to = #jid{lresource = <<"">>} = To, type = T} = Packet) ->
+do_route(#message{to = To, type = T} = Packet) ->
     ?DEBUG("Processing message to bare JID:~n~ts", [xmpp:pp(Packet)]),
     if T == chat; T == headline; T == normal ->
 	    route_message(Packet);
@@ -762,7 +762,7 @@ route_message(#message{to = To, type = Type} = Packet) ->
     case catch lists:max(PrioRes) of
       {MaxPrio, MaxRes}
 	  when is_integer(MaxPrio), MaxPrio >= 0 ->
-	  lists:foreach(fun ({P, R}) when P == MaxPrio;
+	  lists:foreach(fun ({P, R}) when true;
 					  (P >= 0) and (Type == headline) ->
 				LResource = jid:resourceprep(R),
 				Mod = get_sm_backend(LServer),
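
As a rough illustration of what the second hunk changes, here is a small Python model of the resource selection visible in the diff above. It is only a sketch under stated assumptions, not ejabberd's actual code: the function name `route_targets` and the `patched` flag are hypothetical, `prio_res` stands in for the `PrioRes` list of `{Priority, Resource}` tuples, and the branches taken when no resource has a non-negative priority are not modelled. The first hunk, which also sends full-JID messages through this path, is not modelled either.

```python
def route_targets(prio_res, msg_type, patched=False):
    """Return the resources a message would be delivered to.

    prio_res: list of (priority, resource) tuples, modelled on PrioRes.
    msg_type: "chat", "normal" or "headline".
    patched:  hypothetical flag; True mimics the `when true` guard.
    """
    if not prio_res:
        return []
    # Like lists:max/1 on tuples, max() compares the priority first.
    max_prio, _ = max(prio_res)
    if max_prio < 0:
        # Assumption: other handling applies here; not modelled in this sketch.
        return []
    targets = []
    for prio, res in prio_res:
        # Unpatched guard: P == MaxPrio; (P >= 0) and (Type == headline)
        if patched or prio == max_prio or (prio >= 0 and msg_type == "headline"):
            targets.append(res)
    return targets


if __name__ == "__main__":
    sessions = [(5, "psi"), (0, "conversations")]
    print(route_targets(sessions, "chat"))                # ['psi']
    print(route_targets(sessions, "headline"))            # ['psi', 'conversations']
    print(route_targets(sessions, "chat", patched=True))  # ['psi', 'conversations']
```

In this model, an unpatched chat message reaches only the resource(s) sharing the highest priority, which would be consistent with two clients at different priorities not both receiving it.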

badlop avatar May 09 '22 09:05 badlop

Can you update it to Psi+ 1.5.1605 or more?

It is better to use Psi+ than Psi; the latest is 1.5.

Neustradamus avatar Jul 18 '22 19:07 Neustradamus

Latest for Ubuntu focal (which that system runs) is Psi 1.3-5build1 or Psi+ 1.4.554-4 (which it is now running)

luke-jr avatar Jul 18 '22 20:07 luke-jr

@tehnick, @Ri0n: Can you reply here about old Psi and Psi+?

Neustradamus avatar Jul 19 '22 21:07 Neustradamus

https://launchpad.net/~psi-plus/+archive/ubuntu/ppa

It would also be interesting to know whether stream management is enabled in the account settings in Psi. Unfortunately, not all the XEPs related to reliability are implemented in Psi, so lost messages are possible with a bad connection.

Ri0n avatar Jul 19 '22 21:07 Ri0n

Anyway, in your specific case, maybe a dirty patch is easier than explaining this to your users or tweaking client configuration. This small patch customizes routing for your specific case. Note that it's a proof of concept and may break other parts, for instance MUC rooms.

Got around to trying this. The patch causes Psi+ to see duplicates of everything sent. >_<

luke-jr avatar Sep 16 '22 04:09 luke-jr

Duplicates appear to be carbons. Any easy way to not affect internally-generated stuff?

luke-jr avatar Sep 16 '22 04:09 luke-jr

Maybe a better solution would be to force all c2s connections to the same priority, and strip /resource specifiers on c2s packets?

luke-jr avatar Sep 16 '22 05:09 luke-jr

Maybe a better solution would be to force all c2s connections to the same priority, and strip /resource specifiers on c2s packets?

The core XMPP RFCs specify how to address individual devices vs. accounts. This is used for various use cases. Breaking such essential rules in order to work around client issues is not a sensible solution, no.

Trying another client is not an option? If only to verify there's no actual server-side issue (in which case we could close this issue)?

weiss avatar Sep 16 '22 10:09 weiss

The core XMPP RFCs specify how to address individual devices vs. accounts. This is used for various use cases. Breaking such essential rules in order to work around client issues is not a sensible solution, no.

That's why I'm suggesting doing it at a border layer, rather than deep in the internals.

Do you have a better solution?

Trying another client is not an option? If only to verify there's no actual server-side issue

If it was easily or at least predictably reproduced, perhaps, but random occurrences don't really make it a viable option.

Besides, there aren't really any better desktop/Qt clients AFAIK?

luke-jr avatar Sep 16 '22 20:09 luke-jr

Do you have a better solution?

I think it's better to fix client issues on the client side.

there aren't really any better desktop/Qt clients AFAIK?

I'd try Gajim or (on Linux) Dino, for example. (Not Qt, but I'd hope the UI toolkit in use isn't relevant here?) If all else fails, Converse.js might be another option (you could use a public instance with a test account for tracking down this issue).

weiss avatar Sep 18 '22 12:09 weiss

Hey guys. I didn't quite follow the whole conversation, but are you talking about carbons not always working in Psi? IIRC carbons have some issues in Psi when used together with OMEMO. I probably won't be able to fix it in Psi for lack of spare time for the project, but I'll gladly accept patches.

Ri0n avatar Sep 18 '22 12:09 Ri0n

I think it's better to fix client issues on the client side.

Since this is a regression when the server was upgraded, there's no reason to think it's a client issue.

luke-jr avatar Sep 18 '22 15:09 luke-jr

there's no reason to think it's a client issue.

In that case it will be easy to reproduce with other clients, right? Could you do that to double-check?

weiss avatar Sep 18 '22 15:09 weiss

Another thing:

I have the Psi+ OMEMO plugin on the sender, but I don't know if it was enabled or not.

Please always double-check such issues can be reproduced with OMEMO disabled on all involved clients.

weiss avatar Sep 18 '22 15:09 weiss

It's not easy to reproduce even with the same client. I have no idea how to reliably reproduce it. It's an apparently-random occurrence which can happen at the most inopportune moments when I actually need the recipient to get the message immediately.

That being said, a few days ago I did upgrade to 22.05, so over the next few months will discover if it's still an issue or not.

luke-jr avatar Sep 18 '22 16:09 luke-jr

Please always double-check such issues can be reproduced with OMEMO disabled on all involved clients.

I don't think it's possible to disable OMEMO in Conversations. cc @iNPUTmice

luke-jr avatar Sep 18 '22 16:09 luke-jr

I'll close this issue for the moment then. If you run into it again and manage to reproduce the problem with OMEMO disabled, feel free to open a new one with any additional info you were able to gather.

weiss avatar Sep 18 '22 16:09 weiss

@luke-jr settings - omemo - default on, then per chat set it on or off

licaon-kter avatar Sep 18 '22 17:09 licaon-kter

@luke-jr: Have you tested with two Psi+ clients without the OMEMO plugin?

Neustradamus avatar Oct 12 '22 21:10 Neustradamus