status-desktop icon indicating copy to clipboard operation
status-desktop copied to clipboard

Messages that are sent from desktop when the client goes offline and before offline banner is shown will be lost (communities, "Relay mode")

Open churik opened this issue 2 years ago • 18 comments

Bug Report

Description

Messages that were sent from an offline state when the desktop client doesn't detect that the device is offline (when the banner appears, ~1-1,5 mins) will never be delivered. The issue is not reproducible for mobile.

Steps to reproduce

Prerequisites: 2 users have joined the community

  1. Desktop: switch off Wi-Fi, send several messages to the community (send messages until "Offline" banner appears)
  2. Desktop: send several messages after "Offline" banner appears (they will be shown with "Resend" option)
  3. Desktop: switch on Wi-fi
  4. Mobile: check community

https://github.com/status-im/status-desktop/assets/4557972/af0bdd66-6fd4-4c8a-ad1e-d4bcd93b4ac7

Expected behavior

all messages are delivered

Actual behavior

messages sent on step 1 will be lost

Additional Information

  • Status desktop version: RC3 0.15.x
  • Operating System: Mac OSx

Log: geth copy 2.log

churik avatar Nov 21 '23 12:11 churik

most likely due to relay not having message confirmations? if lightpush enabled, do you have the same? it should work ok in 1-to-1, since we resend until delivery confirmed

cammellos avatar Nov 21 '23 12:11 cammellos

This bug will be perceived by users as messaging reliability issue, and some solution is therefor high priority. I won't opinion on the solution here, but seems like something for 0.15.5 (or 0.16 at the latest)

John-44 avatar Nov 21 '23 13:11 John-44

This is something discussed before in waku calls, I believe there were some hesitation on their side, but I strongly believe it is something that we should add, we can take example from wakuv1 that reduced the toll on the network by only enabling if asked by the peer.

cammellos avatar Nov 21 '23 13:11 cammellos

ping @fryorcraken @richard-ramos @jm-clius - FYI in case you thoughts regarding the issue discussed above

John-44 avatar Nov 21 '23 13:11 John-44

@cammellos Light mode enabled on desktop or "Relay mode" in 1-1 chat: all messages are marked as "Resend" immediately app detects an offline state, and after coming back online messages are recent.

So yeah, the issue is specific to "Relay mode" (default on desktop) in communities

churik avatar Nov 21 '23 14:11 churik

Andrea:

one thing that is a bit odd, is that messages were not resent so I would start from investigating that basically if peer count is 0 the message should be resent in theory, but I don't know on relay, I know in lightpush on mobile it works if you at least do that, then the issue becomes smaller

jrainville avatar Nov 27 '23 15:11 jrainville

Thanks, we have seen this issue again but on a good connection

cammellos avatar Nov 27 '23 15:11 cammellos

We will propose strategies that can be used to mitigate the issue. https://github.com/waku-org/go-waku/issues/921 tracks this work.

fryorcraken avatar Nov 29 '23 01:11 fryorcraken

Thanks, we have seen this issue again but on a good connection

I understand why messages were not sent before we identify network is disconnected. But this seems strange, is this reproducible and do we have any further details when this has occured? Are any peers connected for the pubsubTopic? Any other useful info would help in root-causing the issue.

chaitanyaprem avatar Nov 29 '23 09:11 chaitanyaprem

Thanks, we have seen this issue again but on a good connection

I understand why messages were not sent before we identify network is disconnected. But this seems strange, is this reproducible and do we have any further details when this has occured? Are any peers connected for the pubsubTopic? Any other useful info would help in root-causing the issue.

Not sure we can replicate consistently, the connection was stable, but the peer count was 0, in this case the message should not get confirmed (I am not sure it's the case now to be honest, I can check I guess, that would alleviate the issue, but we'd still have the half open connection)

cammellos avatar Nov 29 '23 09:11 cammellos

Thanks, we have seen this issue again but on a good connection

I understand why messages were not sent before we identify network is disconnected. But this seems strange, is this reproducible and do we have any further details when this has occured? Are any peers connected for the pubsubTopic? Any other useful info would help in root-causing the issue.

Not sure we can replicate consistently, the connection was stable, but the peer count was 0, in this case the message should not get confirmed (I am not sure it's the case now to be honest, I can check I guess, that would alleviate the issue, but we'd still have the half open connection)

Relay publish should fail in such a scenario if it is initialized with minPeersToPublish value of more than 0. Is that handled to mark the message for retry? Unless it hit some kind of a corner case where during publish the peer connection is still considered active and then peer connection got dropped off/marked down (due to TCP delay in identifying the same).

chaitanyaprem avatar Nov 29 '23 09:11 chaitanyaprem

Unless it hit some kind of a corner case where during publish the peer connection is still considered active and then peer connection got dropped off/marked down (due to TCP delay in identifying the same).

This 100% happens, it's easy to replicate, cut off internet just before sending a message and you will get an half open connection, we had this problem on wakuv1 and that's why we implemented confirmations. On mobile happens very often, on desktop should happen less, but I guess depends on how stable your connection and peers are.

I have the suspicious though that we don't mark messages for resend if the peer count is 0, that I need to have a look and we can address, but the issue above will still be valid.

cammellos avatar Nov 29 '23 09:11 cammellos

This 100% happens, it's easy to replicate, cut off internet just before sending a message and you will get an half open connection, we had this problem on wakuv1 and that's why we implemented confirmations. On mobile happens very often, on desktop should happen less, but I guess depends on how stable your connection and peers are.

Yes, i have tested it myself today and agree that this is due to half open TCP or TCP not yet identifying network disconnection and messages remaining in send buffer. I have documented a draft approach to take care of such scenarios https://github.com/waku-org/go-waku/issues/921#issuecomment-1831564676. Do go through and give your feedback as well.

I have the suspicious though that we don't mark messages for resend if the peer count is 0, that I need to have a look and we can address, but the issue above will still be valid.

Ok, sure...would be awesome if you can also point me to the code where this is set.

chaitanyaprem avatar Nov 29 '23 10:11 chaitanyaprem

My two cents. Status side works correctly. But the p2p layer does not inform quick enough about the lost connection. The flow looks like this:

  1. Turn off connection
  2. We have >0 peers we can send messages
  3. After 25-30 seconds (on my machine) p2p ConnectionGater emits signal, we see the "no connection" banner, the number of peers become 0.
  4. Now we can send messages but they are marked as "Resend". When connection is back they will be resent correctly.

endulab avatar Dec 01 '23 09:12 endulab

After a discussion, we decided to implement an e2e syncing mechanism between peers (we can extend the datasync layer that does something similar albeit for 1-to-1 chats), and later we can re-evaluate whether confirmations from the next hop are still needed. I am currently working on it.

https://github.com/status-im/status-go/issues/4438

cammellos avatar Dec 08 '23 15:12 cammellos

@cammellos I see that the PR you linked above is merged. Should we close this issue?

jrainville avatar May 22 '24 18:05 jrainville

I don't think so, still valid, even if you have peersyncing enabled

cammellos avatar May 22 '24 19:05 cammellos

Work to fix this kind of issue has been transferred to Waku. This is around detection of disconnection in Waku Relay and feedback that messages were not sent.

Work is tracked with https://github.com/waku-org/pm/issues/184

fryorcraken avatar Jun 05 '24 00:06 fryorcraken

closing as per frank's message

iurimatias avatar Oct 30 '24 20:10 iurimatias