status-go icon indicating copy to clipboard operation
status-go copied to clipboard

bug: message stuck in sending status for a long time

Open chaitanyaprem opened this issue 1 year ago • 8 comments

Problem

Have noticed this multiple times and even during dogfooding session, a message that is sent whose status is still shown as sending. Eventhough the message is received by other side in 1:1 chat, group chat and in this case community.

Attaching the logs for reference. MessageID of the message is 0xf37863ffcb2fcc6d7d05653a25db2ef4d808fa4e1a6cf584b5ddc4587ff82648

Even after 5 minutes, message is still shown as sending status.

Acceptance Criteria

Message should not be showing sending status for soo long especially if it is already delivered.

Notes

image

geth.log.gz

chaitanyaprem avatar Jul 01 '24 05:07 chaitanyaprem

Uploading more logs @kaichaosun geth.log.gz

chaitanyaprem avatar Jul 01 '24 06:07 chaitanyaprem

The timeline for message 0xf37863ffcb2fcc6d7d05653a25db2ef4d808fa4e1a6cf584b5ddc4587ff82648,

  • sent out at ~11:04~ 11:01:22
  • store hash query at ~11:07~ 11:04:33
  • the success of query result at ~11:11~ 11:07:19
  • EventEnvelopeSent triggered at ~11:11~ 11:07:19

The period of ~3s~ 3minutes between send out and send query request is ~expected~ unexpected, and it should take up to 6 seconds. The period of ~4s~ 3minutes between send query request and get request response is also too long and unexpected.

~But it only takes around 7 seconds~, do you mean it's still showing sending after 5 minutes for this message or is there other message for this long delay? @chaitanyaprem

kaichaosun avatar Jul 01 '24 06:07 kaichaosun

But it only takes around 7 seconds, do you mean it's still showing sending after 5 minutes for this message or is there other message for this long delay? @chaitanyaprem

It was for this message, screenshot was taken after 5 minutes of sending the message around local time 11:06AM and it was still showing as sending. I did not notice when status has finally changed, but now (after an hour and half) i don't see sending any more.

chaitanyaprem avatar Jul 01 '24 07:07 chaitanyaprem

It seems the store request takes really long, also the relay may also takes long since it's adding the message hash for query only after relay publish.

The request ID for hash query is 0x65d92cd709449fabc5f9681e38906f213556bf2572f673431ff1fb51a0a9b1be, @Ivansete-status you may know how to get the timeline of the query in nwaku server?

@chaitanyaprem It looks like the network connection can be the cause of the issue, or maybe I'm looking into the wrong branch, where can I find the version you are using for this test?

kaichaosun avatar Jul 01 '24 07:07 kaichaosun

I find the some info in elastic, here

image

It takes pretty long to get the response.

kaichaosun avatar Jul 01 '24 07:07 kaichaosun

I find the some info in elastic, here image

It takes pretty long to get the response.

Wow....that is a long time. cc @Ivansete-status store query took close to 3 mins.

chaitanyaprem avatar Jul 02 '24 04:07 chaitanyaprem

I am also wondering what is the behaviour for the following scenario? The reason for this query is we have observed this behaviour even in case of 1:1 chat during our dogfooding session.

What if userA sends a message to UserB in 1:1 chat and mvds also acks receipt of message by UserB. Will we still depend on store-query to change sending status to sent at UserA's side? If so, then we need to fix it.

Ideally status can be changed by any sort of ack either by mvds, store-query or e2e protocol that is being built at a later point in time.

chaitanyaprem avatar Jul 02 '24 04:07 chaitanyaprem

Will we still depend on store-query to change sending status to sent at UserA's side? If so, then we need to fix it.

No, the ack message will also change the state from outging (or sending) to sent.

Ideally status can be changed by any sort of ack either by mvds, store-query or e2e protocol that is being built at a later point in time.

This is a good idea, we need to encapsulate the interface for better usage in different reliability solutions.

kaichaosun avatar Jul 02 '24 06:07 kaichaosun

Ideally status can be changed by any sort of ack either by mvds, store-query or e2e protocol that is being built at a later point in time.

This is a good idea, we need to encapsulate the interface for better usage in different reliability solutions.

Are we tracking this somewhere?

fryorcraken avatar Jul 12 '24 04:07 fryorcraken

@fryorcraken not yet, it could be one of the requirements when implementing e2e reliability. I believe it may include a lot of refactors within status-go, for example mvds. Or we can make e2e reliability and mvds both in waku-sdk.

kaichaosun avatar Jul 12 '24 08:07 kaichaosun

As it can be seen in the description of the following issue, the messageHash queries are taking more than 5 minutes, on average (see the second query:) https://github.com/waku-org/nwaku/issues/2895

@kaichaosun , @chaitanyaprem - ~~is this happening only with store-v3?~~ edited: I confirmed that only happens from store-v3

Ivansete-status avatar Jul 18 '24 08:07 Ivansete-status

This is fixed with the improvements in nwaku.

kaichaosun avatar Aug 02 '24 06:08 kaichaosun

Hey @Ivansete-status I noticed there are store request takes more than 30s, is it expected?

kaichaosun avatar Aug 05 '24 06:08 kaichaosun

Hi @kaichaosun ! Yes, we have right now queries that take some minutes. That will be fixed when https://github.com/waku-org/nwaku/issues/2895 is deployed ( this month , August )

Ivansete-status avatar Aug 08 '24 16:08 Ivansete-status

@chaitanyaprem - a new version with a fix candidate has been deployed in status.prod. Kindly check whether the issue is sorted out. Thanks!

Ivansete-status avatar Aug 12 '24 14:08 Ivansete-status