stream-chat-swift icon indicating copy to clipboard operation
stream-chat-swift copied to clipboard

Failing to update `localMessageState` results unwanted message reordering

Open chan-reclip opened this issue 1 year ago • 3 comments

NB: This bug report is based on my speculation and I don't have repro steps or a concrete proof that what I described below is exactly what happened.

We've recently encountered this issue again, so I had a look at the code to debug possible reasons. While browsing through the call stack, I found a suspicious code in MessageRepository.swift.

For whatever reason, database.write (L53) fails, we will get into the error handling code (L58-61) which basically calls markMessageAsFailedToSend to update localMessageState to .sendingFailed. https://github.com/GetStream/stream-chat-swift/blob/cd24f3fdd40f9dea881830f56ce23ff9186ba802/Sources/StreamChat/Repositories/MessageRepository.swift#L52-L63

I see two issues here:

  1. The logic to update localMessageState works only when the old state is .sending. Given we failed to update the state to .sending, it won't work. https://github.com/GetStream/stream-chat-swift/blob/cd24f3fdd40f9dea881830f56ce23ff9186ba802/Sources/StreamChat/Repositories/MessageRepository.swift#L130-L132
  2. Even if the old state checking issue is somehow fixed, I'm not too sure it can successfully update the state to .sendingFailed because that's the same database.write operation which just failed.

Anyways, regardless of this issue, MessageSendingQueue will remove the request once it's tried. https://github.com/GetStream/stream-chat-swift/blob/cd24f3fdd40f9dea881830f56ce23ff9186ba802/Sources/StreamChat/Workers/Background/MessageSender.swift#L146-L148 The above code ignores the result _ because, I believe, it assume sendMessage will either succeed or update localMessageState to .sendingFailed. Unfortunately, that assumption seems to be wrong.

If there are more requests in the send request queue, those requests could hit the backend earlier than the above failed request. Because the failed message's localMessageState will stay as .pendingSend, it will be retried later. This will eventually result message delaying and thus reordering.

chan-reclip avatar Jul 05 '23 05:07 chan-reclip

cc @jfru

chan-reclip avatar Jul 05 '23 05:07 chan-reclip

Hi @chan-reclip,

Thank you for the report. We'll have a look into it.

Couple of comments:

Which evidence do you have that support that you reach MessageRepository L57 with an error? Is it logs or is it an assumption in this case? Just asking because I am not sure based on the comment if you saw that happening or not. I think this would be the most important thing for us to try to help you 😄 If there is any evidence, could you tie this to a particular event happening in your app?

Regardless of the previous:

Even if the old state checking issue is somehow fixed, I'm not too sure it can successfully update the state to .sendingFailed because that's the same database.write operation which just failed

If the write fails, chances are that either the message does not exist, that it has been deleted, or that the DB is in a bad state. I am mostly inclined towards the latest as there is no throwing function that is called in that process that could justify an error being reported in L57.

regardless of this issue, MessageSendingQueue will remove the request once it's tried

Because the failed message's localMessageState will stay as .pendingSend, it will be retried later. This will eventually result message delaying and thus reordering

Based on the assumption that I stated above, if seems "safe" to assume that if the message was not retrieved in the previous step, there is no way to recover from it. So this is the reason why we are ignoring the error, and removing the message from the queue.

In any case, we are going to have a deeper look into it whenever possible.

polqf avatar Jul 05 '23 06:07 polqf

Hi @polqf,

Thanks for the prompt response.

Which evidence do you have that support that you reach MessageRepository L57 with an error? Is it logs or is it an assumption in this case? Just asking because I am not sure based on the comment if you saw that happening or not. I think this would be the most important thing for us to try to help you 😄

Totally agreed but unfortunately I couldn't get any logs from the original reporter. 😞 As I mentioned at the top of description, it's purely based on my speculation. I'm just judging from my understanding of the related code, if we ever reach L57, the issue can be reproduced.

If the write fails, chances are that either the message does not exist, that it has been deleted, or that the DB is in a bad state. I am mostly inclined towards the latest as there is no throwing function that is called in that process that could justify an error being reported in L57.

Yes, I also think the message should exist as we check it just before the write. https://github.com/GetStream/stream-chat-swift/blob/cd24f3fdd40f9dea881830f56ce23ff9186ba802/Sources/StreamChat/Repositories/MessageRepository.swift#L29-L33

Based on the assumption that I stated above, if seems "safe" to assume that if the message was not retrieved in the previous step, there is no way to recover from it.

It's "safe" if message was not retrieved. But haven't you said you're mostly inclined to DB being in a bad state?

In any case, we are going to have a deeper look into it whenever possible.

I'll be waiting for it.

Cheers!

chan-reclip avatar Jul 05 '23 12:07 chan-reclip

Hi @chan-reclip, we fixed a message reordering issue in 4.61.0. Please reopen if it ever happens again.

Thank you!

laevandus avatar Jul 31 '24 12:07 laevandus