stream-chat-swift
stream-chat-swift copied to clipboard
Failing to update `localMessageState` results unwanted message reordering
NB: This bug report is based on my speculation and I don't have repro steps or a concrete proof that what I described below is exactly what happened.
We've recently encountered this issue again, so I had a look at the code to debug possible reasons. While browsing through the call stack, I found a suspicious code in MessageRepository.swift.
For whatever reason, database.write
(L53) fails, we will get into the error handling code (L58-61) which basically calls markMessageAsFailedToSend
to update localMessageState
to .sendingFailed
.
https://github.com/GetStream/stream-chat-swift/blob/cd24f3fdd40f9dea881830f56ce23ff9186ba802/Sources/StreamChat/Repositories/MessageRepository.swift#L52-L63
I see two issues here:
- The logic to update
localMessageState
works only when the old state is.sending
. Given we failed to update the state to.sending
, it won't work. https://github.com/GetStream/stream-chat-swift/blob/cd24f3fdd40f9dea881830f56ce23ff9186ba802/Sources/StreamChat/Repositories/MessageRepository.swift#L130-L132 - Even if the old state checking issue is somehow fixed, I'm not too sure it can successfully update the state to
.sendingFailed
because that's the samedatabase.write
operation which just failed.
Anyways, regardless of this issue, MessageSendingQueue
will remove the request once it's tried.
https://github.com/GetStream/stream-chat-swift/blob/cd24f3fdd40f9dea881830f56ce23ff9186ba802/Sources/StreamChat/Workers/Background/MessageSender.swift#L146-L148
The above code ignores the result _
because, I believe, it assume sendMessage
will either succeed or update localMessageState
to .sendingFailed
. Unfortunately, that assumption seems to be wrong.
If there are more requests in the send request queue, those requests could hit the backend earlier than the above failed request. Because the failed message's localMessageState
will stay as .pendingSend
, it will be retried later. This will eventually result message delaying and thus reordering.
cc @jfru
Hi @chan-reclip,
Thank you for the report. We'll have a look into it.
Couple of comments:
Which evidence do you have that support that you reach MessageRepository L57 with an error? Is it logs or is it an assumption in this case? Just asking because I am not sure based on the comment if you saw that happening or not. I think this would be the most important thing for us to try to help you 😄 If there is any evidence, could you tie this to a particular event happening in your app?
Regardless of the previous:
Even if the old state checking issue is somehow fixed, I'm not too sure it can successfully update the state to .sendingFailed because that's the same database.write operation which just failed
If the write fails, chances are that either the message does not exist, that it has been deleted, or that the DB is in a bad state. I am mostly inclined towards the latest as there is no throwing function that is called in that process that could justify an error being reported in L57.
regardless of this issue, MessageSendingQueue will remove the request once it's tried
Because the failed message's localMessageState will stay as .pendingSend, it will be retried later. This will eventually result message delaying and thus reordering
Based on the assumption that I stated above, if seems "safe" to assume that if the message was not retrieved in the previous step, there is no way to recover from it. So this is the reason why we are ignoring the error, and removing the message from the queue.
In any case, we are going to have a deeper look into it whenever possible.
Hi @polqf,
Thanks for the prompt response.
Which evidence do you have that support that you reach MessageRepository L57 with an error? Is it logs or is it an assumption in this case? Just asking because I am not sure based on the comment if you saw that happening or not. I think this would be the most important thing for us to try to help you 😄
Totally agreed but unfortunately I couldn't get any logs from the original reporter. 😞 As I mentioned at the top of description, it's purely based on my speculation. I'm just judging from my understanding of the related code, if we ever reach L57, the issue can be reproduced.
If the write fails, chances are that either the message does not exist, that it has been deleted, or that the DB is in a bad state. I am mostly inclined towards the latest as there is no throwing function that is called in that process that could justify an error being reported in L57.
Yes, I also think the message should exist as we check it just before the write. https://github.com/GetStream/stream-chat-swift/blob/cd24f3fdd40f9dea881830f56ce23ff9186ba802/Sources/StreamChat/Repositories/MessageRepository.swift#L29-L33
Based on the assumption that I stated above, if seems "safe" to assume that if the message was not retrieved in the previous step, there is no way to recover from it.
It's "safe" if message was not retrieved. But haven't you said you're mostly inclined to DB being in a bad state?
In any case, we are going to have a deeper look into it whenever possible.
I'll be waiting for it.
Cheers!
Hi @chan-reclip, we fixed a message reordering issue in 4.61.0. Please reopen if it ever happens again.
Thank you!