zulip-mobile
Double-posted message
Occasionally when someone sends a message from the app, it ends up getting sent twice, creating a duplicate. Here's a recent example on chat.zulip.org:
(And the first message has a "13:01" timestamp -- so 16 minutes before the second one.)
@brainwane did report a similar issue during PyCon and I confirmed with her that she was using the web interface.
This does not mean we don't have the same issue in the mobile app though. Is it possible that the cause is server-side?
Yes, we've confirmed this particular message was sent with the ZulipMobile client.
We had a rather spectacular demonstration of this bug today, provided by @armaanahluwalia :
An hour earlier @armaanahluwalia had reported being on 15.0.92 (the current latest prod release) on iOS, so that's presumably the environment where that happened.
We just had another report of this (in chat). The user was most likely on the latest version (which went out last week), 20.0.103.
We just had another report of a double-posted message, with some extra context that may be helpful: the user was "riding in a train with changing networks". This was on 21.2.106, the latest release.
One possibility is that we successfully sent the message to the server, but didn't hear the server's reply... and then a little later when the device reconnected to the network, we retried. Certainly that's something that's likely to happen on a spotty connection like on a train, and it's fundamentally unavoidable in general.
We do send the server a unique (*) "local message ID", which seems like basically what the server needs in order to prevent an issue like this. But I think the server doesn't actually use it for that purpose.
(*) We don't actually do a great job of picking a unique identifier here -- it's the timestamp of the `Outbox` object's creation, in whole seconds -- but a failure in that direction would cause the opposite of this issue.
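To illustrate the footnote: a whole-second timestamp gives the same ID to two messages created within the same second. Here is a minimal sketch of one way to avoid that, pairing a millisecond timestamp with a counter. The name `makeLocalIdGenerator` and the ID format are hypothetical and not taken from the app's code.

```typescript
// Hypothetical sketch, not the app's actual ID scheme.
// Returns a function that yields IDs unique within this generator instance.
function makeLocalIdGenerator(now: () => number = Date.now): () => string {
  let lastMs = -1;
  let counter = 0;
  return () => {
    const ms = now();
    if (ms > lastMs) {
      // Clock moved forward: start a fresh counter for the new millisecond.
      lastMs = ms;
      counter = 0;
    } else {
      // Same millisecond (or the clock went backward): bump the counter
      // instead of reusing a timestamp we may already have handed out.
      counter += 1;
    }
    return `${lastMs}.${counter}`;
  };
}

const nextLocalId = makeLocalIdGenerator();
const firstId = nextLocalId();
const secondId = nextLocalId();
console.log(firstId !== secondId); // true, even when both are created in the same millisecond
```

This only guarantees uniqueness within one running generator; IDs persisted across app restarts would need the counter state persisted too, or a random component.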
I think that's correct (the server doesn't use it for that purpose) -- I think because on web, we mostly only retransmit after an error, and then only when the user explicitly requests it. If we wanted to have the server handle this for us, we could add a database table for tracking such requests and return an "already sent" 200.
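A rough sketch of that server-side idea, with an in-memory map standing in for the database table. Everything here is hypothetical (`DedupingMessageStore`, the request and response shapes, the `alreadySent` field); it is not the Zulip server's API, only an illustration of deduplicating on (sender, local ID).

```typescript
// Hypothetical sketch; none of these names come from the Zulip server.
type SendRequest = { senderId: number; localId: string; content: string };
type SendResult = { status: 200; messageId: number; alreadySent: boolean };

class DedupingMessageStore {
  private nextMessageId = 1;
  // Stands in for the database table: "sender:localId" -> message ID.
  private readonly seen = new Map<string, number>();
  readonly messages: Array<{ id: number; content: string }> = [];

  send(req: SendRequest): SendResult {
    const key = `${req.senderId}:${req.localId}`;
    const existing = this.seen.get(key);
    if (existing !== undefined) {
      // A retry of a request we already handled: report success
      // without creating a second message.
      return { status: 200, messageId: existing, alreadySent: true };
    }
    const id = this.nextMessageId++;
    this.messages.push({ id, content: req.content });
    this.seen.set(key, id);
    return { status: 200, messageId: id, alreadySent: false };
  }
}

// The client sends, misses the reply, and retries with the same local ID.
const store = new DedupingMessageStore();
const request = { senderId: 7, localId: "1554400000", content: "hello" };
const firstReply = store.send(request);
const retryReply = store.send(request);
console.log(store.messages.length, retryReply.alreadySent);
```

With this kind of scheme the uniqueness of the local ID starts to matter: two different messages that share an ID would be collapsed into one, which is the opposite failure mentioned above.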
But even with the current server design, I think if the outbox logic only tried to resend things after ensuring it'd processed its event queue, this should be quite rare, since you'd need to have a network interruption longer than 10 minutes to have an issue (and it would become effectively never if we move to the event queues 2.0 system).
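As a sketch of that client-side approach: before retrying, catch up on the event queue and drop any outbox entry whose local ID has already come back in a message event. The types and the `localMessageId` field name are assumptions for illustration, not the app's actual data model.

```typescript
// Hypothetical shapes; assumes message events echo back the sender's local ID.
type OutboxMessage = { localId: string; content: string };
type ServerMessageEvent = { type: "message"; localMessageId?: string };

// Given the outbox and the events fetched since the last successful poll,
// return only the messages the server has not confirmed receiving.
function pendingAfterCatchUp(
  outbox: OutboxMessage[],
  events: ServerMessageEvent[],
): OutboxMessage[] {
  const confirmed = new Set<string>();
  for (const event of events) {
    if (event.localMessageId !== undefined) {
      confirmed.add(event.localMessageId);
    }
  }
  return outbox.filter((message) => !confirmed.has(message.localId));
}

// "a" reached the server (its event arrived after reconnecting); "b" did not.
const toResend = pendingAfterCatchUp(
  [
    { localId: "a", content: "first" },
    { localId: "b", content: "second" },
  ],
  [{ type: "message", localMessageId: "a" }, { type: "message" }],
);
console.log(toResend.map((message) => message.localId)); // only "b" is left to resend
```

This only helps while the event queue is still alive; if the queue has expired during a long interruption, the client can no longer tell whether the original request got through.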
We do track an `isSent` flag on each `Outbox` message, which we persist between app restarts.
Even before that, we were removing messages from the Outbox a bit too aggressively: on a successful `POST`, without waiting for the event queue to confirm.
I am curious if the `local_id` of the duplicate messages was the same?
See also #3584, a later report of this with the extra twist that the second post was nearly a week after the original.
Just had a case of this here, with a message I sent. (From v26.22.145 on Android 10.)
I was on a bus in San Francisco -- lots of cell coverage, but lots of devices talking, so it's unsurprising if the connection sometimes fails. As I experienced it, the sequence was:
- I hit send on the message.
- IIRC it takes a few seconds before the spinner disappears.
- When it does, a second copy of the message almost immediately appears too.
As the server saw it, the two copies of the message came 3 seconds apart. (Based on the detailed timestamps shown when I hover over the timestamps in the webapp.)
There are two different things going on here. With the mobile client, there's a thing where it double-posts on unreliable networks, because it doesn't get confirmation. But there's also a UI problem which bit me here: https://chat.fhir.org/#narrow/stream/179219-analytics-on-FHIR/topic/Contained.20resources.20out-of-scope.20for.20the.20initial.20milestone/near/407338972
I was using the mobile app, on a very slow link (airplane). I typed the message, and hit send. I thought I'd missed, there's no visual cue. So I did it again. Then I realised I wanted to change the message, and then pressed send, and realised... it was already sending. Twice, in fact.
very confusing UI! - and this is very specifically a UI problem unlike the unreliable network problem - if you send, nothing changes except for a very easy to miss circle icon. And then you can still edit and send. 😢
> hit send. I thought I'd missed, there's no visual cue.
Interesting, thanks for the report.
The behavior I'd expect the app to currently have here is that when you hit send, your message shows up immediately in the message list. There's a small spinning circle at the right of the message to indicate that it hasn't yet finished sending… but other than that, it should look exactly like it will once it does successfully send.
Do I understand correctly that the message didn't show up like that? If so, then that's a bug.
I'm pretty sure it didn't show up like that
OK, thanks for confirming. Sounds like we have a previously-unknown bug where that outgoing message (or "local echo") doesn't always show up.
Because the existing app is in maintenance mode as we focus on building a new Zulip mobile app to replace it, I think it's unlikely that we'll debug to get to the bottom of that issue. Instead, when we build the corresponding "local echo" feature in the new app, we'll just aim to write it robustly to avoid this sort of issue — and if it develops such an issue anyway, we'll debug and fix it there.
We didn't previously have this "local echo" feature in the new app's tracker. So I've just filed https://github.com/zulip/zulip-flutter/issues/576 for that, and I used the report above as an example of why it's important.