element-x-android
History missing in EXA public room
Steps to reproduce
the history was there, but when I left and came back to the app I watched it vanish
https://matrix.to/#/!aCvOWKNDeXXpPgquKq:matrix.org/$m-bhJJ48WIdjFMnEdDB50_hP5hpsqf5T3uTv3V-wP58?via=matrix.org&via=envs.net&via=element.io
Outcome
What did you expect?
History should never spuriously vanish
Your phone model
No response
Operating system version
No response
Application version and app store
playstore 0.1.2 (40001020)
Homeserver
pintobyte.com
Will you send logs?
Yes
Are you willing to provide a PR?
No
Removing the needs rust label until we know enough to create an upstream issue.
Assigned to me to check rageshakes
I checked the rageshakes we received. We have 2 people experiencing problems on their own homeserver & SS proxy. No one has complained on matrix.org since we fixed the problems initially reported on iOS in https://github.com/vector-im/element-x-ios/issues/1197. This issue might be a server version problem.
We are aware that the workaround we made in the SDK can generate a false negative for `limited`, but we have not seen a problem yet. The true fix now is to implement `limited` on the proxy.
I am deprioritising the issue. We can only really work on such an issue by checking both backend and client logs.
Ran into the issue on 0.2.0, custom homeserver too. I sent a bug report from EXA, hopefully you can connect it here. Faced the same on the version in Play Store at the time of writing (0.1.6, I think)
I opened the app right around 18:55 in the timezone on the log file: ~~ss-proxy.log~~
Unfortunately it's only INFO level, I'll raise the verbosity and see if I get the issue again and would be able to get the SS proxy logs. Proxy version 0.99.10
The last time I faced the issue on the older version, I think I also had posted something and someone replied, but didn't react to it at that time.
I had notifications turned off for the room at this time, if that makes a difference.
Edit: sorry for the just about unreadable log file with ANSI color escapes. Here's a fixed one:
I've seen multiple versions of this bug when I was testing on Conduit.
- Some messages of the history are missing, probably a single batch, but it continues normally after that
- I can only see a few most recent events and the history before that doesn't load at all, it's just white.
I can reproduce number 2 on both conduit and matrix.org by
- fully closing the app
- send two messages to my element x account in a DM
- clicking on the notification to start the app again
An idea raised when discussing this problem on matrix with @bnjbvr :
Can it be a late keys decryption issue? Like in this flow:
- on app resume, `/sync` returns some messages but not the corresponding new keys
- EAX decides to not display them as UTDs for an unknown reason
- keys arrive
- Messages are decrypted in the SDK
- BUT EAX does not update the timeline with decrypted content. Maybe it updates the timeline items but they stay hidden
We are starting to think this is an application problem because iOS does not have this issue.
[edit: In my specific instance of this problem, I am 100% positive this is] Not a late decryption: there were messages that I had seen in the room before. I background the app. I receive a notification from the app for new messages in this room. I click the notification, thus enter the room. After entering the room, some messages I had seen before backgrounding are now missing.
If I get back to the room list, then into the room again, I can see those messages perfectly fine.
Here's how I can reproduce the bug number 2 I mentioned. It does not work every time:
https://github.com/vector-im/element-x-android/assets/25297359/4df0cc1e-49bb-4c73-9ce9-0fe8ec58a3de
@timokoesters Thanks for the repro. Is that with a Synapse + sliding sync proxy server?
@bnjbvr Yes, the recording is from a matrix.org account
Cool, thanks for confirming. Also concurrently I did reproduce the issue with my Synapse + sliding sync proxy instances, using your STRs. Thanks a bunch :pray:
From watching lots of logs, I suspect the following happens:
- a back-pagination may start in the room we're looking at
- sliding sync starts, or was running in the background
- server returns 1 event (aka the latest message) that the SDK wasn't aware of; the sliding sync proxy is right, here, since we knew about all the previous messages
- client-side `limited` computation concludes it may be a `limited` timeline: there's only one event in the response that it doesn't know about
- that in turn clears all the timeline items while the timeline is being back-paginated :boom:
With respect to the progression of the back pagination, the clearing of the timeline may happen before/after/in the middle of pagination, likely resulting in missing messages.
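To make the suspected interleaving concrete, here is a toy model of it. This is only a sketch: `ToyTimeline`, `looks_limited` and the integer event ids are hypothetical and are not taken from matrix-rust-sdk, and the `limited` guess is simplified to "no event in the sync response is already known".

```rust
// Toy model only: none of these names come from matrix-rust-sdk.
#[derive(Debug)]
struct ToyTimeline {
    events: Vec<u64>, // event "ids", oldest first
}

impl ToyTimeline {
    /// Hypothetical stand-in for the client-side `limited` guess:
    /// if no event in the response is already known, the client cannot
    /// prove the response is contiguous with what it has.
    fn looks_limited(&self, sync_events: &[u64]) -> bool {
        !sync_events.is_empty()
            && !self.events.is_empty()
            && sync_events.iter().all(|e| !self.events.contains(e))
    }

    /// Apply a sync response, clearing the timeline first if it looks limited.
    fn apply_sync(&mut self, sync_events: &[u64]) {
        if self.looks_limited(sync_events) {
            self.events.clear();
        }
        for e in sync_events {
            if !self.events.contains(e) {
                self.events.push(*e);
            }
        }
    }

    /// Prepend a page of older events (oldest first) to the timeline.
    fn prepend_page(&mut self, page: &[u64]) {
        let mut merged = page.to_vec();
        merged.append(&mut self.events);
        self.events = merged;
    }
}

/// Back-pagination for events older than 10 is in flight, a sync response
/// with one new event arrives, then the page lands.
fn simulate_race() -> Vec<u64> {
    let mut timeline = ToyTimeline { events: vec![10, 11, 12] };
    let in_flight_page = vec![8, 9]; // requested before the reset
    timeline.apply_sync(&[13]); // looks limited, so the timeline is cleared
    timeline.prepend_page(&in_flight_page); // page lands after the reset
    timeline.events
}

fn main() {
    println!("{:?}", simulate_race()); // [8, 9, 13]: events 10..=12 are gone
}
```

In this model the newest event is visible, a chunk is missing, and older history continues normally, which is one of the orderings described above.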
cc @jplatte
That sounds realistic. So I think we have to cancel the ongoing back-pagination, if any, when the timeline is reset, right?
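One way to sketch that idea is a generation counter: each reset bumps the counter, and a page that was requested under an older generation is dropped when it arrives. Everything here (`GuardedTimeline`, `PaginationToken`, the method names) is hypothetical and only illustrates the technique, it is not how the SDK implements it.

```rust
// Toy model only: none of these names come from matrix-rust-sdk.
#[derive(Debug)]
struct GuardedTimeline {
    events: Vec<u64>, // event "ids", oldest first
    generation: u64,  // bumped on every reset
}

/// Captured when a back-pagination request starts.
struct PaginationToken {
    generation: u64,
}

impl GuardedTimeline {
    fn start_back_pagination(&self) -> PaginationToken {
        PaginationToken { generation: self.generation }
    }

    /// Reset the timeline to the events from a limited sync response.
    fn reset(&mut self, latest: &[u64]) {
        self.generation += 1;
        self.events = latest.to_vec();
    }

    /// Apply a finished page; returns false if the page was discarded
    /// because the timeline was reset after the request started.
    fn finish_back_pagination(&mut self, token: PaginationToken, page: &[u64]) -> bool {
        if token.generation != self.generation {
            return false; // stale page: it belongs to the old timeline
        }
        let mut merged = page.to_vec();
        merged.append(&mut self.events);
        self.events = merged;
        true
    }
}

fn simulate_cancelled_pagination() -> (bool, Vec<u64>) {
    let mut timeline = GuardedTimeline { events: vec![10, 11, 12], generation: 0 };
    let token = timeline.start_back_pagination();
    timeline.reset(&[13]); // reset while the request is in flight
    let applied = timeline.finish_back_pagination(token, &[8, 9]);
    (applied, timeline.events)
}

fn main() {
    let (applied, events) = simulate_cancelled_pagination();
    println!("applied = {applied}, events = {events:?}");
}
```

The stale page is dropped, so the timeline stays contiguous (`[13]`), and a new back-pagination started after the reset would fetch 12, 11, 10 and older in order rather than leaving a hole.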
If you click on a notification, you probably want to see the message context and not cancel back-pagination @jplatte
Would it be possible to recover from the limited timeline instead of reconstructing it? Or alternatively to reconstruct it in the background so the user doesn't notice.
@timokoesters eventually, we will be able to handle limited server responses more gracefully than resetting the timeline. There are plans, but they're not simple because we're dedicated to doing things right.
This was supposed to be fixed by https://github.com/matrix-org/matrix-rust-sdk/pull/2638, but apparently the issue still exists. Do we still think this is a Rust-only issue? Is there anything we can do to help investigate this issue?
A new rageshake from a room where the problem happened, one that I'm in or that is public, might be helpful. Also wouldn't hurt to add `matrix_sdk_ui::timeline=trace` to the tracing filter. Currently having a little bit of trouble finding the event IDs for events involved in previous rageshakes.
Is this related to https://github.com/vector-im/element-x-android/issues/1281? I posted a sliding sync server log in there if helpful.
I initially thought so, but https://github.com/matrix-org/matrix-rust-sdk/pull/2638 should have fixed this if that was the case.
Fixed in EXA 0.2.4
Unfortunately I've experienced this or something very similar a few times today on https://github.com/vector-im/element-x-android/commit/4a7b40fe175eaabbea8cc020de842a79c06f5fc8 (debug build from GHA), which was pushed after releasing 0.2.4. I'll start submitting rageshakes for this again; by now I'm used to force stopping Element and restarting it to see missing messages. 😓
Seems like multiple people are still seeing timeline gaps 🙁
If anyone is seeing this issue with the latest nightly, please rageshake as we need new logs to investigate further
Additionally, it would be helpful if bug reports included the event IDs of the last event before the gap + first event after the gap. (for public rooms, a screenshot also works)
I just experienced this issue again and submitted a rageshake with event IDs
This should not happen anymore. Reopen if it is not the case.
Still happens to me in non-public rooms on 0.4.7.
Unless this was supposedly fixed between 0.4.7 and latest develop, can somebody reopen?
@Xiretza Would you have any steps to reproduce the issue?
Not really, unfortunately, it just happens seemingly randomly every once in a while. Might have something to do with my slow-as-molasses sliding sync proxy.
When it happens, the very latest message is (edit: sometimes) visible, then a whole chunk is missing, then history continues as normal.