element-web icon indicating copy to clipboard operation
element-web copied to clipboard

Ability to scrollback can get stuck

Open ara4n opened this issue 2 years ago • 11 comments

In an encrypted room, it's possible for scrollback to stop loading, despite you not being at the beginning of the room. A workaround is to click on the timestamp of a message towards the top of your timeline in order to permalink it, which causes it start back-paginating again. However, it obviously shouldn't stop refusing to scrollback in the first place.

ara4n avatar Aug 01 '21 22:08 ara4n

I don't think I am able to repro. Do you have a repro case? Have you seen this before or is this a regression?

SimonBrandner avatar Aug 02 '21 11:08 SimonBrandner

i've seen it a few times before; i think it is a regression (but could have regressed months ago). will try to repro.

ara4n avatar Aug 02 '21 21:08 ara4n

https://user-images.githubusercontent.com/1294269/127926353-13b7e6d9-12cb-4067-9b4a-c1c3305a08b7.mov

video of me reproing it. i'm afraid it's in an internal room, so i've had to crop the video comically, but suffice it to say that i'm leaning on page-up in order to rapidly backpaginate, and then when it gets stopped, i have to click on the permalink for the first message in the room for it to let me continue backpaginating again. will rageshake too.

ara4n avatar Aug 02 '21 21:08 ara4n

n.b. there is no spinner when it gets stuck. you just have a big hunk of vertical blank margin which you can't scroll beyond.

ara4n avatar Aug 02 '21 21:08 ara4n

I wonder, this only happens in this room or also other ones? I attempted to repro in a few rooms but I didn't manage to do it

SimonBrandner avatar Aug 03 '21 05:08 SimonBrandner

have seen it in many rooms; basically reproducable in any encrypted room if you start scrolling back aggressively.

ara4n avatar Sep 23 '21 10:09 ara4n

Also happens on desktop 1.9.5 (happened in prior versions as well), same problem as stuck.scrollback.mov, one interesting thing is if i manage to get to an older message using search, while scrolling down it will skip the section that was stuck entirely, messages left there are completely unreadable. This does not happen on the android app and the room history is readable as expected.

ghost avatar Nov 22 '21 19:11 ghost

I've seen this happen recently on develop.element.io

kittykat avatar Dec 08 '21 12:12 kittykat

Same on my side...

https://user-images.githubusercontent.com/15554561/150768054-c90af980-9839-4854-9520-dca0682f82f6.mp4

waclaw66 avatar Jan 24 '22 10:01 waclaw66

It stops loading the history before these events, maybe there is some coincidence...

obrazek

waclaw66 avatar Jan 27 '22 06:01 waclaw66

I just repro'd this again, simply by hammering page-up in #NVI. It had a "you do not have permission to view encrypted messages before this point." warning at the top of the page. clicking a permalink unstuck it.

ara4n avatar Sep 17 '22 14:09 ara4n

I think I figured out what this is:

When you're paginating back through a room and you hit a reply event that's outside of your current timeline set, the client will fetch the replied to event with the /context endpoint with lazy_load_members: true. If the resulting response does not overlap with any of the existing timelines in the timelineSet of the room, it's added to the room as a new timeline in the set, with the state initialized from the context response: https://github.com/matrix-org/matrix-js-sdk/blob/develop/src/client.ts#L5433

Then, if you continue to paginate back in the room, eventually you'll have a /messages response that overlaps with the old /context response, and the timelines will be linked together. However, the timeline created from the /context response will still just have the partial room state returned due to the lazy_load_members: true filter.

This becomes a problem when we call checkForPreJoinUISI in the TimelinePanel: https://github.com/matrix-org/matrix-react-sdk/blob/ff59f68a9fa0ac5bc467afe0e8d9af02d15e1666/src/components/structures/TimelinePanel.tsx#L1504

This function scans through the timline and tries to find when the current logged in user first joined the room by looking for the absense of a join event in the timeline. If we find a decryption error in the timeline before we have a membership for ourself in the timeline, we assume that this means that we're about to show the user a wall of decryption errors, as we've paginated into a portion of the rooom history that pre-dated us joining the room. We artificially stop paginating the room at this point, and only show events after we've "joined" the room.

The problem is, this timeline created by the /context endpoint doesn't necessarily have all the members in the room, and if you didn't say anything during this period, the /context endpoint won't include your membership event. This means that if there's a decryption error around this point, and we didn't say anything in the room around this point, checkForPreJoinUISI will wrongly calculate that you're not in the room at this point and stop pagination at this point.

The fix for this isn't super clear to me, hence the huge comment and not a PR.

We could:

  1. Try to fix up the timeline state when first creating a timeline from the /context endpoint by populating all the members from somewhere, but this seems to defeat the purpose of lazy_load_members
  2. Try to fix up the timeline state when we join the timeline created from the /messages endpoint with the timeline of the /context endpoint, but this seems pretty error prone to merge the state of the timeline from two sources
  3. Don't join timelines from different "sources" (/messages vs /context) together, and if you hit an overlap throw one of them out? Not sure how messy this is going to end up, sounds potentially weird to throw away timelines?
  4. Add some kind of property to timelines to indicate their states have lazy loaded members and shouldn't be trusted? This seems promising...
  5. Alter checkForPreJoinUISI to not use the state events in the timeline to figure out who's in the room, but this seems like it's just covering up that we don't know if timeline's have correct states or not.

Other ideas? Especially from folks that are more familiar with the timeline code than I?

https://github.com/matrix-org/matrix-react-sdk/pull/3881 is the PR that introduced this pagination handling in encrypted rooms, for context.

bradtgmurray avatar Sep 30 '22 21:09 bradtgmurray

@bradtgmurray thanks for tracking this one down :) My bias would be towards option 3 - the timeline created by /context in order to view a reply could definitely be thrown away in favour of the more detailed timeline from /messages when it's returned. we could flag the /context timeline as being dispensable (and not to bother trying to merge with it).

alternatively, option 4 could work too, and have the checkForPreJoinUISI just ignore timelines where the members haven't been loaded.

ara4n avatar Oct 02 '22 23:10 ara4n