repliear icon indicating copy to clipboard operation
repliear copied to clipboard

convert Repliear to use the row versioning strategy

Open tantaman opened this issue 1 year ago • 15 comments

Based on todo-row-versioning. This won't diff well since it re-structured the app and switched to Vite.

Most relevant files:

Meta

Server

Note that past CVRs contribute to the current CVR. I.e., all data the client currently has is represented by all data we've ever sent. If any CVR is dropped we'll just re-fetch the rows that the CVR used to have.

Client

Note that I'm currently only allowing one tab to pull at once. This is explained in pull.ts in the comment above dropCVREntries. This is something I could fix if we want to keep the behavior of allowing tabs to sync the same data at the same time.

tantaman avatar Nov 13 '23 21:11 tantaman

I wonder how much you thought about its performance as it grows more and more spread out. Does it ever need to be compacted?

I believe that SELECT rowid, rowversion WHERE NOT IN (...) will slow down as the CVR gets larger. I don't know by how much or where the limit is so that is something that:

a. I can dig into more or b. we can ignore for now since we're going to replace this "sync the world" with UI driven sync which may end up changing the type of query we issue

side note: I was very surprised that WHERE NOT IN (...) performed better than WHERE NOT EXISTS (...).

Update: EXCLUDE is probably the best option here.

tantaman avatar Nov 16 '23 14:11 tantaman

On Thu, Nov 16, 2023 at 4:54 AM Matt Wonlaw @.***> wrote:

@.**** commented on this pull request.

In server/src/pull/pull.ts https://github.com/rocicorp/repliear/pull/158#discussion_r1395817001:

  •    clientGroupID,
    
  •  );
    
  •  const prevCVR = pull.cookie
    
  •    ? await getCVR(executor, pull.cookie.clientGroupID, pull.cookie.order)
    
  •    : undefined;
    
  •  const baseCVR: CVR = prevCVR ?? {
    
  •    clientGroupID,
    
  •    clientVersion: 0,
    
  •    order: 0,
    
  •  };
    
  •  // Drop any cvr entries greater than the order we received.
    
  •  // Getting an old order from the client means that the client is possibly missing the data
    
  •  // from future orders.
    
  •  //
    
  •  // The other possbility is that multiple tabs are syncing (why are tabs not coordinating who does sync?)
    

I thought you'd have to do leader election for my work on cr-sqlite in the browser as well but it seems like WebLocks just solve this? Have each tab try to acquire the lock as its first operation, once the lock is acquired that tab starts to sync. When tabs die, this'll free the lock for other tabs to take over sync.

We spent a ton of time trying to decide if we could trust web locks and couldn't convince ourselves.

Here are some starting points:

https://x.com/aboodman/status/1549590251558936577?s=20 @.*** in that thread is ex-mozilla) https://github.com/w3c/web-locks/issues/81

I would love to be convinced, but there are so many weird edge cases in browsers around bfcache, frozen states – especially in mobile where it's even harder to test, that it's very hard to feel confident – particularly since the spec doesn't even say what should happen yet.

Everytime we get anywhere near here I just give up and decide it's easier to engineer something that works right without locks than try to understand if these things work across all the browsers and platforms we want to support.

Luckily, I don't think web locks are necessary here -- I believe your big-picture design does work without them: https://github.com/rocicorp/repliear/pull/158#discussion_r1395524886

Am I missing something?

The only question is if a tab can be put to sleep while holding a lock. I haven't seen this in practice and that sounds like bad design if browsers could sleep tabs that hold shared resources.

— Reply to this email directly, view it on GitHub https://github.com/rocicorp/repliear/pull/158#discussion_r1395817001, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAATUBB3HYFTF2ASXKHMBHDYEYRALAVCNFSM6AAAAAA7JZ5BI2VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTOMZUGU4DCMZYGY . You are receiving this because you commented.Message ID: @.***>

aboodman avatar Nov 17 '23 01:11 aboodman

Am I missing something?

No, I don't think so. You've definitely done more research on weblocks than myself.

tantaman avatar Nov 20 '23 17:11 tantaman

Given all the new updates, ~~are we fine where things stand and ready to merge?~~

EDIT: Given we're removing coordination (weblocks) of pulls from the client (and the visibility trick doesn't work if two tabs are visible https://github.com/rocicorp/repliear/pull/158#discussion_r1410973330) I need to revise the CVR approach. So no, we're not ready to merge until I commit the CVR updates to solve this problem on the server.

un-addressed items:

Yeah, I agree this is a problem now. A design for the cvr_entry table that would be "compatible" with the current design of Replicache would be one where older rows are retained instead of mutated in place. https://github.com/rocicorp/repliear/pull/158#discussion_r1407019002

~~The inefficiency is addressed by not pulling from a tab that is inactive. "not mutating in place" is not addressed since there's other downsides there. Mainly the client view becoming unbounded in size.~~

EDIT: the visibility trick doesn't work since the user might have both tabs visible at once. The revised CVR approach will not mutate data in place in order to solve the sync problem.

Why build up these two huge arrays

Replied in-line: https://github.com/rocicorp/repliear/pull/158#discussion_r1410840371

I think with your system here we could actually implement the client records using cvr too, which would simplify things! https://github.com/rocicorp/repliear/pull/158#discussion_r1407002841

Didn't do this / waiting clarification. I think we could also merge and do this later.

tantaman avatar Nov 30 '23 17:11 tantaman

https://github.com/rocicorp/repliear/pull/158/commits/9f1cb550ade7667dc30d66c4f575494150f69342 fixes things so many tabs can make progress if they're all trying to sync at once.

At this point it warrants some proper integration tests and compaction (cvrs now grow without bound).

The basic idea is that we no longer drop client_views nor mutate them in place. Each client view refers to its parent and we use that linked list of client views to construct the full view. If tabs branch off they get their own view that is not interfered with by other tabs.

image

tantaman avatar Nov 30 '23 21:11 tantaman

https://github.com/rocicorp/repliear/commit/9f1cb550ade7667dc30d66c4f575494150f69342 fixes things so many tabs can make progress if they're all trying to sync at once.

OK, I want to understand why it didn't work without this. Going to try and load it into my brain today.

aboodman avatar Nov 30 '23 23:11 aboodman

9f1cb55 https://github.com/rocicorp/repliear/commit/9f1cb550ade7667dc30d66c4f575494150f69342 fixes things so many tabs can make progress if they're all trying to sync at once.

Thank you for doing this, but let's please pause and figure out the design of client_view_entry first. I've grown to really like your original design where it's mutated in-place and doesn't grow. If we do do an immutable design I think there's a more relational-friendly design that's possible and doesn't require compaction.

Greg and I are meeting tomorrow midday PST to figure out our opinion and will share it after that. Maybe we could even all meet together.

In the meantime...

I think with your system here we could actually implement the client records using cvr too, which would simplify things! #158 (comment) https://github.com/rocicorp/repliear/pull/158#discussion_r1407002841

Didn't do this / waiting clarification. I think we could also merge and do this later.

Can you look into this? It seems orthogonal to above. I think it's likely Greg may come back with some feedback tomorrow too.

And one last thing:

The original Repliear implementation created a unique "space" for each visitor so that they didn't interfere with each others' writes. You can see this if you visit production:

[image: CleanShot 2023-11-30 at @.***

This was a large amount of complexity both inthe code and schema though. Perhaps we should do something simpler like reset the data every 24h. This could be done after merge, but registering it here as a TODO.

On Thu, Nov 30, 2023 at 11:58 AM Matt Wonlaw @.***> wrote:

9f1cb55 https://github.com/rocicorp/repliear/commit/9f1cb550ade7667dc30d66c4f575494150f69342 fixes things so many tabs can make progress if they're all trying to sync at once.

At this point it warrants some proper integration tests and compaction (cvrs now grow without bound).

The basic idea is that we no longer drop client_views nor mutate them in place. Each client view refers to its parent and we use that linked list of client views to construct the full view. If tabs branch off they get their own view that is not interfered with by other tabs.

image.png (view on web) https://github.com/rocicorp/repliear/assets/1009003/d942dfed-0871-4bcf-9f9f-6265e207c2e7

— Reply to this email directly, view it on GitHub https://github.com/rocicorp/repliear/pull/158#issuecomment-1834627680, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAATUBFXT2GRVTQIFA2UH7DYHD6OZAVCNFSM6AAAAAA7JZ5BI2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMZUGYZDONRYGA . You are receiving this because you commented.Message ID: @.***>

aboodman avatar Dec 01 '23 06:12 aboodman

On Thu, Nov 30, 2023 at 8:08 PM Aaron Boodman @.***> wrote:

If we do do an immutable design I think there's a more relational-friendly design that's possible and doesn't require compaction.

To avoid leaving the begged question hanging, here's what i was thinking:

client_view_entry

client_group_id string entity(enum) entity_id string entity_version integer valid_from_client_view_version integer valid_to_client_view_version nullable integer

To read the current client view for a cg: select entity_id, entity_version from client_view_entry where id = $client_group_id and valid_from_client_view_version >= $client_group_version and coalesce(valid_to_client_view_version, $client_group_version) <= $client_group_version

To update the client view for a cg (handwaving)

  • Pick the new client_view version
  • For each entity row either not in the client view or whose version is higher in entity row:
  • If there is an existing valid row for that entity, set its valid_to_client_view_version to the previous client_view version
  • Insert a new client_view_entry row with valid_to_client_view_version = NULL

(this idea basically stolen from "temporal tables" but using valid_from/to_version rather than valid_from/to_timestamp)

Greg and I are meeting tomorrow midday PST to figure out our opinion and will share it after that. Maybe we could even all meet together.

In the meantime...

I think with your system here we could actually implement the client records using cvr too, which would simplify things! #158 (comment) https://github.com/rocicorp/repliear/pull/158#discussion_r1407002841

Didn't do this / waiting clarification. I think we could also merge and do this later.

Can you look into this? It seems orthogonal to above. I think it's likely Greg may come back with some feedback tomorrow too.

And one last thing:

The original Repliear implementation created a unique "space" for each visitor so that they didn't interfere with each others' writes. You can see this if you visit production:

[image: CleanShot 2023-11-30 at @.***

This was a large amount of complexity both inthe code and schema though. Perhaps we should do something simpler like reset the data every 24h. This could be done after merge, but registering it here as a TODO.

On Thu, Nov 30, 2023 at 11:58 AM Matt Wonlaw @.***> wrote:

9f1cb55 https://github.com/rocicorp/repliear/commit/9f1cb550ade7667dc30d66c4f575494150f69342 fixes things so many tabs can make progress if they're all trying to sync at once.

At this point it warrants some proper integration tests and compaction (cvrs now grow without bound).

The basic idea is that we no longer drop client_views nor mutate them in place. Each client view refers to its parent and we use that linked list of client views to construct the full view. If tabs branch off they get their own view that is not interfered with by other tabs.

image.png (view on web) https://github.com/rocicorp/repliear/assets/1009003/d942dfed-0871-4bcf-9f9f-6265e207c2e7

— Reply to this email directly, view it on GitHub https://github.com/rocicorp/repliear/pull/158#issuecomment-1834627680, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAATUBFXT2GRVTQIFA2UH7DYHD6OZAVCNFSM6AAAAAA7JZ5BI2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMZUGYZDONRYGA . You are receiving this because you commented.Message ID: @.***>

aboodman avatar Dec 01 '23 06:12 aboodman

Greg and I are meeting tomorrow midday PST to figure out our opinion and will share it after that. Maybe we could even all meet together.

Sure. I should be available if it's before 5pm EST.

tantaman avatar Dec 01 '23 13:12 tantaman

@tantaman - remind me why we have both https://github.com/rocicorp/repliear/blob/3bb93995c0cc2e06dbc0e1e6e044d6c7c0464b14/server/src/schema.ts#L26 and https://github.com/rocicorp/repliear/blob/3bb93995c0cc2e06dbc0e1e6e044d6c7c0464b14/server/src/schema.ts#L75? It seems like they move together. It seems like we only need the one on client_view.

client_view has a 1:1 relationship with replicache_client_group right? The only real reason for client_view right now at all is because we need to track the client_version that was last sent to this group. If we can remove that, then this whole table can go away.

But as long as we do have the table, then it seems like we can track the latest version for the client_view on its row, rather than on the somewhat unrelated replicache_client_group.

aboodman avatar Dec 06 '23 21:12 aboodman

@tantaman - remind me why we have both

Basically it was from my ignorance when starting the conversion. replicache_client_group existed in todo-row-versioning and I wanted to minimize the moving pieces until everything was working.

I think we can consolidate them now. The rest of the updates we talked about are about done. Writing some tests and tracking down one last bug.

tantaman avatar Dec 07 '23 21:12 tantaman

this https://github.com/rocicorp/repliear/pull/158/commits/cfefc45b43ca6a2c4050c967ced9f0748f8fe6e9 has everything we talked about + tests

  • fast forwards when receiving an old cookie
  • handles deletes and re-insertions of previously deleted rows
  • pulls all data that may have been updated or deleted since the last pull in order to preserve consistency semantics for mutations

LMK if you'd rather this be a new PR stacked on top of the old PR. That is my preferred workflow (small, stacked, atomic PRs that are easily reviewed) although github doesn't seem to support it well.

I did see some weird behavior with experimentalWatch and 4 tabs syncing at once. You can read more in the commit message here: https://github.com/rocicorp/repliear/pull/158/commits/a3a11ac7809bae35b83966186083812dee1d6100

tantaman avatar Dec 08 '23 21:12 tantaman

Thanks @tantaman - I reviewed this and overall it looks very great. I'm so stoked to get this landed.

There are a bunch of tiny comments I have though and I think it would make more sense for me to take it from here for two reasons - (a) it's going to be more efficient for to just make all the tiny changes, (b) it's hard to review a change this size really thoroughly in code review. I will feel more confident if I've "had my hands on it" so to speak.

So I think I'll just take the PR over and add the finishing touches. Is that OK with you?

aboodman avatar Dec 11 '23 10:12 aboodman

You could move on to integrating materialite.

aboodman avatar Dec 11 '23 10:12 aboodman

So I think I'll just take the PR over and add the finishing touches. Is that OK with you?

Yep, that's fine with me.

Yeah, big PR's are a big pain to deal with. Hopefully we can keep them small for future work.

tantaman avatar Dec 11 '23 13:12 tantaman