atlasdb icon indicating copy to clipboard operation
atlasdb copied to clipboard

[WIP] [DNM] SnapshotReader Prototype

Open jeremyk-91 opened this issue 1 year ago • 2 comments

A rough idea of what this will look like.

It's still a work in progress. I think it'll be its own chain when it's time to merge this down - we should discuss if this still makes sense. Basically for simple gets this is already fairly involved, and for range scans it's going to be probably pretty hard and big. I won't be surprised if this reaches a delta of 4000 or 5000 in its raw form.

I think this undertaking makes sense if we're blocked without it, but only if that's the case.

Some things:

  • The code is a bit of a cluster because of the PathTypeTracker work that tries to enforce a hard separation between sync and async gets, I believe because no KVS actually implements async get properly (!!). For now I'll operate with keeping this, but I'm not convinced this actually needs to be kept.
  • Actual code will be larger in terms of adding code, because we need to introduce a few abstractions.

27 Feb

  • In this first version (Feb 27) I have only moved get/getAsync() so a bunch of things still live in two places right now. Once we move the range scans etc. out there should be more deletions / the net change should not be as large. The abstract tests for SnapshotTransaction are still passing, even though direct gets should now go through the snapshot reader and not directly through the TKVS.

@jkozlowski, @rhuffy who'll likely be using this: I think this lets you do what we need to do (read multiple snapshots from different databases, and then do something with it), but is this enough? Given this is going to be unpleasant, do we have alternatives? @LucasIME who may be involved at least in initial reviews: Does this abstraction make sense for atlasdb? I believe this kind of change is positive for the system overall, but what do you think? Does the plan make sense for the path type tracker? @sverma30 for SA

jeremyk-91 avatar Feb 27 '24 21:02 jeremyk-91

Generate changelog in changelog/@unreleased

What do the change types mean?
  • feature: A new feature of the service.
  • improvement: An incremental improvement in the functionality or operation of the service.
  • fix: Remedies the incorrect behaviour of a component of the service in a backwards-compatible way.
  • break: Has the potential to break consumers of this service's API, inclusive of both Palantir services and external consumers of the service's API (e.g. customer-written software or integrations).
  • deprecation: Advertises the intention to remove service functionality without any change to the operation of the service itself.
  • manualTask: Requires the possibility of manual intervention (running a script, eyeballing configuration, performing database surgery, ...) at the time of upgrade for it to succeed.
  • migration: A fully automatic upgrade migration task with no engineer input required.

Note: only one type should be chosen.

How are new versions calculated?
  • ❗The break and manual task changelog types will result in a major release!
  • 🐛 The fix changelog type will result in a minor release in most cases, and a patch release version for patch branches. This behaviour is configurable in autorelease.
  • ✨ All others will result in a minor version release.

Type

  • [ ] Feature
  • [ ] Improvement
  • [ ] Fix
  • [ ] Break
  • [ ] Deprecation
  • [ ] Manual task
  • [ ] Migration

Description [WIP] [DNM] SnapshotReader Prototype

Check the box to generate changelog(s)

  • [ ] Generate changelog entry

changelog-app[bot] avatar Feb 27 '24 21:02 changelog-app[bot]

Before I start reviewing, just answering your questions

A rough idea of what this will look like. It's still a work in progress. I think it'll be its own chain when it's time to merge this down - we should discuss if this still makes sense. Basically for simple gets this is already fairly involved, and for range scans it's going to be probably pretty hard and big. I won't be surprised if this reaches a delta of 4000 or 5000 in its raw form.

Yeah, this is a large change.

I think this undertaking makes sense if we're blocked without it, but only if that's the case.

I think we discussed it a few times now and I can't find a way around it.

The code is a bit of a cluster because of the PathTypeTracker work that tries to enforce a hard separation between sync > and async gets, I believe because no KVS actually implements async get properly (!!). For now I'll operate with keeping this, but I'm not convinced this actually needs to be kept. Actual code will be larger in terms of adding code, because we need to introduce a few abstractions. 27 Feb

So a few thoughts: I think this async stuff is annoying right now because 1. The tests are weird. 2. I think we were a bit scared of making those changes, so we kept 2 paths still. I would say all we need to do is REMOVE the sync paths at the TransactionKeyValueService, force KeyValueService to have an async implementation that just blocks (instead of doing this in SnapshotTransaction.

That said, I don't know if you want to block on this, and just remove it and re-do this work when you're ready. It's probably less work right now to just push it through vs unwinding, but maybe just try and see what the blast radius would be?

In this first version (Feb 27) I have only moved get/getAsync() so a bunch of things still live in two places right now. Once > we move the range scans etc. out there should be more deletions / the net change should not be as large. The abstract >tests for SnapshotTransaction are still passing, even though direct gets should now go through the snapshot reader and > not directly through the TKVS.

I think I'm in sync here, at it's core this is literally a code move/reshuffling.

jkozlowski avatar Feb 28 '24 09:02 jkozlowski