Multi key and bucket consistent reads in KV

Open ripienaar opened this issue 2 years ago • 17 comments

What motivated this proposal?

When storing multiple related messages in kv bucket(s) for later reading as a group there is no efficient way to read the bucket(s) as they were at some point in time.

Scenario:

Multiple keys represent one related state. Keys can be independently updated, as we do not have transactions and multiple writers can write to the bucket.

For consistency you want to get multiple keys as they were at one specific point in time, as if in a read transaction.

This can be used for primitives like consistent point in time view of a map or slice.

This is not possible today, but a few additions to direct get can make this work.

What is the proposed change?

Add to the structure here:

https://github.com/nats-io/nats-server/blob/5a497272c3725bea8fe8cf1f9d959c4d9a95b450/server/jetstream_api.go#L620-L624

  • upto_ts - gets the latest value requested up to and including the supplied timestamp
  • upto_seq - gets the latest value requested up to and including the supplied seq

To solve the series of independent reads we would add some new APIs, something like:

  • func(k1, k2, k3) - gets the latest k1, then asks for k2 and k3 with upto_seq set to k1's seq
  • func(ts, k1, k2, k3) - gets k1 using upto_ts ts, then gets k2 and k3 using upto_seq set to k1's seq

This would of course require sufficient history to be available.

Who benefits from this change?

Transactional operations are a frequently requested feature. Write transactions would be very hard for us, but this method would allow us to do a reasonable facsimile of read transactions.

What alternatives have you evaluated?

We could provide transactions, so one could open a read transaction and all KV reads within it would be consistent, but this is far in the future and would be per bucket.

ripienaar avatar Aug 25 '23 08:08 ripienaar

This would be a huge bonus to me.

I store operational things in NATS KV and Object Store, such as Wasm, JSON Schema and config.

Higher-level systems then use them at runtime.

So having a versioning concept would be so cool.

Essentially I use nats as a registry and store.

gedw99 avatar Aug 29 '23 12:08 gedw99

It's coming, maybe even by the end of this weekend, as an add-on to batch direct gets. @ripienaar is just finalizing the spec for us.

derekcollison avatar Jan 20 '24 17:01 derekcollison

> It's coming, maybe even by the end of this weekend, as an add-on to batch direct gets. @ripienaar is just finalizing the spec for us.

Thanks for the heads up!

gedw99 avatar Jan 20 '24 17:01 gedw99

Can we do what we need with one subject?

derekcollison avatar Jan 21 '24 03:01 derekcollison

Also, this implies LastFor, correct? Combined with the UpTo semantics.

derekcollison avatar Jan 21 '24 05:01 derekcollison

> Can we do what we need with one subject?

You mean one API subject? Same as now, yeah, if that's what you mean.

ripienaar avatar Jan 21 '24 08:01 ripienaar

> Also, this implies LastFor, correct? Combined with the UpTo semantics.

No, it does not imply LastFor, I think. It's to allow reading a group of related subjects as they were at a point in time, even if some of those subjects got messages since. This assumes we have history that goes far enough back; if we don't have old enough history for one subject, it would not be included in the results.

ripienaar avatar Jan 21 '24 08:01 ripienaar

> > Can we do what we need with one subject?
>
> You mean one API subject? Same as now, yeah, if that's what you mean.

No I meant we need []string for subjects. E.g. get me "foo", "bar", "baz" up to sequence 100.

derekcollison avatar Jan 21 '24 14:01 derekcollison

> > Also, this implies LastFor, correct? Combined with the UpTo semantics.
>
> No, it does not imply LastFor, I think. It's to allow reading a group of related subjects as they were at a point in time, even if some of those subjects got messages since. This assumes we have history that goes far enough back; if we don't have old enough history for one subject, it would not be included in the results.

LastFor, qualified with an up-to sequence.

I think we need a different request shape / type and new API endpoint. Do not think we can overload direct get as is.

derekcollison avatar Jan 21 '24 14:01 derekcollison

Wow, it would be a real shame to have to add a subject. Probably right though.

The alternative seems to be a "multi" key here with an object in it describing a multi get?

ripienaar avatar Jan 21 '24 14:01 ripienaar

I could see if I could squeeze it into the direct get, would need to add []string under a different name and the upto stuff. Will think on it a bit more.

derekcollison avatar Jan 21 '24 14:01 derekcollison

Wildcard imo also fine. As ever, wildcards are how we optimise and organise related data.

If they really want point in time for many unrelated subjects it could be multiple direct gets with the same relative seq or ts.

ripienaar avatar Jan 21 '24 14:01 ripienaar

You really think separate gets would be ok vs []string?

derekcollison avatar Jan 21 '24 15:01 derekcollison

If we can have multiple without it being awkward then that would be better.

ripienaar avatar Jan 21 '24 15:01 ripienaar

I don't want to bloat the scope, but this batch request would be ideal for consumer-less fetching/pagination of messages (e.g. akin to nats str view). However, we would need an asof_ts option (equiv to a consumer start time) in order to identify the first message and sequence to then paginate forward from.

bruth avatar Jan 31 '24 12:01 bruth

Agreed, this would solve a bunch of problems.

ripienaar avatar Jan 31 '24 12:01 ripienaar

@bruth the sequence is the starting sequence, and after the batch you know the starting sequence for the next one. We could do it by time as well, as we will have some of that logic for multi-key get.

derekcollison avatar Jan 31 '24 15:01 derekcollison