nats-server
Multi key and bucket consistent reads in KV
What motivated this proposal?
When storing multiple related messages in kv bucket(s) for later reading as a group there is no efficient way to read the bucket(s) as they were at some point in time.
Scenario:
Multiple keys represent one related state. Keys can be updated independently, since we do not have transactions and multiple writers can write to the bucket.
For consistency you want to get multiple keys as they were at one specific point in time, as if in a read transaction.
This can be used for primitives like consistent point in time view of a map or slice.
This is not possible today, but a few additions to direct get can make this work.
What is the proposed change?
Add to the structure here:
https://github.com/nats-io/nats-server/blob/5a497272c3725bea8fe8cf1f9d959c4d9a95b450/server/jetstream_api.go#L620-L624
upto_ts - gets the latest value for the requested subject up to and including the supplied timestamp
upto_seq - gets the latest value for the requested subject up to and including the supplied sequence
To solve the series of independent reads, we would add some new APIs, something like:
func(k1, k2, k3) - gets the latest k1, then asks for k2 and k3 with upto_seq set to k1's sequence
func(ts, k1, k2, k3) - gets k1 using upto_ts=ts, then gets k2 and k3 with upto_seq set to k1's sequence
This would, of course, require sufficient history to be available.
Who benefits from this change?
Transactional operations are a frequently requested feature. Write transactions would be very hard for us, but this method would allow us to offer a reasonable facsimile of read transactions.
What alternatives have you evaluated?
We could provide transactions, so that one could open a read transaction and all KV reads within it would be consistent, but this is far in the future and would be per bucket.
This would be a huge bonus to me.
I store operational things in NATS KV and Object Store, such as WASM, JSON Schema, and config, which higher-level systems then use at runtime.
So having a versioning concept would be so cool.
Essentially I use nats as a registry and store.
It's coming, maybe even by the end of this weekend, as an add-on to batch direct gets. @ripienaar is just finalizing the spec for us.
Thanks for the heads up!
Can we do what we need with one subject?
Also, this implies LastFor, correct? Combined with the UpTo semantics.
Can we do what we need with one subject?
You mean one API subject? Same as now, yeah, if that's what you mean.
Also, this implies LastFor, correct? Combined with the UpTo semantics.
No, I don't think it implies LastFor. It's to allow reading a group of related subjects as they were at a point in time, even if some of those subjects got messages since (assuming we have history that goes far enough; if we don't have old enough history for one subject, it would not be included in the results).
Can we do what we need with one subject?
You mean one API subject? Same as now, yeah, if that's what you mean.
No, I meant we need []string for subjects. E.g. get me "foo", "bar", "baz" up to sequence 100.
Also, this implies LastFor, correct? Combined with the UpTo semantics.
No, I don't think it implies LastFor. It's to allow reading a group of related subjects as they were at a point in time, even if some of those subjects got messages since (assuming we have history that goes far enough; if we don't have old enough history for one subject, it would not be included in the results).
LastFor with an up-to-sequence qualification.
I think we need a different request shape / type and new API endpoint. Do not think we can overload direct get as is.
Wow, it would be a real shame to have to add a subject, but you're probably right.
The alternative seems to be a “multi” key here, with an object in it describing a multi get?
I could see if I can squeeze it into the direct get; it would need a []string under a different name, plus the upto fields. Will think on it a bit more.
Wildcards are also fine IMO. As ever, wildcards are how we optimise and organise related data.
If they really want a point-in-time view of many unrelated subjects, it could be multiple direct gets with the same relative seq or ts.
You really think separate gets would be OK vs []string?
If we can have multiple without it being awkward then that would be better.
I don't want to bloat the scope, but this batch request would be ideal for consumer-less fetching/pagination of messages (e.g. akin to nats str view). However, we would need an asof_ts option (equiv to a consumer start time) in order to identify the first message and sequence to then paginate forward from.
Agreed, it would solve a bunch of problems.
@bruth the sequence is the starting sequence, and after the batch you know the starting sequence for the next one. We could do it by time too, as we will have some of that logic for the multi-key get as well.