Add API for comparing ZedTokens
We have some use cases involving event streams, where the events would include ZedTokens from SpiceDB (e.g. the results of the Watch API). Consumers of these events will call SpiceDB to look up additional information in response, providing the event's ZedToken to get data `at_least_as_fresh` as the event, and persist the results somewhere (e.g. a database). Ideally, event consumers would be able to compare the ZedToken from an async event against a ZedToken already stored in the DB, determine which one is "newer", and call SpiceDB providing the most recent ZedToken.
To support this use case, we would need a client library or gRPC endpoint for comparing ZedTokens. We recognize that some datastores allow for concurrent updates, so it may not be possible to conclusively say one ZedToken is "before" another. We could work around a case of "concurrent" ZedTokens by simply making a `fully_consistent` request to SpiceDB. So if we had a function like `compare(ZT1, ZT2)`, it could return (for example):
- `DEFINITELY_BEFORE`: in which case we call SpiceDB with `at_least_as_fresh(ZT2)`
- `DEFINITELY_AFTER`: in which case we call SpiceDB with `at_least_as_fresh(ZT1)`
- `INCONCLUSIVE`/`CONCURRENT`: in which case we call SpiceDB with `fully_consistent`
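To make the intended caller behavior concrete, here is a minimal sketch of how a client might dispatch on the result of the proposed `compare` endpoint. The `CompareResult` enum and `choose_consistency` helper are hypothetical names for illustration, not part of any SpiceDB API:

```python
from enum import Enum, auto

class CompareResult(Enum):
    # Hypothetical result type for the proposed compare(ZT1, ZT2) endpoint.
    DEFINITELY_BEFORE = auto()   # ZT1 is strictly older than ZT2
    DEFINITELY_AFTER = auto()    # ZT1 is strictly newer than ZT2
    INCONCLUSIVE = auto()        # concurrent / not comparable

def choose_consistency(result, zt1, zt2):
    """Map a comparison outcome to the consistency the caller should use."""
    if result is CompareResult.DEFINITELY_BEFORE:
        return ("at_least_as_fresh", zt2)
    if result is CompareResult.DEFINITELY_AFTER:
        return ("at_least_as_fresh", zt1)
    # Concurrent tokens: fall back to a fully consistent read.
    return ("fully_consistent", None)
```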
Alternatively/additionally, the existing Consistency parameter for SpiceDB operations could be modified to allow passing in a list of ZedTokens for `at_least_as_fresh` (so the operation would be performed on data at least as fresh as the "newest" of all the provided tokens) to avoid an extra roundtrip for comparison.
Note: this capability for comparing ZedTokens is mentioned in footnote 3 of the Tiger Cache Proposal #207
> Alternatively/additionally, the existing Consistency parameter for SpiceDB operations could be modified to allow passing in a list of ZedTokens for `at_least_as_fresh` (so the operation would be performed on data at least as fresh as the "newest" of all the provided tokens) to avoid an extra roundtrip for comparison.
@jason-who-codes would you prefer this approach vs a comparison API? Are there any other areas where a comparison API would make sense/provide value?
Yep - passing in a list of tokens for `at_least_as_fresh` would generally be preferable, as it eliminates a service call round-trip. It also prevents us from needing to decide to make a `fully_consistent` request if the tokens are concurrent (I suspect that SpiceDB internally could do something "smarter" to guarantee `at_least_as_fresh` as both tokens without resorting to full consistency).
This would be beneficial to us as well. We have a large model and quite a few relationships, so uncached performance can be a bit rough. We've extended the quantization window to 24 hours and rely heavily on `at_least_as_fresh` to ensure consistency.
We've implemented a middleware layer that stores consistency tokens for various objects when relationships are written and then query with the most recent.
To implement this, we've done a bit of a naughty by un-opaquing the ZedToken. With CockroachDB and memdb, it's just a base64 encoded integer timestamp. However, I'd love to be able to pass multiple tokens and let SpiceDB take care of it.
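As an illustration of the "un-opaquing" trick described above: the sketch below fabricates tokens in the simplified "base64-encoded integer timestamp" shape and picks the freshest one locally. This is purely illustrative - real ZedTokens are opaque, their encoding is an internal datastore-specific detail that can change between releases, and every function name here is made up:

```python
import base64

def fake_token(timestamp: int) -> str:
    # Fabricate a token in the assumed shape (NOT a real ZedToken).
    return base64.b64encode(str(timestamp).encode()).decode()

def decoded_timestamp(token: str) -> int:
    # "Un-opaque" the token by reversing the assumed encoding.
    return int(base64.b64decode(token).decode())

def newest(tokens):
    # Pick the freshest token - the comparison SpiceDB could perform
    # server-side if at_least_as_fresh accepted a repeated field.
    return max(tokens, key=decoded_timestamp)
```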
@croemmich can you expand on why, exactly, your middleware layer needs to compare ZedTokens at all? If you are storing a ZedToken for an updated object, then `at_least_as_fresh` should "just work" when sent that ZedToken.
> We've implemented a middleware layer that stores consistency tokens for various objects when relationships are written and then query with the most recent. To implement this, we've done a bit of a naughty by un-opaquing the ZedToken. With CockroachDB and memdb, it's just a base64 encoded integer timestamp. However, I'd love to be able to pass multiple tokens and let SpiceDB take care of it.
Same here. We are using MySQL, where we generate code for the (internal) `DecodedZedToken` (source) to make sure to properly parse it.
> can you expand on why, exactly, your middleware layer needs to compare ZedTokens at all?
After reading the documentation, the original Zookie paper, this blog post, and asking for clarification here, our understanding is that there is no guarantee of "reading our own writes" in case that involves traversing a hierarchy of objects.
E.g. take the following example. Schema:
- User, Organization, Document
- Organization has members and documents

Timeline:
- T0: Alice(T0) is a member of organization O1(T0)
- T1: Alice adds document D1(T1)
- T2: Alice adds Bob as a member of organization O1(T2), Bob(T2)

When reading document D1 now, we would use ZedToken T1, which could lead to us not seeing Bob being a member of O1. To avoid this, we want to make sure that we are always using the most recent ZedToken - for which we either need to compare tokens locally (quickly), or be able to pass a list of tokens.
@josephschorr Honestly, the local comparison of the integer would be great. What exactly is the reason this is not part of the API? My current understanding is that global ordering should be possible as long as we use a common persistence layer for all SpiceDB instances and use that to source the integer in the first place (which seems to be the case for Postgres and MySQL at least). Happy to learn more, though! :pray:

Also, if there are cases where this is not doable, this could be signaled with a flag à la `comparable bool`? :thinking:
I think we agree on the need to either compare ZedTokens or have SpiceDB accept multiple and have it pick the most recent. Each datastore may have a different underlying representation for ZedTokens, so it's not just a timestamp - this is the case for the Postgres implementation, which uses PG-internal datatypes. Exposing those internals via a client library would turn them into API and make it impossible to evolve the underlying datastore implementation without breaking clients. For example, the PG implementation was also a timestamp before it started using PG's MVCC `xid`, `xmin` and `xmax` types.

Would having the APIs accept multiple ZedTokens so that SpiceDB picks the most recent satisfy your requirements?
> For example, the PG implementation was also a timestamp before it started using PG's MVCC `xid`, `xmin` and `xmax` types.
Ok, thanks for the explanation! Missed that.
> Would having the APIs accept multiple ZedTokens so that SpiceDB picks the most recent satisfy your requirements?
Yes, that would work. :+1:
We recently ran into an issue similar to the one this Issue hopes to address, with our GraphQL API. GraphQL resolvers are inherently asynchronous, and multiple mutations/queries can be made in the same "request". To solve the issue of a single GraphQL request creating multiple asynchronous SpiceDB writes, and then resolving the underlying GraphQL query (within a mutation, let's say) that could end up hitting SpiceDB for a permission check - we were also trying to issue subsequent requests with the latest token across all of the async writes.
If we could instead pass a list of `ZedToken`s and have SpiceDB use the latest as the consistency value, that would address our issue of not being able to decode/order tokens on the client side.
This is not blocking us - instead we are issuing `fullyConsistent` requests whenever any write has occurred within the lifecycle described above, but it would be convenient to use.
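The workaround above can be sketched as a small pure function: collect the tokens from all async writes in a request, and since they cannot be ordered client-side, fall back to a fully consistent read whenever more than one distinct token is seen. The function name and return shape are made up for illustration:

```python
def consistency_for_request(write_tokens):
    """Collapse ZedTokens from async writes within one GraphQL request
    into a single consistency choice. Tokens are opaque client-side, so
    more than one distinct token forces fully_consistent."""
    distinct = set(write_tokens)
    if not distinct:
        # No writes happened in this request; any freshness is acceptable.
        return ("minimize_latency", None)
    if len(distinct) == 1:
        # A single token can be used directly without any comparison.
        return ("at_least_as_fresh", distinct.pop())
    # Multiple incomparable tokens: pay for a fully consistent read.
    return ("fully_consistent", None)
```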
Adding some context as to cases where making ZedTokens a repeated field might not be enough.
When consuming changes from the Watch API I get a ZedToken for when the change happened (let's call it `ZT1`). This might prompt me to recompute some expanded permissions using `LookupResources` or `LookupSubjects`, the result of which also gives me a ZedToken (`ZT2`). If I then consume another change from the Watch API with its own ZedToken (`ZT3`) and it affects the same resource or subject as the previous change, I might want to know whether `ZT2` is fresher than `ZT3`, and if it is I can bypass making another call to `LookupResources` as it can be quite expensive.
Making the field repeated means I would still have to make the expensive call, and would only guarantee better freshness of the results which is not really what I'm after here.
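The cache-bypass decision described above can be sketched as follows. The `is_fresher` predicate stands in for the comparison primitive this thread is requesting; since no such primitive exists today, it is passed in as a parameter rather than implemented:

```python
def should_recompute(cached_token, change_token, is_fresher):
    """Decide whether an expensive LookupResources call must be re-run.

    `is_fresher(a, b)` is a stand-in for the requested comparison API:
    True if token `a` is definitely at least as fresh as token `b`,
    False if it is older or the tokens are concurrent (in which case
    we conservatively recompute)."""
    # If the cached result (ZT2) is already at least as fresh as the
    # incoming change (ZT3), the recomputation can be skipped entirely.
    return not is_fresher(cached_token, change_token)
```

A repeated `at_least_as_fresh` field cannot express this decision, because the point here is to avoid the `LookupResources` call altogether, not to make it fresher.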