
Add API for comparing ZedTokens

Open jason-who-codes opened this issue 2 years ago • 9 comments

We have some use cases involving event streams, where the events would include ZedTokens from SpiceDB (e.g. the results of the watch API). Consumers of these events will call SpiceDB to look up additional information in response, passing the event's ZedToken to get data at_least_as_fresh as the event, and persist the results somewhere (e.g. a database). Ideally, event consumers would be able to compare the ZedToken from an async event against a ZedToken already stored in the DB, determine which one is "newer", and call SpiceDB with the more recent ZedToken.

To support this use case, we would need a client library or gRPC endpoint for comparing ZedTokens. We recognize that some datastores allow for concurrent updates, so it may not be possible to conclusively say one ZedToken is "before" another. We could work around a case of "concurrent" ZedTokens by simply making a fully_consistent request to SpiceDB. So if we had a function like compare(ZT1, ZT2) it could return (for example):

  • DEFINITELY_BEFORE: in which case we call SpiceDB with at_least_as_fresh(ZT2)
  • DEFINITELY_AFTER: in which case we call SpiceDB with at_least_as_fresh(ZT1)
  • INCONCLUSIVE/CONCURRENT: in which case we call SpiceDB with fully_consistent

Alternatively/additionally, the existing Consistency parameter for SpiceDB operations could be modified to allow passing in a list of ZedTokens for at_least_as_fresh (so the operation would be performed on data at least as fresh as the "newest" of all the provided tokens) to avoid an extra roundtrip for comparison.

Note: this capability for comparing ZedTokens is mentioned in footnote 3 of the Tiger Cache Proposal #207

jason-who-codes avatar Feb 10 '23 22:02 jason-who-codes

Alternatively/additionally, the existing Consistency parameter for SpiceDB operations could be modified to allow passing in a list of ZedTokens for at_least_as_fresh (so the operation would be performed on data at least as fresh as the "newest" of all the provided tokens) to avoid an extra roundtrip for comparison.

@jason-who-codes would you prefer this approach vs a comparison API? Are there any other areas where a comparison API would make sense/provide value?

josephschorr avatar May 11 '23 20:05 josephschorr

Yep - passing in a list of tokens for at_least_as_fresh would generally be preferable, as it eliminates a service call round-trip. It also saves us from having to decide whether to make a fully_consistent request when the tokens are concurrent (I suspect that SpiceDB internally could do something "smarter" to guarantee at_least_as_fresh for both tokens without resorting to full consistency).

jason-who-codes avatar May 12 '23 18:05 jason-who-codes

This would be beneficial to us as well. We have a large model and quite a few relationships, so uncached performance can be a bit rough. We've extended the quantization window to 24 hours and rely heavily on at_least_as_fresh to ensure consistency.

We've implemented a middleware layer that stores consistency tokens for various objects when relationships are written and then query with the most recent.

To implement this, we've done a bit of a naughty by un-opaquing the ZedToken. With CockroachDB and memdb, it's just a base64 encoded integer timestamp. However, I'd love to be able to pass multiple tokens and let SpiceDB take care of it.

croemmich avatar Aug 01 '23 04:08 croemmich

@croemmich can you expand on why, exactly, your middleware layer needs to compare ZedTokens at all? If you are storing a ZedToken for an updated object, then at_least_as_fresh should "just work" when sent that ZedToken.

josephschorr avatar Aug 01 '23 14:08 josephschorr

We've implemented a middleware layer that stores consistency tokens for various objects when relationships are written and then query with the most recent. To implement this, we've done a bit of a naughty by un-opaquing the ZedToken. With CockroachDB and memdb, it's just a base64 encoded integer timestamp. However, I'd love to be able to pass multiple tokens and let SpiceDB take care of it.

Same here. We are using MySQL, where we generate code for the (internal) DecodedZedToken (source) to make sure to properly parse it.

can you expand on why, exactly, your middleware layer needs to compare ZedTokens at all?

After reading the documentation, the original Zookie paper, and this blog post, and asking for clarification here, our understanding is that there is no guarantee of "reading our own writes" when that involves traversing a hierarchy of objects.

Take the following example.

Schema:

  • User, Organization, Document
  • Organization has members and documents

Timeline:

  • T0: Alice(T0) is a member of organization O1(T0)
  • T1: Alice adds document D1(T1)
  • T2: Alice adds Bob as a member of organization O1(T2), Bob(T2)

When reading document D1 now, we would use ZedToken T1, which could lead to us not seeing Bob as a member of O1. To avoid this, we want to make sure that we always use the most recent ZedToken, which means we either need to compare tokens locally (quickly) or be able to pass a list of tokens.
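
The scenario above can be illustrated with a toy model. Everything here is an assumption made for illustration: `ToyStore` is not SpiceDB, revisions are plain integers (real ZedTokens are opaque), and `members_at` models the worst case in which a read is served at exactly the requested revision, which at_least_as_fresh permits because it is only a lower bound.

```python
from typing import List, Set, Tuple


class ToyStore:
    """Toy stand-in for a datastore with monotonically increasing revisions."""

    def __init__(self) -> None:
        self._revision = 0
        self._memberships: List[Tuple[int, str, str]] = []  # (revision, org, user)

    def add_member(self, org: str, user: str) -> int:
        """Record a membership write and return its revision (the 'token')."""
        self._revision += 1
        self._memberships.append((self._revision, org, user))
        return self._revision

    def add_document(self, doc: str) -> int:
        """An unrelated write; it only advances the revision in this model."""
        self._revision += 1
        return self._revision

    def members_at(self, org: str, revision: int) -> Set[str]:
        """Worst case: answer from the snapshot at exactly `revision`."""
        return {u for r, o, u in self._memberships if o == org and r <= revision}


store = ToyStore()
t0 = store.add_member("O1", "alice")  # T0: Alice is a member of O1
t1 = store.add_document("D1")         # T1: Alice adds document D1
t2 = store.add_member("O1", "bob")    # T2: Bob becomes a member of O1

stale_view = store.members_at("O1", t1)               # token stored with D1
fresh_view = store.members_at("O1", max(t0, t1, t2))  # newest known token
```

In this model `stale_view` contains only Alice while `fresh_view` also contains Bob, which is why the middleware wants the newest token across all related objects rather than the one stored with D1.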

@josephschorr Honestly, local comparison of the integer would be great. What exactly is the reason this is not part of the API? My current understanding is that global ordering should be possible as long as we use a common persistence layer for all SpiceDB instances and use that to source the integer in the first place (which seems to be the case for Postgres and MySQL at least). Happy to learn more, though! :pray: Also, if there are cases where this is not doable, this could be signaled with a flag à la comparable bool? :thinking:

geropl avatar Oct 11 '23 07:10 geropl

I think we agree on the need to either compare ZedTokens or have SpiceDB accept multiple and pick the most recent. Each datastore may have a different underlying representation of ZedTokens, so it's not just a timestamp. This is the case for the Postgres implementation, which uses PG internal datatypes. Exposing those internals via a client library would turn them into API and make it impossible to evolve the underlying datastore implementation without breaking clients. For example PG implementation was also a timestamp before it started using PG's MVCC xid, xmin and xmax types.

Would having the APIs accepting multiple zedtokens so that SpiceDB picks up the most recent satisfy your requirements?

vroldanbet avatar Oct 11 '23 09:10 vroldanbet

For example PG implementation was also a timestamp before it started using PG's MVCC xid, xmin and xmax types.

Ok, thanks for the explanation! Missed that.

Would having the APIs accepting multiple zedtokens so that SpiceDB picks up the most recent satisfy your requirements?

Yes, that would work. :+1:

geropl avatar Oct 11 '23 11:10 geropl

We recently ran into an issue similar to the one this issue hopes to address, in our GraphQL API. GraphQL resolvers are inherently asynchronous, and multiple mutations/queries can be made in the same "request". A single GraphQL request can create multiple asynchronous SpiceDB writes and then resolve the underlying GraphQL query (within a mutation, let's say), which could end up hitting SpiceDB for a permission. To handle this, we were also trying to issue subsequent requests with the latest token across all of the async writes.

If we could instead pass a list of ZedTokens and have SpiceDB determine and use the latest as the consistency value, that would address our issue of not being able to decode/order tokens on the client side.

This is not blocking us - instead we are issuing fullyConsistent requests whenever any write has occurred within the lifecycle described above, but it would be convenient to use.
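
A rough sketch of that workaround, assuming a per-request tracker. `RequestConsistency` is a hypothetical helper, not part of any SpiceDB client; the dict keys mirror the consistency modes (fully_consistent, minimize_latency), and defaulting to minimize_latency when no write has happened is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RequestConsistency:
    """Tracks SpiceDB writes made during one GraphQL request (hypothetical helper)."""

    write_tokens: List[str] = field(default_factory=list)

    def record_write(self, token: str) -> None:
        """Remember the ZedToken returned by a write in this request."""
        self.write_tokens.append(token)

    def for_read(self) -> Dict[str, bool]:
        """Pick the consistency for a read issued later in the same request."""
        if self.write_tokens:
            # The tokens cannot be ordered client-side, so fall back to
            # full consistency once any write has happened.
            return {"fully_consistent": True}
        # Assumption: without writes, the cheapest mode is acceptable.
        return {"minimize_latency": True}
```

If a repeated token field existed, `for_read` could return the collected tokens instead of forcing full consistency.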

mgagliardo91 avatar Jan 03 '24 21:01 mgagliardo91

Adding some context as to cases where making ZedTokens a repeated field might not be enough.

When consuming changes from the Watch API I get a ZedToken for when the change happened (let's call it ZT1). This might prompt me to recompute some expanded permissions using LookupResources or LookupSubjects, the result of which also gives me a ZedToken (ZT2). If I then consume another change from the Watch API with its own ZedToken (ZT3) and it affects the same resource or subject as the previous change, I might want to know whether ZT2 is at least as fresh as ZT3; if it is, I can skip another call to LookupResources, which can be quite expensive.

Making the field repeated means I would still have to make the expensive call, and would only guarantee better freshness of the results which is not really what I'm after here.
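
A sketch of the bypass described above, assuming a client-side freshness check were available. `ExpansionCache`, the `is_at_least_as_fresh` callback, and the integer-like tokens in the usage lines are all hypothetical; no such check exists in SpiceDB today.

```python
from typing import Callable, Dict, Optional

# Hypothetical check: True if `a` is at least as fresh as `b`,
# False if it is older, None if the two cannot be ordered.
FreshnessCheck = Callable[[str, str], Optional[bool]]
# Stand-in for an expensive LookupResources/LookupSubjects call that
# returns the ZedToken of its result.
Recompute = Callable[[str, str], str]


class ExpansionCache:
    """Remembers the result token (ZT2) of the last recomputation per resource."""

    def __init__(self, is_at_least_as_fresh: FreshnessCheck, recompute: Recompute) -> None:
        self._fresh = is_at_least_as_fresh
        self._recompute = recompute
        self._result_tokens: Dict[str, str] = {}

    def on_change(self, resource: str, event_token: str) -> bool:
        """Handle a Watch event; return True if a recomputation was performed."""
        cached = self._result_tokens.get(resource)
        if cached is not None and self._fresh(cached, event_token) is True:
            return False  # cached result already reflects this change
        # Unknown or inconclusive ordering: recompute to stay safe.
        self._result_tokens[resource] = self._recompute(resource, event_token)
        return True


# Toy usage: tokens are integer strings purely for demonstration.
cache = ExpansionCache(
    is_at_least_as_fresh=lambda a, b: int(a) >= int(b),
    recompute=lambda resource, token: str(int(token) + 2),
)
first = cache.on_change("doc:1", "5")   # no cached result yet
second = cache.on_change("doc:1", "6")  # cached token "7" covers "6"
third = cache.on_change("doc:1", "9")   # cached token "7" is older than "9"
```

Only the first and third events trigger the expensive call here, which is the saving a repeated token field alone would not provide.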

benvernier-sc avatar Mar 31 '24 22:03 benvernier-sc