spicedb
spicedb copied to clipboard
Proposal: Add relationship count API
See below for the current proposal
Problem Statement
We are migrating our data from a legacy authorization system into SpiceDB and would like a way to easily verify the relationship counts in order to validate the accuracy of our data. Specifically, we would like to be able to do this in the following scenarios:
- After hydrating data to verify counts match our legacy system
- Periodically (e.g. daily) to ensure we our real-time-update logic from our legacy system to our SpiceDB instance are in sync, allowing us to investigate further if this is not the case
We can currently solve scenario 1 above by subscribing the Watch API while performing a hydration and maintaining the counts internally. For scenario 2 we can do a filtered ReadRelationships call for each relationship, but this requires retrieving more data than necessary from the database since we only care about the counts in the end. The ReadRelationships API also has the limitation of requiring us to make the calls in chunks of 1000, which forces us to process things serially.
In both cases having a relationships count API that simply performs a SELECT count(*) from relation_tuple WHERE ... clause against the database would be much more efficient/user friendly.
Solution Brainstorm
Implement a CountRelationships API with the following spec:
rpc CountRelationships(CountRelationshipsRequest)
returns (stream CountRelationshipsResponse) {
option (google.api.http) = {
post: "/v1/relationships/count"
body: "*"
};
}
message CountRelationshipsRequest {
Consistency consistency = 1;
// relationship_filter defines the filter to be applied to the relationships
// to be counted.
RelationshipFilter relationship_filter = 2
[ (validate.rules).message.required = true ];
}
message CountRelationshipsResponse {
// read_at is the ZedToken at which the relationship count was performed
ZedToken read_at = 1 [ (validate.rules).message.required = true ];
uint64 relationship_count = 2;
}
My concerns with this idea are two-fold:
count(*)operations can be extremely heavy on databases, especially in CRDB where they are essentially linear. I believe this would be somewhat mitigated by the query only hitting the index(es) we already have, but I'm not convinced it wouldn't become a large issue, especially if the API is invoked multiple times a minute (or worst, many times per second)- What is the intention if the found count differs from that expected? At that point, there is some sort of data inconsistency, but unlike the watch solution, there is no means to determine exactly what is missing/broken/wrong. Would the intent then be to delete all matching relationships and re-hydrate?
If the intention is to only have this used during the initial hydration and subsequently for a small window, I believe having a service which consumes the Watch API and keeps a real-time count (bucketed by whatever filters are useful) might be a better solution: it could be an open source solution as well, configurable from the command line to take in filters, keep a real-time count and, using our ability to go backwards in time with Watch, be able to "pick up" if the watcher goes down
Thanks for the response @josephschorr . I'll do my best to answer your questions:
count(*) operations can be extremely heavy on databases, especially in CRDB where they are https://github.com/cockroachdb/cockroach/issues/7825. I believe this would be somewhat mitigated by the query only hitting the index(es) we already have, but I'm not convinced it wouldn't become a large issue, especially if the API is invoked multiple times a minute (or worst, many times per second)
This is a legitimate concern, but I do think the existing indexes will help here quite a bit. I ran multiple different count(*) queries that could be formulated by different RelationshipFilters representing the types of queries which theoretically be provided to this API:
-
RelationshipFilter:
resource_type: user> explain select count(*) from relation_tuple where namespace='user'; info ------------------------------------------------------------------------------------------ distribution: full vectorized: true • group (scalar) │ estimated row count: 1 │ └── • scan estimated row count: 1,495,925 (0.75% of the table; stats collected 21 days ago) table: relation_tuple@pk_relation_tuple spans: [/'user' - /'user'] (10 rows) > select count(*) from relation_tuple where namespace='user'; count ----------- 1371089 (1 row) Time: 757ms total (execution 708ms / network 49ms) -
RelationshipFilter:
resource_type: user, optional_resource_id: abc> explain select count(*) from relation_tuple where namespace='user' AND object_id='abc'; info -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- distribution: full vectorized: true • group (scalar) │ estimated row count: 1 │ └── • scan estimated row count: 0 (<0.01% of the table; stats collected 21 days ago) table: relation_tuple@pk_relation_tuple spans: [/'user'/'abc' - /'user'/'abc'] (10 rows) Time: 32ms total (execution 1ms / network 31ms) > select count(*) from relation_tuple where namespace='user' AND object_id='a'; count --------- 2 (1 row) Time: 35ms total (execution 3ms / network 32ms) -
RelationshipFilter:
resource_type: user, optional_resource_id_prefix: a> explain select count(*) from relation_tuple where namespace='user' AND object_id LIKE 'a%'; info -------------------------------------------------------------------------------------------------------------------------- distribution: full vectorized: true • group (scalar) │ estimated row count: 1 │ └── • scan estimated row count: 17,790 (<0.01% of the table; stats collected 21 days ago) table: relation_tuple@pk_relation_tuple spans: [/'user'/'a' - /'user'/'b') (10 rows) Time: 35ms total (execution 1ms / network 34ms) > select count(*) from relation_tuple where namespace='user' AND object_id LIKE 'a%'; count --------- 380 (1 row) Time: 35ms total (execution 4ms / network 31ms) -
RelationshipFilter:
resource_type: user, optional_resource_id_prefix: a, relation: manager> explain select count(*) from relation_tuple where namespace='user' AND object_id LIKE 'a%' AND relation='manager'; info -------------------------------------------------------------------------------------------- distribution: full vectorized: true • group (scalar) │ estimated row count: 1 │ └── • filter │ estimated row count: 0 │ filter: relation = 'manager' │ └── • scan estimated row count: 206,272 (0.10% of the table; stats collected 21 days ago) table: relation_tuple@pk_relation_tuple spans: [/'user'/'a'/'manager' - /'user'/'b') index recommendations: 1 1. type: index creation SQL command: CREATE INDEX ON "spicedb-production".public.relation_tuple (relation); (18 rows) Time: 38ms total (execution 6ms / network 32ms) > select count(*) from relation_tuple where namespace='user' AND object_id LIKE 'a%' AND relation='manager'; count --------- 106 (1 row) Time: 56ms total (execution 3ms / network 53ms) -
RelationshipFilter:
resource_type: user, optional_resource_id_prefix: a, relation: manager, subject_type: user> explain select count(*) from relation_tuple where namespace='user' AND object_id LIKE 'a%' AND relation='manager' AND userset_namespace='user'; info --------------------------------------------------------------------------------------------------------------- distribution: full vectorized: true • group (scalar) │ estimated row count: 1 │ └── • filter │ estimated row count: 3 │ filter: (relation = 'manager') AND (userset_namespace = 'user') │ └── • scan estimated row count: 1,353,438 (0.68% of the table; stats collected 21 days ago) table: relation_tuple@pk_relation_tuple spans: [/'user'/'a'/'manager'/'user' - /'user'/'b') index recommendations: 1 1. type: index creation SQL command: CREATE INDEX ON "spicedb-production".public.relation_tuple (namespace, relation, userset_namespace); (18 rows) Time: 47ms total (execution 6ms / network 41ms) > select count(*) from relation_tuple where namespace='user' AND object_id LIKE 'a%' AND relation='manager' AND userset_namespace='user'; count --------- 1850 (1 row) Time: 46ms total (execution 15ms / network 31ms)
As a note there are a total 202,626,032 of relations in this CockroachDB database. While I recognize the frequency of different values may lead to varying performance, overall the indexes already implemented allowed the search space to be vastly reduced.
Furthermore, it would be acceptable in my mind to execute these queries with AS OF SYSTEM TIME follower_read_timestamp(); ensuring the request is made against any available replica. Because this is more of a "stats" API I personally don't think exact-up-to-date numbers are strictly required.
If this really does become a problem where people are abusing this API I think adding some form of rate limiting is reasonable and is probably something that should be considered for APIs in general at some point in time.
What is the intention if the found count differs from that expected? At that point, there is some sort of data inconsistency, but unlike the watch solution, there is no means to determine exactly what is missing/broken/wrong. Would the intent then be to delete all matching relationships and re-hydrate?
It depends on the scenario. If it is a systemic issue then a bulk export on a subset of the data is likely required to investigate the root cause. If it is a one-off issue, then dehydration/rehydration of the data is likely the simplest fix.
So it occurred to us that the https://github.com/authzed/spicedb/issues/1785 issue might be an avenue to solve this problem:
- We add a registration API called
TrackRelationshipCount(andUntrackRelationshipCount) that takes in a relationships filter - The
CountRelationshipsAPI can only be called once the tracking/registration API has been invoked. - We start with the CountRelationships API temporarily invoking
count(*), but with registration still required. - We build the stats service (https://github.com/authzed/spicedb/issues/1785) to provide not only internal stats stored using HLLs but also the real counts for any registered filters using the above API. Once the stats service is ready, we transition the
CountRelationshipsAPI from using the temporary call-count-* to using the stats API
Codifying this into a proposal:
CountRelationships Proposal
Add the ability to track and retrieve the count of relationships based on registered filters in SpiceDB. This will be used both by external users for data validation purposes and, eventually, by SpiceDB itself for use in the eventual query planner (see https://github.com/authzed/spicedb/issues/1785 and https://github.com/authzed/spicedb/issues/1573)
Proposal
SpiceDB will add the following APIs:
// RegisterRelationshipCounter registers a new filter for counting relationships. A filter must be registered before
// a count can be requested.
rpc RegisterRelationshipCounter(RegisterRelationshipCounterRequest) returns (RegisterRelationshipCounterResponse)
// CountRelationships returns the count of relationships for *pre-registered* filter.
rpc CountRelationships(CountRelationshipsRequest) returns (CountRelationshipsResponse)
// UnregisterRelationshipCounter unregisters a new filter for counting relationships.
rpc UnregisterRelationshipCounter(UnregisterRelationshipCounterRequest) returns (UnregisterRelationshipCounterResponse)
message RegisterRelationshipCounterRequest {
// relationship_filter defines the filter to be applied to the relationships
// to be counted.
RelationshipFilter relationship_filter = 1
[ (validate.rules).message.required = true ];
}
message RegisterRelationshipCounterResponse {}
message CountRelationshipsRequest {
Consistency consistency = 1;
// relationship_filter defines the filter to be applied to the relationships
// to be counted.
RelationshipFilter relationship_filter = 2
[ (validate.rules).message.required = true ];
}
message CountRelationshipsResponse {
// read_at is the ZedToken at which the relationship count was performed
ZedToken read_at = 1 [ (validate.rules).message.required = true ];
uint64 relationship_count = 2;
}
message UnregisterRelationshipCounterRequest {
RelationshipFilter relationship_filter = 1
[ (validate.rules).message.required = true ];
}
message UnregisterRelationshipCounterResponse {}
Workflow and Implementation
The workflow with consist of registering one (or more) relationship filters, which will be stored in the datastore.
For the first-pass, naive implementation, the CountRelationshipsRequest will issue a count(*) with the filter to the underlying datastore.
For the second-pass, proper implementation, a distinct service will be added to SpiceDB to use the watch API to keep the counts up-to-date in-memory, with CountRelationshipsRequest dispatching to the service, and the service storing the counts on an interval (5m? 10m? 30m?) in the backing datastore alongside the last revision for which it was updated
a distinct service will be added to SpiceDB to use the watch API to keep the counts up-to-date in-memory
Instead of a distinct service, we could use the DB to coordinate a leader election among SpiceDB replicas and choose one to hold the watch / do the counts.
Thanks for putting together this official proposal @josephschorr ! Do you anticipate having any kind of guardrails in terms of the number of relationship counters that can be registered at a given time.
Also, the RelationshipFilter allows the user to specify an optional_resource_id_prefix. While this will be very useful, I could see this causing a fair amount of load on the backing datastore when performing a count. Do you have any concerns with that? To be clear: for our use case we'd want to have this optional_resource_id_prefix and are willing to accept the load it may have on the database for the initial implementation.
Do you anticipate having any kind of guardrails in terms of the number of relationship counters that can be registered at a given time.
Yes, there will likely be a (configurable) limit
Do you have any concerns with that?
Yes, but fortunately it should be an index scan in most cases and once there is a real counting service, it will be handled in memory
Closing since this is now merged. A followup with a service for counting will be added if need be. Fixes https://github.com/authzed/spicedb/pull/1901