ReadBulkRelationships API
Problem Statement
I have many situations where I want to read many relationships, but ReadRelationships restricts me to reading them one at a time. Most prohibitively, this happens when I want to read a relationship for every item in a list response.
Consider this schema:
definition folder {}
definition document {
    relation parent: folder
}
Let's say my API wants to send a user a specific list of documents, with the namespace (in this case, just the name of the parent folder) included:
[
  {
    "id": "DocumentA",
    "namespace": "FolderA"
  },
  {
    "id": "DocumentB",
    "namespace": "FolderB"
  },
  {
    "id": "DocumentC",
    "namespace": "FolderC"
  }
]
If I have only a list of n documents, such as ["document:DocumentA", "document:DocumentB", "document:DocumentC"], I have to make n SpiceDB ReadRelationships requests to fill in those namespaces. Each request would be of this form:
ReadRelationships(
    ReadRelationshipsRequest(
        consistency=Consistency(...),
        relationship_filter=RelationshipFilter(
            resource_type="document",
            optional_resource_id="DocumentX",  # Exact ID specified
            optional_relation="parent",
            optional_subject_filter=SubjectFilter(
                subject_type="folder",
            ),
        ),
    )
)
Once I have the list of parent folders, there does exist a CheckBulkPermissions API, where I can check access to each element in the list using one request, which is great. I would love to read the initial relationships in one request as well.
Solution Brainstorm
A ReadBulkRelationships API akin to the CheckBulkPermissions API would be excellent.
For the example I mentioned above, the desired relationships are somewhat "homogeneous": we're reading the same relation for the same definition type, with only the IDs differing. You can imagine a variant of RelationshipFilter that accepts a list of optional_resource_ids for just this purpose, though I haven't fully thought this interface through. This can definitely be optimized in the SQL to avoid n separate queries. This is the case that matters most to me, and it seems to be the one that would benefit most from a bulk interface.
In general, a user might want "heterogeneous" calls: a list of completely disparate RelationshipFilters to be returned in one response. SpiceDB would return a relationship if it matched any of the given filters. This might not benefit as much at the database query level, but it would certainly benefit from reduced network overhead. I would utilize this in performance-sensitive areas of my app.
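To make the two variants concrete, here is a minimal in-memory sketch of the "match any filter" semantics. Everything in it is hypothetical: `BulkFilter`, its `resource_ids` field, and `read_bulk` are illustrative names, not part of the SpiceDB API, and the `viewer` relationship in the sample data is invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relationship:
    resource_type: str
    resource_id: str
    relation: str
    subject_type: str
    subject_id: str


@dataclass(frozen=True)
class BulkFilter:
    """Hypothetical filter: like RelationshipFilter, but with a list of IDs."""
    resource_type: str
    resource_ids: tuple = ()  # empty means "any resource ID"
    relation: Optional[str] = None
    subject_type: Optional[str] = None

    def matches(self, rel: Relationship) -> bool:
        return (
            rel.resource_type == self.resource_type
            and (not self.resource_ids or rel.resource_id in self.resource_ids)
            and (self.relation is None or rel.relation == self.relation)
            and (self.subject_type is None or rel.subject_type == self.subject_type)
        )


def read_bulk(store, filters):
    """Return each stored relationship that matches at least one filter."""
    return [rel for rel in store if any(f.matches(rel) for f in filters)]


store = [
    Relationship("document", "DocumentA", "parent", "folder", "FolderA"),
    Relationship("document", "DocumentB", "parent", "folder", "FolderB"),
    Relationship("document", "DocumentC", "parent", "folder", "FolderC"),
    Relationship("folder", "FolderA", "viewer", "user", "alice"),
]

# Homogeneous case: one filter, many IDs.
homogeneous = read_bulk(
    store,
    [BulkFilter("document", ("DocumentA", "DocumentC"), "parent", "folder")],
)

# Heterogeneous case: unrelated filters combined in one call.
heterogeneous = read_bulk(
    store,
    [BulkFilter("document", ("DocumentB",), "parent"), BulkFilter("folder")],
)
```

The homogeneous call is a single filter with several IDs; the heterogeneous call is simply a list of such filters whose results are unioned.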
Thanks for your time!
@mateenkasim what's the motivation for bundling the calls?
I ask because the primary motivation for CheckBulkPermissions was that making the checks a part of the same request made it easier to bundle related datastore dispatches and reuse overlapping subproblems in the checks. ReadRelationships can't benefit in the same way from bundling requests, because it's not using the caches and the different parts of the request are going to represent distinct datastore calls.
Beyond that, calls to gRPC clients are already parallelized, at least assuming that you're in an environment that supports concurrency, and you aren't going to save a ton on network latency by not making multiple ReadRelationships calls.
There's a couple of related questions here: https://github.com/authzed/api/pull/128#issuecomment-2573888872
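As a rough illustration of the parallelization point, the sketch below issues one read per document concurrently with `asyncio.gather`. The `read_parent` coroutine is a hypothetical stub standing in for a single ReadRelationships call; it does not use the real client library, and the sleep only simulates a round trip.

```python
import asyncio

# Stand-in data; a real implementation would query SpiceDB instead.
PARENTS = {"DocumentA": "FolderA", "DocumentB": "FolderB", "DocumentC": "FolderC"}


async def read_parent(document_id: str) -> str:
    """Hypothetical stub for one ReadRelationships call for a single document."""
    await asyncio.sleep(0.01)  # simulate a network round trip
    return PARENTS[document_id]


async def read_parents(document_ids):
    # gather() runs all coroutines concurrently and preserves input order.
    folders = await asyncio.gather(*(read_parent(d) for d in document_ids))
    return dict(zip(document_ids, folders))


result = asyncio.run(read_parents(["DocumentA", "DocumentB", "DocumentC"]))
```

With this pattern the wall-clock latency is roughly that of the slowest call, although the datastore still sees n separate queries.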
For the heterogeneous case, you're right that it wouldn't be better than parallelizing requests. That's a good point!
For the homogeneous case, which my example above describes, I believe you could batch everything into one datastore call. I'm a SQL noob, so tell me if I'm way off base, but couldn't you do something like this?
SELECT * FROM tuples WHERE resource_type = 'document' AND resource_id IN (id1, id2, ..., idn) AND relation = 'parent' AND subject_type = 'folder'
resource_id is the batched part here. This would get you the folder for every document in a list in just 1 SpiceDB -> datastore call. Using ReadRelationships, this would be n calls.
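The batching idea can be tried out locally with a small sketch that uses SQLite as a stand-in datastore. The `tuples` table and its columns follow the query above and are assumptions for illustration only; they are not SpiceDB's actual datastore schema, and `read_parents` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tuples (resource_type TEXT, resource_id TEXT, relation TEXT,"
    " subject_type TEXT, subject_id TEXT)"
)
conn.executemany(
    "INSERT INTO tuples VALUES (?, ?, ?, ?, ?)",
    [
        ("document", "DocumentA", "parent", "folder", "FolderA"),
        ("document", "DocumentB", "parent", "folder", "FolderB"),
        ("document", "DocumentC", "parent", "folder", "FolderC"),
    ],
)


def read_parents(conn, document_ids):
    """Fetch the parent folder of every listed document with one query."""
    if not document_ids:
        return {}
    # One "?" placeholder per ID keeps the query parameterized.
    placeholders = ", ".join("?" for _ in document_ids)
    query = (
        "SELECT resource_id, subject_id FROM tuples "
        "WHERE resource_type = 'document' "
        f"AND resource_id IN ({placeholders}) "
        "AND relation = 'parent' AND subject_type = 'folder'"
    )
    return dict(conn.execute(query, list(document_ids)).fetchall())


parents = read_parents(conn, ["DocumentA", "DocumentC"])
```

The IDs are passed as bound parameters rather than interpolated into the SQL string, so the same approach stays safe for user-supplied IDs.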
It sounds like that use-case could be addressed by making the optional_resource_id field in RelationshipFilter a list. That seems to be more in line with the types of optimizations BulkCheck does. Would that be sufficient?
That would work, yes! If both RelationshipFilter.optional_resource_id and SubjectFilter.optional_subject_id could be lists, I think that would cover a lot of cases and reduce datastore connections.
This could be a breaking change if both optional_resource_id and optional_subject_id are changed to lists. Would it help to define new, similar list fields instead, or do we want the existing fields to become lists? What do you think, @vroldanbet?
I think it would be a breaking change. We would have to decide whether to define new fields or come up with a new filter definition. I lean towards adding new fields; that should be relatively easy to maintain in the controllers.
It would be protocol buffer wire-format compatible, but it would break all the generated clients.
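The wire-format point can be illustrated with a hand-rolled sketch of how protobuf encodes string fields: a repeated string field is written as the same tag/length/value record as a singular one, just emitted once per element. The field number below is an assumption for illustration, and the tiny encoder and decoder are simplifications (one-byte tags and lengths only), not the real protobuf library.

```python
FIELD = 2  # assumed field number, for illustration only


def encode_string_field(field_number: int, value: str) -> bytes:
    data = value.encode("utf-8")
    if field_number >= 16 or len(data) >= 128:
        raise ValueError("sketch only handles one-byte tags and lengths")
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return bytes([tag, len(data)]) + data


def decode_string_fields(payload: bytes, field_number: int) -> list:
    """Collect every occurrence of the given length-delimited field."""
    values = []
    i = 0
    while i < len(payload):
        tag, length = payload[i], payload[i + 1]
        if tag >> 3 == field_number:
            values.append(payload[i + 2 : i + 2 + length].decode("utf-8"))
        i += 2 + length
    return values


# Message written by an old client (singular field).
old_message = encode_string_field(FIELD, "DocumentA")
# Message written by a new client (field repeated twice).
new_message = encode_string_field(FIELD, "DocumentA") + encode_string_field(
    FIELD, "DocumentB"
)

# A reader that treats the field as repeated sees a one-element list.
as_repeated = decode_string_fields(old_message, FIELD)
# A reader that treats the field as singular keeps the last value seen.
as_singular = decode_string_fields(new_message, FIELD)[-1]
```

So the bytes remain parseable in both directions, but generated client code changes the field's type from a string to a list of strings, which is what breaks callers at compile time.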
New fields would be equally useful. I'm interested in the functionality, but I leave the interface to you.
Hi. Regarding heterogeneous requests: this is actually possible in SQL with a single query, at least in CockroachDB and Postgres.
The following:
select * from relation_tuple where (object_id, relation, userset_object_id) = any(?)
This can be done in CRDB using custom types.
For Postgres, it can certainly be done with jsonb_to_record(jsonb) or, as in this example, json_to_record:
select * from json_to_record('{"a":1,"b":[1,2,3],"c":"bar"}') as x(a int, b text, d text)
 a |    b    | d
---+---------+---
 1 | [1,2,3] |
That can easily be turned into a single-query condition (using a join). Custom types probably work as well, but I have never tried them.
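A minimal sketch of the join approach follows, using SQLite as a stand-in because it runs anywhere. Instead of Postgres' JSON functions or CRDB custom types, it joins against a parameterized `VALUES` list, which is a swapped-in technique with the same shape. The `relation_tuple` columns come from the query above; the sample rows and the `read_any` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relation_tuple"
    " (object_id TEXT, relation TEXT, userset_object_id TEXT)"
)
conn.executemany(
    "INSERT INTO relation_tuple VALUES (?, ?, ?)",
    [
        ("DocumentA", "parent", "FolderA"),
        ("DocumentB", "parent", "FolderB"),
        ("DocumentC", "viewer", "alice"),
    ],
)


def read_any(conn, wanted):
    """Return stored tuples equal to any (object, relation, subject) triple."""
    if not wanted:
        return []
    # One "(?, ?, ?)" row per requested triple, all bound as parameters.
    rows = ", ".join("(?, ?, ?)" for _ in wanted)
    query = (
        f"WITH wanted(object_id, relation, userset_object_id) AS (VALUES {rows}) "
        "SELECT t.object_id, t.relation, t.userset_object_id "
        "FROM relation_tuple AS t JOIN wanted AS w "
        "ON t.object_id = w.object_id AND t.relation = w.relation "
        "AND t.userset_object_id = w.userset_object_id "
        "ORDER BY t.object_id"
    )
    params = [value for triple in wanted for value in triple]
    return conn.execute(query, params).fetchall()


found = read_any(
    conn,
    [
        ("DocumentA", "parent", "FolderA"),
        ("DocumentC", "viewer", "alice"),
        ("DocumentB", "parent", "FolderX"),  # no such tuple, so no match
    ],
)
```

Each requested triple is completely independent of the others, yet the datastore still receives only one query.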
I have opened PRs implementing the proposed solution. Please have a look when you get a chance, and let me know if you have any feedback. @tstirrat15 @vroldanbet @mateenkasim