PROPOSAL: Add optional watch annotations to RelationshipUpdate
The Problem Statement
As the developer of an authorization service that uses SpiceDB as its backing store, for regulatory reasons, I need to have an audit log that shows what relationships were present in SpiceDB at what time in order to answer an auditor's question of "who had access to what at what time, and who authorized the change?"
Proposed Solution
We can use the WatchService to answer "who had access to what at what time" by taking the stream of events and saving it to some durable persistence mechanism (Kafka in our case). However, additional information about the transaction (such as who authorized the change to SpiceDB) isn't currently stored on the relation and wouldn't show up in the WatchService.
The WatchService as it currently exists would be a desirable mechanism for audit because it's an in-order log of changes made to SpiceDB, which is our system of record and source of truth.
Attaching a notion of metadata to write requests seems like it could satisfy the above. As long as metadata that goes in with a WriteRelationships request comes out in the WatchService, it wouldn't even really need to be persisted to the backing store beyond the persistence mechanisms already present in the WatchService.
Concretely
message RelationshipUpdate {
  enum Operation {
    OPERATION_UNSPECIFIED = 0;
    OPERATION_CREATE = 1;
    OPERATION_TOUCH = 2;
    OPERATION_DELETE = 3;
  }
  Operation operation = 1 [ (validate.rules).enum = {defined_only: true, not_in: [0]} ];
  Relationship relationship = 2 [ (validate.rules).message.required = true ];
  Struct watch_annotations = 3;
}
The understanding is that these annotations are only for consumption via the watch service, and are not intended for any other logical or semantic purpose.
The naming and location are a suggestion; if there's another location or name that provides the same functionality I'm all ears.
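To make the intended flow concrete, here is a minimal in-memory sketch in Python. It does not use the SpiceDB client or any real SpiceDB API: `FakeStore`, `write_relationships`, and `watch` are hypothetical stand-ins, and the annotation keys (`authorized_by`, `ticket`) are made up for illustration. The only point is that whatever goes in as `watch_annotations` comes back out, in order, on the watch side.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass(frozen=True)
class RelationshipUpdate:
    operation: str      # "CREATE", "TOUCH", or "DELETE"
    relationship: str   # e.g. "document:readme#viewer@user:alice"
    # Hypothetical field mirroring the proposed `watch_annotations`.
    watch_annotations: dict[str, Any] = field(default_factory=dict)


class FakeStore:
    """Toy stand-in for SpiceDB: keeps an ordered change log and nothing else."""

    def __init__(self) -> None:
        self._log: list[tuple[int, RelationshipUpdate]] = []
        self._revision = 0

    def write_relationships(self, updates: list[RelationshipUpdate]) -> int:
        # All updates in one call share a revision, like a single transaction.
        self._revision += 1
        for update in updates:
            self._log.append((self._revision, update))
        return self._revision

    def watch(self, after_revision: int = 0) -> Iterator[tuple[int, RelationshipUpdate]]:
        # Replays changes in commit order, annotations included.
        for revision, update in self._log:
            if revision > after_revision:
                yield revision, update


store = FakeStore()
store.write_relationships([
    RelationshipUpdate(
        "TOUCH",
        "document:readme#viewer@user:alice",
        {"authorized_by": "user:bob", "ticket": "CHG-1234"},
    ),
])

# An audit consumer would forward these records to durable storage (Kafka in our case).
audit_log = [
    {"revision": rev, "op": u.operation, "rel": u.relationship, **u.watch_annotations}
    for rev, u in store.watch()
]
```

The store itself never interprets the annotations, which matches the intent that they carry no logical or semantic meaning for permission checks.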
Existing Alternatives
My team is currently discussing a few different ideas, all of which have drawbacks. We have changestreams coming from both web requests to the service and from a Kafka consumer listening to changestreams from elsewhere in our system.
One is to write audit events before and after attempting a write to SpiceDB. This has the downside of leaving potentially indeterminate states in the audit log, especially if there are overlapping requests to our web interface that interact with the same object. It won't be clear which request landed first.
Another is to put a queue in the request path, which would help from the perspectives of durability and serializability, but comes at considerable complexity cost.
One alternative that we've come up with that wouldn't be heinous is adding something like
definition audit_subject {}
definition audit_object {
  relation subject: audit_subject
}
to our schema and adding a sideband K/V store for audit metadata. Each WriteRelationships request would then include an extra relationship whose audit_object ID is a UUID, and we would write the audit information to the K/V store with that UUID as the key. The component that consumes the WatchService would use the UUID to reassociate the audit information with the change.
This adds additional complexity, so we'd still prefer it if SpiceDB were to somehow be able to support this directly, but we've at least got the above.
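The workaround can be sketched roughly as follows. This is an illustration under stated assumptions, not working SpiceDB client code: relationships are plain strings, the K/V store is a Python dict, the watch stream is modeled as `(revision, operation, relationship)` tuples, and the helper names and the `audit_subject:marker` subject are hypothetical.

```python
import uuid

MARKER_PREFIX = "audit_object:"


def add_audit_marker(updates, audit_info, kv_store):
    """Store audit_info under a fresh UUID and append a marker relationship."""
    audit_id = str(uuid.uuid4())
    # Written before the SpiceDB write; a failed write only leaves an unused key.
    kv_store[audit_id] = audit_info
    marker = f"{MARKER_PREFIX}{audit_id}#subject@audit_subject:marker"
    return updates + [("TOUCH", marker)]


def reassociate(watch_events, kv_store):
    """Group watch events by revision and attach the audit info found via the marker."""
    by_revision = {}
    for revision, op, rel in watch_events:
        entry = by_revision.setdefault(revision, {"changes": [], "audit": None})
        if rel.startswith(MARKER_PREFIX):
            audit_id = rel[len(MARKER_PREFIX):].split("#", 1)[0]
            entry["audit"] = kv_store.get(audit_id)
        else:
            entry["changes"].append((op, rel))
    return by_revision


kv = {}
request = add_audit_marker(
    [("TOUCH", "document:readme#viewer@user:alice")],
    {"authorized_by": "user:bob"},
    kv,
)
# Pretend the whole request committed at revision 7 and came back on the watch stream.
events = [(7, op, rel) for op, rel in request]
result = reassociate(events, kv)
```

The extra moving parts are visible even in this toy version: a second store to keep available, a marker relationship in every write, and a consumer that has to tolerate a missing K/V entry.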
I'd like to give a +1 to this feature request 😄
> We have changestreams coming from both web requests to the service and from a kafka consumer listening to changestreams from elsewhere in our system.
We have a similar scenario at my organization, and we are consuming watch events to hydrate permissions in search indexes downstream. We'd like to be able to implement observability on the async pipeline (so our requirements aren't quite as rigorous as building an audit trail) and it would be really helpful to be able to include some kind of request-id (or, say, an OpenTelemetry trace id) on the WriteRelationships call, and be able to "see" that request/trace ID in the watch event. This would enable us to build a complete APM trace from "user clicked the 'save' button", through "Kafka event consumer wrote to SpiceDB", to "watch API caller observed the relationship change", to "search index updated to reflect permissions".
Having this kind of traceability will be tremendously valuable for debugging/troubleshooting issues (which stage in the async pipeline dropped the ball?) and would also enable us to establish and monitor SLOs on the async processing (e.g. X% of the time, our search index is updated with the latest permissions within Y seconds of the user clicking "save").
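As a rough sketch of the SLO calculation this would enable, assume each stage records a timestamp keyed by the request/trace ID that was passed through WriteRelationships and observed again in the watch event. The function name and the dict-of-timestamps inputs below are hypothetical; a real pipeline would pull these from an APM backend.

```python
def slo_attainment(saved_at, indexed_at, threshold_s):
    """Fraction of saved traces whose index update landed within threshold_s seconds.

    saved_at / indexed_at map trace ID -> timestamp in seconds.
    Traces that never reached the index count as misses.
    """
    if not saved_at:
        return 1.0
    ok = sum(
        1
        for trace_id, t0 in saved_at.items()
        if trace_id in indexed_at and indexed_at[trace_id] - t0 <= threshold_s
    )
    return ok / len(saved_at)


saved = {"a": 0.0, "b": 1.0, "c": 2.0, "d": 3.0}
indexed = {"a": 1.5, "b": 9.0, "c": 3.0}  # "d" was dropped somewhere in the pipeline
attainment = slo_attainment(saved, indexed, threshold_s=5.0)  # 0.5
```

Without an ID that survives the trip through SpiceDB, the two timestamp sets cannot be joined, which is the gap this feature request would close.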
Same on our side. We would like to know which batch/CS operator made the change, from which state to which, and the reason for the change.
Being able to save additional metadata would be very helpful.
Current state of research:
- Postgres: fairly trivial to add since we control the transactions table; would simply be another column
- MySQL: Ditto
- Memdb: Ditto
- Spanner: I believe Spanner's transaction tags can be used to mark the transaction and then read the resulting tag from the changestream
- CRDB: ⚠️ PROBLEM: CRDB does not appear to have any means of correlating a transaction with the entries in the changestream besides writing another row, including during deletions. This would be doable but would add non-trivial overhead to a write call, as that extra row would be on a shared table.
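The "another column on the transactions table" idea can be illustrated with a toy example. This uses SQLite from the Python standard library rather than Postgres, and the `transactions` and `changes` tables are hypothetical, not SpiceDB's actual datastore schema; it only shows why the approach is cheap when the datastore already records one row per transaction.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        xid INTEGER PRIMARY KEY AUTOINCREMENT,
        metadata TEXT            -- the one extra column
    );
    CREATE TABLE changes (
        xid INTEGER NOT NULL,
        op  TEXT NOT NULL,
        rel TEXT NOT NULL
    );
""")


def write(updates, metadata):
    """Record the transaction row (with metadata) and its changes atomically."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "INSERT INTO transactions (metadata) VALUES (?)",
            (json.dumps(metadata),),
        )
        xid = cur.lastrowid
        conn.executemany(
            "INSERT INTO changes (xid, op, rel) VALUES (?, ?, ?)",
            [(xid, op, rel) for op, rel in updates],
        )
    return xid


def watch():
    """Return changes in transaction order, each joined with its transaction's metadata."""
    rows = conn.execute("""
        SELECT c.xid, c.op, c.rel, t.metadata
        FROM changes c JOIN transactions t ON t.xid = c.xid
        ORDER BY c.xid, c.rowid
    """)
    return [(xid, op, rel, json.loads(md)) for xid, op, rel, md in rows]


write([("TOUCH", "document:readme#viewer@user:alice")], {"authorized_by": "user:bob"})
write([("DELETE", "document:readme#viewer@user:alice")], {"authorized_by": "user:carol"})
events = watch()
```

Deletions pick up metadata the same way as writes here because the metadata hangs off the transaction row, which is the property CRDB's changestream lacks without an extra row.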
After some research and prototyping, I've created PR #1914 which allows for attaching metadata to the read/write transaction.
Once this change is in, we'll update the API to allow a metadata struct to be attached to the WriteRelationships and DeleteRelationships calls, with the metadata coming out of the Watch API attached to the WatchResponse. This won't allow per-relationship metadata, but since the updates are transactional anyway, the caller can track the per-relationship-update metadata on its own.
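One way a caller might track per-relationship details inside transaction-level metadata is to nest a map keyed by relationship. This is only a sketch of the caller-side bookkeeping: the `per_relationship` key, the helper names, and the string form of relationships are all assumptions, and the final API shape is not specified in this thread.

```python
def build_transaction_metadata(common, per_update):
    """Combine transaction-wide fields with a map of relationship -> extra fields."""
    return {**common, "per_relationship": dict(per_update)}


def annotate(watch_updates, metadata):
    """Attach the common fields, plus any relationship-specific ones, to each update."""
    per_rel = metadata.get("per_relationship", {})
    common = {k: v for k, v in metadata.items() if k != "per_relationship"}
    return [
        {"op": op, "rel": rel, **common, **per_rel.get(rel, {})}
        for op, rel in watch_updates
    ]


metadata = build_transaction_metadata(
    {"authorized_by": "user:bob"},
    {"document:readme#viewer@user:alice": {"reason": "onboarding"}},
)
# Updates as a watch consumer might see them for this one transaction.
updates = [
    ("TOUCH", "document:readme#viewer@user:alice"),
    ("DELETE", "document:readme#viewer@user:dave"),
]
annotated = annotate(updates, metadata)
```

Updates without an entry in the nested map simply inherit the transaction-wide fields.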