Block scoping and third-party caveats
This issue tries to describe a scheme that has the potential to improve things on two fronts:
- more explicit semantics wrt block scoping (see #86)
- a support for something similar to Macaroons' third-party caveats
First, a bit of context and a general rephrasing of those two concerns, then a description of the proposed scheme, and finally a summary of open questions
Context
Block scoping
There is an ongoing issue wrt how blocks interact together. In biscuits < v2, all blocks were put in the same context for datalog evaluation. Facts coming from the authority block and the verifier were tagged with special #authority and #ambient symbols which allowed separating trusted facts from facts provided by other blocks. This was quite error prone (both for users and implementers, as was evidenced by a couple security issues in the reference implementation), and only protected the authority block from outside tampering. Non-authority blocks could be poisoned by following blocks.
One of the big improvements of biscuits v2 was to evaluate blocks in order: each block is evaluated completely (including checks) before moving on to the next block (and the verifier is evaluated along with the authority block). This makes sure that a block can't
be tampered with by a following block. The benefits are twofold: users are not required to manually handle #authority and #ambient symbols, and all blocks are protected, not just the authority block.
This is not completely foolproof though (as detailed in #86): for instance, a block adding a time(2099-01-01T00:00:00Z); fact will effectively nullify all following TTL checks. So it can't extend an existing biscuit, but it can prevent further attenuation in a not-so-obvious way.
One solution would be to run each block into its own scope to avoid this kind of issue, but this may be overly restrictive in some cases.
Third-party caveats
Third-party caveats are one of the biggest innovations brought by macaroons. Simply put, they allow putting a restriction in a token that will be fulfilled by a third-party service (whereas first-party caveats are directly discharged by the verifying party).
For each third-party caveat, the third-party can provide a proof that it holds through a discharge macaroon (which can itself embed more restrictions). The verifying party can trust this discharge macaroon through a secret shared with the third party, like for regular macaroons.
Having a comparable feature in biscuits could be rather useful, as it would allow distributed verification of biscuits as well. As powerful as macaroons third-party caveats are, they are still quite complex to operate (especially since they require sending multiple macaroons along a given request, and they also require shared secrets between all parties concerned).
A nice improvement on third-party caveats would:
- retain their initial properties (delegating some checks to a third party, withe the possibility that the proof is made conditional based on extra checks, possibly also delegated to another third-party)
- use public key cryptography to avoid sharing secrets (one of the main improvements of biscuits over macaroons)
- allow packing the proof along with the original token for convenience
- (ideally) be doable with no breaking change to the biscuit wire format and API
Rephrasing
So here, I think what we really want is an explicit way to tell which blocks we can trust (instead of just all the previous blocks). And if you squint just right, a discharge macaroon could be encoded as a block (facts provide the proof, while checks provide additional checks).
Currently, we can only trust the authority block because it's the only one where the signing key is chosen and not shared.
So in the end what we need is to:
- be able to trust blocks other than authority thanks to an extra signature with a chosen key
- be able to declare which blocks we can trust from datalog, possibly taking advantage of public key crypto
What I'm proposing
All that's listed below has been implemented in https://github.com/biscuit-auth/biscuit-haskell/pull/30/files#
Wire format changes
- add an optional
signature * public keypair to non-authority blocks - add an optional scoping directive to blocks, that could be a combination of
authority only | previous blocks (current default) | blocks signed with the following public key | all blocks (unsafe). - add an optional scoping directive to rules (and as such, checks and policies) that would override the optional block-level directive
- (optionally) add a key table if we want to do public key interning same as we do string interning, to save space (this might make signature more complex than needed, especially if we don't want to send the whole token to the third party)
All new fields would be optional, missing values correspond to the current behaviour: the opened PR indeed passes the conformance test suite.
Datalog syntax change
// block-level scoping
trusting previous;
user(1234);
// rule scoping, combining
is_allowed($user, $p) <- user($user), operation("read"), has_external_property($p) trusting ed25519/hex:{public key bytes}, authority;
check if time($t), $t < {expiration} or
no_ttl("bypass") trusting authority;
Here's the adapted grammar change
<rule_body> ::= <rule_body_element> <sp>? ("," <sp>? <rule_body_element> <sp>?)* " " <origin_clause>?
<origin_clause> ::= "trusting " ("any" | <origin_element> <sp>? ("," <sp>? <orgin_elemen> <sp>?)*)
<origin_element> ::= "authority" | "previous" | <signature_alg> "/" <bytes>
<signature_alg> ::= "ed25519"
<block> ::= (<origin_clause> ";")? (<block_element> | <comment> )*
Datalog evaluation change
Keeping the current stateful execution model (interleaving checks & policies with datalog evaluation) is not possible because of dependencies (there can be cycles between block dependencies). What I'm proposing instead is to evaluate datalog in a single go, while keeping track of where a fact comes from. Each query (in rules, checks and policies) can then filter out facts it's not allowed to match on. This model supersedes the current one (either with a "only see authority" or "only see previous blocks" default).
Facts origins
we can model the facts origins as a (non-empty) set of block ids.
- when a fact is declared in block n, its origin is {n}
- when a fact is created through a rule declared in block n, that matched facts f1..fx, its origin is {n} U origin(fact1) … U origin(fact n)
- when the same fact is declared in multiple blocks, we can't directly merge them, so we'd need to either keep facts grouped by origin, or to model origin as a set of sets
For completeness, I've kept the first version collapsed in a details tag.
What I'm proposing
Wire format changes
- add an optional
signature * public keypair to non-authority blocks - add an optional scoping directive to blocks, that could be
authority only | previous blocks (current default) | blocks signed with the following public key. - (optionally) add a key table if we want to do public key interning same as we do string interning, to save space (this might make signature more complex than needed, especially if we don't want to send the whole token to the third party)
Both fields would be optional, missing values correspond to the current behaviour.
Datalog syntax change
Add a datalog directive that declares the block scope.
Datalog evaluation change
This change would require a pre-processing step where each block is tagged with its optional verifying key, so that the datalog context can be populated accordingly before running rules and evaluating checks.
Open questions
Block scoping or rule / check scoping
While the scoping could be applied to individual rules / checks, I feel this would be quite verbose to express this on a datalog-element basis. It would also make the size blow up unless we use key interning
Block selecting semantics
For individual blocks, the facts grouping semantics are clear, but for transitive dependencies, it's less so. Let's say that I have a block A that accepts facts from block B (because it's signed by a trusted third-party), but block B is later in the chain and is configured to see all facts from following blocks. This would make block A see facts from blocks located between A and B, even those not signed.
One solution would be to forbid completely transitive dependencies when depending on blocks located later in the chain, but that can be tricky to implement.
This is a nice path to get the 3rd party features, that can be backwards compatible with existing (v2) tokens. The main issue here for me is defining scope for transitive dependencies. I think 3rd party blocks should not be able to change the scope : the original block that defined a dependency sets the scope, and imported blocks only access that, because they are evaluated in the context of the importing block (are there cases where a 3rd party block would need to access all previous blocks?).
Is the public key enough to identify the imported block? What happens if we add two blocks with that same public key? Do we need to add an id somewhere to differentiate them?
I agree. The tricky part here is that I think we want 3rd-party blocks to have access to authority/verifier facts (so that they can add their own checks), but we want the fact they expose to be more or less self-contained, so what's unclear here is the case of facts derived from 3rd party rules applied on authority/verifier facts.
As for the public key, they're necessary to identify a block, but maybe not sufficient indeed. I feel that just talking about public keys is more ergonomic and is enough, but that's just a feeling.
how do we handle the authority block of the 3rd party token? (we should find a better word for that, considering it would be more tightly integrated. Linked token maybe?) The 3rd party token could have its own blocks, so we have to think of how they are evaluated. What happens if their own blocks require another token? Can the authorizer require the facts from authority block in 3rd party tokens?
On the implementation side, I think it's possible with naive implementations (bottom up) to keep the facts created in each block separated from each other, and choose which ones we assemble at execution time (flattening lists together).
I think we can completely avoid having a 3rd party token: the idea here is that a token could have — in addition to the authority block — other blocks that can be trusted.
Currently the execution graph is linear (we start with the authority (block + verifier) and evaluate block after block. With extra trusted blocks we could define a dag, that could be used to run a topological sort before evaluation; this way we can share common ancestors and run only what's required for diverging branches.
The benefit of this approach is that it's all backwards compatible and builds on existing concepts. The issue is that is puts everything in the same namespace (it mixes facts provided by the alternate authority blocks with the facts used in the alternate authority blocks checks). Another approach would be something closer to discharge macaroons, but that's quite a bit of extra work (instead of extending the current data & execution models, it's more of adding an extra layer; I'm not sure what it would entail in terms of backwards compatibility, size, and performance).
I really like the idea of keeping a single token and providing a way to trust more blocks, as it feels more natural, has clear backwards compat properties, and would make explicit a couple things in the execution model that are a bit unclear at the moment. That being said, the lack of namespacing could prove problematic
I'll try to work on an example based on the macaroons paper
Here's an example: I want to grant read access to resource1 on service1 to anyone who's part of a specific group on service2:
authority block (signed by service1): right("resource1", "read"); check if group_member("group1") @ ed25519 <service2 pubkey>;
as is, the token is not usable, because the group_member("group1") check cannot be fulfilled. The @ ed25519 <service2 pubkey> would declare that this fact can only come from a block whose signature can be verified with the provided public key.
The only way to make the token usable would be to provide a block (whose signature can be verified with the provided public key) containing the following fact group_member("group1"). Such a block could also contain a ttl check (check if time($t), $t < <date>), so that the proof it carries (holder belongs to group1`) stays only valid for a limited amount of time.
Working on this example convinced me of two things:
- relying on block evaluation order is too restrictive and hard to get right. Here we want to add an 3rd party check in the authority block itself
- block scoping for facts origin is too coarse. in the same block we might want to check both ambient facts provided by the verifier and 3rd party facts
Proposed solution
Evaluation
Instead of relying on the evaluation order, we go back to something close to what was done in biscuit v1: tracking facts provenance (in v1 it was done only for the authority / ambient facts and was tied to facts declaration in datalog; here it would be done for all blocks and automatically).
So instead of evaluating datalog block after block, running checks and policies step by step, we evaluate everything, while keeping track of the facts provenance. Then, we run checks (taking into account optional facts provenance, default is "only facts from verifier + previous blocks") Then, we run policies (taking into account optional facts provenance, default is "only facts coming from authority block + verifier")
Facts origin
we can model the facts origins as a (non-empty) set of block ids.
- when a fact is declared in block n, its origin is
{n} - when a fact is created through a rule declared in block n, that matched facts f1..fx, its origin is
{n} U origin(fact1) … U origin(fact n) - when the same fact is declared in multiple blocks, we can't directly merge them, so we'd need to either keep facts grouped by origin, or to model origin as a set of sets
Tracking all this makes the datalog evaluation simpler imo (right now we have to interleave generating facts with running checks, it can be error prone), and easier to debug: for each fact, we know where it comes from.
Datalog annotations
We need to be able to talk about facts origins from datalog statements. The best place for this is at the ruleBody level (in the body of a check / policy). It could also be done at the check / policy level (a check or policy can contain multiple queries, only a single one needs to match), but that would restrict expressivity: it would not let us say "i want to check if holder is friends with X on facebook OR is part of org Y on github").
The proposed syntax for a rule body would become
<rule_body> ::= <rule_body_element> <sp>? ("," <sp>? <rule_body_element> <sp>?)* ("@" <sp> <origin_clause>)?
<origin_clause> ::= "any" | <origin_element> <sp>? ("," <sp>? <orgin_elemen> <sp>?)*
<origin_element> ::= "authority" | "previous" | <signature_alg> <bytes> | <signature_alg> <string> -- string would take a b64 public key, for convenience
<signature_alg> ::= "ed25519"
anywould trust any fact (it's dangerous in the general case, but can be useful to ensure the presence of some facts)authoritywould trust only the authority (+ verifier) facts: that would be the default for policies / verifier checkspreviouswould only trust the previous blocks (including the authority and verifier) facts: that would be the default for blocks<bytes>(or<string>) would allow providing public keys. The algorithm part is necessary to correctly identify the public key (same as we do in the wire format)
any, authority and previous directly give us block ids. Public keys would we used to select the block ids with the matching keys. So in the end, each rule body would be tagged with a set of block ids. For a block to match, its origins would have to be a subset of the allowed block ids. Note that this would also allow filtering in rules, not just in checks. I'm not sure if it's good or bad (it could be restricted to only checks / policies if deemed to complex in rules)
I think that interning public keys would be a good addition to avoid making the token side blow up
Done in #103