Open Policy Agent support
What changed?
Added an Authorizer and ClaimMapper implementation to allow temporal to delegate authentication and authorization decisions to Open Policy Agent (OPA)
Why?
Enables configuration of authentication and authorization requirements without having to change the temporal code base. OPA was chosen based on its open source license, deployment model, integration mechanism, and declarative language. As the implementation is simply a HTTP(S) invocation OPA could be swapped out for any HTTP API which supports the inputs and provides a valid response.
How did you test it?
- Unit tests created as code was written TDD
- Manually tested by running locally
Potential risks
If there was a bug in the configuration parser then it could prevent the service from booting, other than that I cant think of anything. The functionality needs to be opted into.
Is hotfix candidate?
No
hanks for the contribution! I added a few comments. I don't know much about OPA so I'll trust that the protocol bits are okay.
My main concern is that I'd rather see a complete example of an OPA setup in the samples-server repo, and leave just unit tests in here, which should be enough to ensure that this client-side doesn't break.
Thanks @dnr. I completely agree with your comments, I believe I've address them with the latest commit.
With regards to the OPA protocol its quite simple and flexible. Essentially a client can POST a HTTP request with a body which looks like:
{
"input": {
// Any JSON here which you want your policy to make decision using
}
}
The request is sent an endpoint with the format /v1/data/your-policy-package-here where the policy package is determined by the policy content. For example if we had a policy in OPA which started with package my.temporal.auth the endpoint would be /v1/data/my/temporal/auth.
The response from this endpoint is of the form:
{
"result": {
// Your policy results here
}
}
So if I had a policy which looked like
package my.temporal.auth
default is_user = true
default is_admin = false
The JSON response would be:
{
"result": {
"is_user": true,
"is_admin": false
}
}
Given we need Temporal to be able to understand the response I've just used the common convention of checking for a variable called allow which must be a boolean. This could really be any known property, but allow is idiomatic in the OPA world and makes sense generally so I thought it was good choice.
So from a Temporal view this really just a webhook which sends the following JSON to any HTTP endpoint you want to configure
{
"input": {
"Claims": {
// The claims
},
"Target": {
// The target
}
}
}
and requires a 200 response with the content
{
"result": {
"allow": true
}
}
Hi @dnr is there anything else you wanted changing with this PR?
Hi, really sorry about the delay on this one.
In general I think the code is great, there's two general reasons I haven't merged it yet:
First, we don't have a great system right now for this category of "contributed" code that lives in this repo but isn't quite supported as part of the core product. Other Temporal projects, e.g. the Go SDK, have a contrib directory where this sort of thing can live, but there isn't one for the server yet. I'd like to have a discussion with the team first about either setting that up, or taking on the responsibility for supporting this piece of code. (Which is admittedly pretty simple, but there's an ongoing testing burden to keep it working as we make changes.)
Second, after spending some time with the "default" authorizer this month, I have a slightly different opinion about how these pieces should fit together: it seems to me that the default authorizer implements the "obviously correct" logic for allowing and denying calls based on claims and target (as of version 1.20). Specifically, it has some knowledge about the semantics of different methods (read-only vs read-write; requires namespace vs not) that we'll keep up to date. Using this authorizer, that logic would have to be replicated in an OPA config, which seems fragile.
I'm thinking that the right place for pluggable logic is the claim mapper: it should call out to OPA to get claims, and then the default authorizer can handle allowing/denying based on the target and claims. This is just an opinion, though :slightly_smiling_face:. If you have some authorization logic that couldn't fit in that structure, I'd be curious to know what it is.
I'm thinking that the right place for pluggable logic is the claim mapper: it should call out to OPA to get claims, and then the default authorizer can handle allowing/denying based on the target and claims. This is just an opinion, though 🙂. If you have some authorization logic that couldn't fit in that structure, I'd be curious to know what it is.
I think that moving this to the claim mapper would be okay. It might be I was trying to go too fine grained or that I misunderstood the default authorizer. One thing I'd be keen to be able to support was allowing some client to signal a workflow in namespace A, but not attach a worker to a task queue in namespace A. As I understood it they are both write operations so can't be split apart by the default authorizer. For example:
- We use namespaces to isolate teams workloads
- If we image there are two namespaces
team-1andteam-2 - Team 1 deploy workers associated with the
team-1namespace, the worker can generate a JWT with theteam-1:ownerclaim so can poll for tasks from theteam-1namespace - Team 2 deploy workers associated with the
team-2namespace, the worker can generate a JWT with theteam-2:ownerandteam-1:consumerclaim so can poll for tasks from theteam-2namespace and signal workflows in theteam-1namespace
If this could be done in the claims mapper then that is great, but I couldn't see how, as the read / write buckets of endpoints are quite big. We we're hoping to ensure that a team couldn't pull work from another team's namespace, either by accident or maliciously. We can prevent access to the data using encryption converters I suppose, but it would be nice to prevent the API calls themselves.
@dnr In addition to the comment above I don't think the default authorizer is good enough for a more general use case which I was hoping this HTTP authorizer would be able to support. One problem I can see at the moment:
Only users with system:read claim can access the Temporal UI - as the UI invokes globalReadonlyAPIs. This means team members who have access to a single namespace can't use the UI or they will have access to read every namespaces workflows (via system:read)
For our use case we would want to allow read access to /namespaces, /search-attributes, and /cluster to anyone with a {namespace}:read claim so that the UI functions, but deny access to namespace-specific APIs. This isn't possible with the default authorizer.
Would you be okay with me proceeding with this http implementation at the authorizer level.
For vis - There's a related proposal here: https://github.com/temporalio/proposals/issues/81
Hello @MichaelSnowden, any update on this ? Would be an amazing feature.. :)