Remote admin authorization mechanisms for the SPIRE server

Open dennisgove opened this issue 2 years ago • 2 comments

Summary

This ticket proposes enhancements to the authorization logic in the SPIRE server to support administrative actions from many actor types, including people. It builds on #1975 and #2099 to describe how alternatively authenticated actors can be granted privileges to perform various administrative actions based on their identity and location (remote vs. local). The design described here allows access control rules to be administered remotely, outside of SPIRE, and takes advantage of the PAP/PDP/PEP (policy {administration,decision,enforcement} point) model so as to stay agnostic to the logic for access administration and evaluation. SPIRE Server will act as a Policy Enforcement Point within this model.

Background

API-based administrator access is currently determined by in-memory OPA rule evaluation in an API middleware preprocessor. The preprocessor uses the opaAuth method, which constructs an evaluation request and delegates to the OPA authpolicy engine to decide whether access is allowed.

The evaluation request looks like

{
  "caller":      "spiffeID of calling actor",
  "full_method": "API method string, e.g. /spire.api.server.svid.v1.SVID/MintX509SVID",
  "req":         "API request body, content depends on called endpoint/method"
}

(The following paragraphs assume that the LocalOpaProvider.RegoPath and LocalOpaProvider.PolicyDataPath config options have not been set. If those options have been set, the logic described below is still accurate, but the rego and data may be different.)

A decision is calculated via server/authpolicy/policy.rego which depends on the data in server/authpolicy/policy_data.json.

The returned decision is of the form

{
  "allow": false,
  "allow_if_admin": false,
  "allow_if_local": false,
  "allow_if_downstream": false,
  "allow_if_agent": false
}

and is used within reconcileResult(...) to compare the decision with the actual state of the calling actor. For example, if allow_if_local = true, then the action is allowed if the actor is local to the SPIRE server. If allow_if_admin = true, then the action is allowed if the calling actor has authenticated as an administrator (i.e., the authenticated SVID is associated with a registered entry with the admin flag set to true).
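
The reconciliation step described above can be sketched roughly as follows. This is a minimal illustration and not the actual SPIRE code: the Decision and Caller types and the reconcile function are hypothetical names chosen for this example, and the real reconcileResult(...) may differ in detail.

```go
package main

import "fmt"

// Decision mirrors the fields of the policy result shown above.
type Decision struct {
	Allow             bool
	AllowIfAdmin      bool
	AllowIfLocal      bool
	AllowIfDownstream bool
	AllowIfAgent      bool
}

// Caller is a hypothetical summary of what the server knows about the
// already authenticated caller.
type Caller struct {
	IsAdmin      bool
	IsLocal      bool
	IsDownstream bool
	IsAgent      bool
}

// reconcile reports whether the request may proceed: either the policy
// allows it unconditionally, or one of the conditional flags matches the
// caller's actual state.
func reconcile(d Decision, c Caller) bool {
	switch {
	case d.Allow:
		return true
	case d.AllowIfAdmin && c.IsAdmin:
		return true
	case d.AllowIfLocal && c.IsLocal:
		return true
	case d.AllowIfDownstream && c.IsDownstream:
		return true
	case d.AllowIfAgent && c.IsAgent:
		return true
	}
	return false
}

func main() {
	d := Decision{AllowIfAdmin: true, AllowIfLocal: true}
	fmt.Println("local caller allowed:", reconcile(d, Caller{IsLocal: true}))
	fmt.Println("agent caller allowed:", reconcile(d, Caller{IsAgent: true}))
}
```

Note that a decision with every flag false denies the request regardless of who the caller is.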

Circling back to the data informing that decision, the logic basically comes down to which API endpoints are accessible to which caller types.

{
  "full_method": "/spire.api.server.svid.v1.SVID/MintX509SVID",
  "allow_admin": true,
  "allow_local": true
},
{
  "full_method": "/spire.api.server.entry.v1.Entry/BatchCreateEntry",
  "allow_admin": true,
  "allow_local": true,
  "allow_agent": true
},
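
To make the relationship between the policy data and the decision concrete, here is a rough Go approximation of the lookup. It is a sketch only: the real evaluation is written in rego, the apiRule and decision types are hypothetical, and only the three allow_* fields that appear in the excerpt above are modeled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiRule models one record of the policy data excerpt shown above.
type apiRule struct {
	FullMethod string `json:"full_method"`
	AllowAdmin bool   `json:"allow_admin"`
	AllowLocal bool   `json:"allow_local"`
	AllowAgent bool   `json:"allow_agent"`
}

// decision is a reduced form of the result document (assumption: only the
// flags derivable from the excerpt are included).
type decision struct {
	AllowIfAdmin bool
	AllowIfLocal bool
	AllowIfAgent bool
}

const policyData = `[
  {"full_method": "/spire.api.server.svid.v1.SVID/MintX509SVID",
   "allow_admin": true, "allow_local": true},
  {"full_method": "/spire.api.server.entry.v1.Entry/BatchCreateEntry",
   "allow_admin": true, "allow_local": true, "allow_agent": true}
]`

// loadRules parses the JSON policy data into rules.
func loadRules(data string) ([]apiRule, error) {
	var rules []apiRule
	err := json.Unmarshal([]byte(data), &rules)
	return rules, err
}

// decide returns the conditional flags for the called method. Methods that
// have no matching rule get the zero decision, which denies everything.
func decide(rules []apiRule, fullMethod string) decision {
	for _, r := range rules {
		if r.FullMethod == fullMethod {
			return decision{
				AllowIfAdmin: r.AllowAdmin,
				AllowIfLocal: r.AllowLocal,
				AllowIfAgent: r.AllowAgent,
			}
		}
	}
	return decision{}
}

func main() {
	rules, err := loadRules(policyData)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", decide(rules, "/spire.api.server.svid.v1.SVID/MintX509SVID"))
}
```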

Limitations

This approach implies the following limitations on API access control

  1. It is not possible to support actors authenticated in another Trust Domain or with another authentication mechanism such as OIDC, LDAP, or SSO. The calling actor must be running locally or authenticated within this Trust Domain.
  2. It is not easy to control access at more granular levels. Varying actors cannot be authorized to administer specific pieces of the Trust Domain. It is not possible to say actor A can register entries on these nodes while actor B can register entries on these other nodes.
  3. Access rules must be managed locally to the SPIRE server and changes to those rules require a configuration change.
  4. Access rules cannot take into account changing external or other enterprise-y information that may inform on actors and their abilities.
  5. Administration can only be authorized for SPIFFE identities. It is not possible to authorize people for administrative actions.

Proposal

I would like to add an option to support remote OPA evaluation. Just as one can provide LocalOpaProvider.RegoPath and LocalOpaProvider.PolicyDataPath values to override the default OPA logic with business specific logic, I would like to support more dynamic rego and data options.

OPA supports a bundling mechanism that allows rules, and the data informing them, to be managed elsewhere. An OPA Server will periodically download these bundles of rules and data from some external service and will then use that bundle during evaluation. The bundle is equivalent in virtually all respects to the rego and policyData provided on disk using the LocalOpaProvider configuration option.

SPIRE would interact with this remote OPA Server in the same way that it interacts with the in-memory OPA evaluation, except that evaluation occurs in an external OPA Server. The input and output would, for now, stay the same.

// in-memory
rs, err := e.rego.Rego(rego.Input(input)).Eval(ctx)

// remote OPA Server (placeholder method name, not an existing function)
rs, err := e.call_remote_opa_server_over_https(ctx, input)

Example: Let's use the following example while walking through this.

We are in the Trust Domain foo.bar.com and an API request to /spire.api.server.entry.v1.Entry/BatchCreateEntry has just been received. The caller's identity document has been authenticated and contains the identity string arn:aws:iam::123456789012:user/JaneDoe (notice this isn't a SpiffeID). And the request body is of the form

{
  "entries": [
    {
      "spiffe_id": { "trust_domain": "foo.bar.com", "path": "/ns/example/sa/important-workload" },
      "selectors": [
        { "type": "k8s:ns", "value": "example" },
        { "type": "k8s:sa", "value": "important-workload" },
        { "type": "k8s:pod-image", "value": "docker.io/foo/bar/super-important-application:v1.1.0" }
      ],
      "ttl": 3600
    }
  ]
}

In simple terms, user JaneDoe is trying to create an entry in Trust Domain foo.bar.com with path /ns/example/sa/important-workload that attests using selectors k8s:ns, k8s:sa, and k8s:pod-image, and whose SVIDs should expire within 1 hour. How do we determine if JaneDoe is allowed to create such an entry with these values?

To answer that, an HTTP request will be made to the configured OPA Server with the body

{
  "caller":      "arn:aws:iam::123456789012:user/JaneDoe",
  "full_method": "/spire.api.server.entry.v1.Entry/BatchCreateEntry",
  "req": {
    "entries": [
      {
        "spiffe_id": { "trust_domain": "foo.bar.com", "path": "/ns/example/sa/important-workload" },
        "selectors": [
          { "type": "k8s:ns", "value": "example" },
          { "type": "k8s:sa", "value": "important-workload" },
          { "type": "k8s:pod-image", "value": "docker.io/foo/bar/super-important-application:v1.1.0" }
        ],
        "ttl": 3600
      }
    ]
  }
}

and will receive a decision response in the form

{
  "allow": true,
  "allow_if_admin": false,
  "allow_if_local": false,
  "allow_if_downstream": false,
  "allow_if_agent": false
}

Notice that the input and output are identical to those of the existing in-memory OPA evaluation. What we've gained, however, is the ability to use dynamic and changing rules and data. The data can reflect that JaneDoe belongs to a role allowing them to manage entries within the namespace foo.bar.com/ns/example, and when Jane's role in the company changes, the bundle data changes and subsequent decisions are evaluated over the updated data.
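
For illustration only, the kind of rule such a bundle might encode for the JaneDoe example is approximated in Go below. In practice this logic would be written in rego and shipped in the bundle; the grant and entry types, the allowed function, and the shape of the grants data are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// grant is a hypothetical bundle data record: the caller may manage entries
// whose SPIFFE ID path sits at or under PathPrefix in TrustDomain.
type grant struct {
	TrustDomain string
	PathPrefix  string
}

// entry is the part of a BatchCreateEntry request relevant to the decision.
type entry struct {
	TrustDomain string
	Path        string
}

// covered reports whether a single entry falls under one of the grants.
// The prefix must match on a path segment boundary, so "/ns/example" does
// not cover "/ns/example-other".
func covered(gs []grant, e entry) bool {
	for _, g := range gs {
		if g.TrustDomain != e.TrustDomain {
			continue
		}
		if e.Path == g.PathPrefix || strings.HasPrefix(e.Path, g.PathPrefix+"/") {
			return true
		}
	}
	return false
}

// allowed requires every entry in the request to be covered by the caller's
// grants. An empty request is denied.
func allowed(grants map[string][]grant, caller string, entries []entry) bool {
	if len(entries) == 0 {
		return false
	}
	for _, e := range entries {
		if !covered(grants[caller], e) {
			return false
		}
	}
	return true
}

func main() {
	jane := "arn:aws:iam::123456789012:user/JaneDoe"
	grants := map[string][]grant{
		jane: {{TrustDomain: "foo.bar.com", PathPrefix: "/ns/example"}},
	}
	req := []entry{{TrustDomain: "foo.bar.com", Path: "/ns/example/sa/important-workload"}}
	fmt.Println("before role change:", allowed(grants, jane, req))

	// Jane changes roles: the bundle data is updated and SPIRE is untouched.
	delete(grants, jane)
	fmt.Println("after role change:", allowed(grants, jane, req))
}
```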

With this approach, limitations 2, 3, and 4 above are resolved or become even easier.

  2. It is not easy to control access at more granular levels. Varying actors cannot be authorized to administer specific pieces of the Trust Domain. It is not possible to say actor A can register entries on these nodes while actor B can register entries on these other nodes.
  3. Access rules must be managed locally to the SPIRE server and changes to those rules require a configuration change.
  4. Access rules cannot take into account changing external or other enterprise-y information that may inform on actors and their abilities.

The access rules and data are now managed completely outside the purview of SPIRE, and changes to either take effect without any SPIRE involvement. This becomes a dynamic process that can take into account changing data. Administrative access within SPIRE can reflect business changes without any changes to SPIRE systems.

Limitations 1 and 5 become easier to reason about and solve with support for alternative authentication mechanisms.

  1. It is not possible to support actors authenticated in another Trust Domain or with another authentication mechanism such as OIDC, LDAP, or SSO. The calling actor must be running locally or authenticated within this Trust Domain.
  5. Administration can only be authorized for SPIFFE identities. It is not possible to authorize people for administrative actions.

Because we can now authorize arbitrary identities from outside the Trust Domain we can start supporting alternative authentication mechanisms without losing control over administrative actions.

Request For Comments: This proposal is intended to get feedback on an additional approach for authorizing admin workloads which can more easily integrate into existing enterprise ACL systems.

dennisgove, Jul 29 '22 15:07

Thank you @dennisgove for taking the time to make this proposal. It looks like supporting alternative authentication mechanisms would imply supporting authentication with non-SVID material. Is that what is being contemplated here?

amartinezfayo, Aug 11 '22 19:08

Hi @amartinezfayo - great question.

My goal is to support authorizing administrative actions from entities outside of this specific Trust Domain. For example, an application running somewhere else which is allowed to manage node/workload registrations and federation configuration across many Trust Domains. Existing OPA support gets us a lot of that ability. However, the current OPA support is limited to static rules, data, and ACL logic. By adding support for an (optional) remote OPA server we open up the ability to more easily change who/what is allowed to perform which actions. In addition, it becomes possible to update access control rules on the fly without needing to deploy changes to a SPIRE Server instance. It also allows for central control and management of authorization policies and allows SPIRE to very easily utilize existing authorization policy engines an enterprise may be using.

In effect, by moving SPIRE authorization decisions into a remote OPA Server we make it easier for SPIRE to be integrated into existing enterprise environments.

One of the results of this is that a SPIRE Trust Domain doesn't need to maintain or even know about who/what is allowed to administer it. And this allows us to add support for authenticating SVIDs from other Trust Domains. One of the things blocking that support is the very reasonable question:

if an admin request to foo.bar.com comes from spiffe://infra-management.bar.com/spire-manager, how do the SPIRE Servers for foo.bar.com know the caller is allowed to do something?

By moving that authorization decision outside of SPIRE we make it easier to support alternative identity documents and identifier structures.

And although one side-effect of OPA support in general is that SPIRE can support authenticating with non SVID material, I am not proposing that it does at this time. I am only looking for SPIRE to better support authenticating and authorizing administrative requests from other Trust Domains.

Existing OPA support allows us to add support for person-authorized administrative actions at some future point. This proposal will make managing that access easier. Neither the existing OPA support nor this proposal requires such support, and I am not proposing adding such support as part of this.

For now, I would like OPA Servers to be able to authorize administrative calls from entities in other Trust Domains. I am not proposing at this point that we add support for additional authentication mechanisms.

[edit: the first sentence of the final paragraph originally stated "For now, I would like OPA Servers to be able to authenticate and authorize administrative calls from other Trust Domains." The inclusion of the word "authenticate" was a mistake as the OPA Servers would not be authenticating the caller. Authentication of callers from other Trust Domains would occur in the SPIRE Server using federated Trust Bundles before calling out to an OPA Server. The OPA Server would then decide if that already authenticated entity is authorized.]

dennisgove, Aug 13 '22 17:08

Thank you for your detailed response, @dennisgove. Authenticating with non-SVID material would require deeper changes in SPIRE. Adding support to authorize administrative actions from entities outside of the Trust Domain is something that IMO would be good to have in SPIRE to support certain use cases. But I personally have some concerns around the introduction of a remote OPA Server in the authorization mechanism. This would add an external dependency that would impact both the availability and performance of SPIRE, and it's also a critical piece in the security landscape. A mitigating factor is that this would be optional, so users would take on this dependency only if they opt in.

There is some work to do in terms of scoping the changes needed in SPIRE to support this. We will discuss all this in our maintainer's sync, and I'll update this issue with the result of that conversation. Thank you for your patience!

amartinezfayo, Aug 24 '22 13:08

@dennisgove We discussed this proposal in the last maintainer's sync. Considering that the use of custom authorization policies with OPA is experimental in SPIRE, we feel that introducing support for remote OPA evaluation could be a little premature. We think that supporting the authorization of administrative actions from foreign trust domains would be good to have in SPIRE, so we are open to exploring the best alternative for that. We may enhance the admin_ids configuration option to support foreign trust domains. Could this be a viable alternative to unblock you while we explore more involved options?
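
To make the admin_ids idea a little more concrete, here is one possible shape of the check, written as a sketch. The semantics for foreign trust domains were still to be designed at this point, so the isConfiguredAdmin function, its parameters, and the federated map are assumptions. The sketch also assumes the caller's SVID has already been authenticated (for a foreign caller, against the federated trust bundle) before this check runs.

```go
package main

import (
	"fmt"
	"net/url"
)

// trustDomainOf extracts the trust domain from a SPIFFE ID string.
func trustDomainOf(id string) (string, error) {
	u, err := url.Parse(id)
	if err != nil {
		return "", err
	}
	if u.Scheme != "spiffe" || u.Host == "" {
		return "", fmt.Errorf("not a SPIFFE ID: %q", id)
	}
	return u.Host, nil
}

// isConfiguredAdmin reports whether an authenticated caller is listed in
// adminIDs. Callers from a trust domain that is neither the server's own nor
// federated are rejected even if they are listed.
func isConfiguredAdmin(callerID, serverTD string, adminIDs []string, federated map[string]bool) bool {
	td, err := trustDomainOf(callerID)
	if err != nil {
		return false
	}
	if td != serverTD && !federated[td] {
		return false
	}
	for _, id := range adminIDs {
		if id == callerID {
			return true
		}
	}
	return false
}

func main() {
	adminIDs := []string{"spiffe://infra-management.bar.com/spire-manager"}
	federated := map[string]bool{"infra-management.bar.com": true}
	caller := "spiffe://infra-management.bar.com/spire-manager"
	fmt.Println(isConfiguredAdmin(caller, "foo.bar.com", adminIDs, federated))
}
```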

amartinezfayo, Aug 26 '22 21:08

Thanks @amartinezfayo for letting me know. Enhancing the admin_ids configuration option to support foreign trust domains will help. It doesn't quite give us as dynamic an ACL flow as we'd like, but that's not a huge blocker.

In the meantime, I'm going to be using an admin proxy sidecar that will have administrative access to the Trust Domain (via standard SPIRE mechanisms). It will accept admin requests from external callers, verify access is authorized, and then perform the admin action. I don't love it, and it's just one of a few possible ways to do this, but it'll unblock us for now.

[Image: SPIRE Admin Proxy]
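
The proxy flow described above (accept a request, verify authorization, perform the action) could be sketched along these lines. This is not the actual proxy: the authorizer and forward function types, the handle function, and the in-memory ACL are hypothetical, and a real proxy would authenticate the caller and call the SPIRE Server API using its own admin credentials.

```go
package main

import (
	"errors"
	"fmt"
)

var errUnauthorized = errors.New("caller is not authorized for this action")

// authorizer decides whether an already authenticated external caller may
// invoke the given API method.
type authorizer func(caller, fullMethod string) bool

// forward performs the admin call against SPIRE on the caller's behalf.
type forward func(fullMethod string) error

// handle rejects unauthorized requests before anything reaches SPIRE and
// forwards the rest.
func handle(authz authorizer, fwd forward, caller, fullMethod string) error {
	if !authz(caller, fullMethod) {
		return errUnauthorized
	}
	return fwd(fullMethod)
}

func main() {
	// A static ACL stands in for whatever external system holds the rules.
	acl := map[string]map[string]bool{
		"spiffe://infra-management.bar.com/spire-manager": {
			"/spire.api.server.entry.v1.Entry/BatchCreateEntry": true,
		},
	}
	authz := func(caller, m string) bool { return acl[caller][m] }
	fwd := func(m string) error {
		fmt.Println("forwarding", m)
		return nil
	}

	err := handle(authz, fwd,
		"spiffe://infra-management.bar.com/spire-manager",
		"/spire.api.server.entry.v1.Entry/BatchCreateEntry")
	fmt.Println("authorized caller error:", err)

	err = handle(authz, fwd,
		"spiffe://unknown.example.org/app",
		"/spire.api.server.entry.v1.Entry/BatchCreateEntry")
	fmt.Println("unknown caller error:", err)
}
```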

dennisgove, Aug 31 '22 15:08

@dennisgove It's good to know that enhancing the admin_ids configuration option to support foreign trust domains will help to unblock you. I've created #3400 to track that work.

I'm closing this issue for now, as #3400 will be tracking the work to support foreign trust domains. Please feel free to re-open this issue if you think that more discussion is needed on this topic.

amartinezfayo, Sep 01 '22 19:09