
[RFC] Serverless architecture support

Open amartinezfayo opened this issue 4 years ago • 40 comments


Co-authored by @MarcosDY.

Background

Serverless computing allows applications to be built without managing infrastructure: the cloud service provider automatically provisions, scales, and manages the infrastructure required to run the code, eliminating server software and hardware management by the developer. The current model of workload attestation in SPIRE does not fit well in this software design pattern, where the execution context is a temporary runtime environment in which it is not practical to run SPIRE Agent to expose the Workload API alongside the serverless function.

Proposal

In order to issue SVIDs to workloads in a serverless environment, we need a way to issue identities without using the Workload API. The workload would attest directly to SPIRE Server to obtain its identity. This means going through an attestation process similar to node attestation, but without granting an agent role to the attested serverless (and agentless) environment. The attestation process would proceed similarly to the current AttestAgent server RPC, but through a new call that provides an "agentless" identity instead of a node identity in SPIRE. The renewal process would likewise mirror the current RenewAgent RPC: the caller presents an active "agentless" SVID returned by the attestation call, or the most recent one from a previous renewal call. This avoids going through a complete attestation process when the environment already has a valid SVID that merely needs to be renewed. The criterion for deciding when to rotate can follow the one SPIRE uses today, i.e. rotate the SVID once it has less than half of its lifetime left.

The proposed solution should issue identities in a performant manner, optimizing resource usage; otherwise, the advantages of the serverless architecture could be eroded by the identity issuance process. To that end, this proposal tries to leverage common features that cloud providers offer to address performance problems, such as reusing the execution context when one is available from a previous function call.

The proposed process to obtain an identity in a serverless architecture is as follows:

  • Check if there is already a valid SVID available in the execution context.
    • If there is no valid SVID, call the "agentless" attestation RPC to get an identity.

      • Store the obtained identity in a variable declared outside of the function's handler method so it remains initialized, providing additional optimization when the function is invoked again.
    • If there is already a valid SVID, calculate its lifetime left.

      • If it has more than half of its lifetime left, just use it.
      • If it has less than half of its lifetime left, call the renewal RPC and store the obtained identity in a variable declared outside of the function's handler method.
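The steps above can be sketched as a handler-side cache with a half-life rotation check. This is a minimal illustration only: `attest` and `renew` stand in for the proposed "agentless" attestation and renewal RPCs, which do not exist yet, and `SVID` is a hypothetical container type.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class SVID:
    not_before: datetime
    not_after: datetime

def should_rotate(svid: SVID, now: datetime) -> bool:
    # Rotate when less than half of the SVID lifetime remains,
    # mirroring SPIRE's current rotation criterion.
    lifetime = svid.not_after - svid.not_before
    return (svid.not_after - now) < lifetime / 2

# Declared outside the handler so a warm (reused) execution context keeps it.
_cached_svid = None

def handler(event, attest, renew):
    # attest/renew are injected stand-ins for the proposed RPCs.
    global _cached_svid
    now = datetime.utcnow()
    if _cached_svid is None or now >= _cached_svid.not_after:
        _cached_svid = attest()            # full "agentless" attestation
    elif should_rotate(_cached_svid, now):
        _cached_svid = renew(_cached_svid)  # cheaper renewal path
    return _cached_svid
```

On a cold start the cache is empty and the function attests; on warm invocations it reuses or renews the cached SVID, which is exactly the optimization the execution-context reuse enables.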

Sample implementation

The following is a description of a sample implementation of the proposed process, including the changes needed in SPIRE and the components required in the serverless environment in order to be able to issue identities without having a SPIRE Agent deployed in the serverless environment.

SPIRE

  • Add new plugin types to perform the "agentless" attestation in SPIRE Server. Have a new plugin for each provider that has a serverless architecture. For example, there will be a plugin to support AWS Lambda, a plugin for Google Cloud Functions, a plugin for Microsoft Azure Functions and other plugins for any other platform. These are some possible workflows for the implementations:

    • AWS Lambda: the function signs a GetCallerIdentity query for the AWS Security Token Service (STS) using the AWS Signature v4 algorithm and sends it to SPIRE Server. The credentials used to sign the GetCallerIdentity request come from the AWS Lambda runtime environment variables which avoids the need for an operator to manually provision credentials first. To attest the "agentless" workload, SPIRE Server sends the query to the AWS STS service to validate it and issues an SVID with a SPIFFE ID constructed from attributes extracted from the parsed signed query.

    • Google Cloud Functions: the function fetches its identity token using the Compute Metadata Server. The attestor plugin in SPIRE Server validates the token provided and issues an SVID with a SPIFFE ID constructed from attributes extracted from the parsed token.

    • Microsoft Azure: the function obtains its access token from the local token service. The attestor plugin in SPIRE Server validates the token provided and issues an SVID with a SPIFFE ID constructed from attributes extracted from the parsed token.

  • Attestation data structs are usually shared from github.com/spiffe/spire/pkg/common/plugin/<plugin_name>, which would be inconvenient to consume externally. Instead, the types required could be exposed through Protocol Buffers definitions under the proto/spire hierarchy.

  • It would be good to expose a library that facilitates the attestation process from the serverless environment. This library should expose interfaces to construct the attestation material, call the "agentless" attestation RPC in SPIRE Server, and ease the reuse of the issued SVID in case the environment's state is preserved across invocations. It should also provide functionality to perform the SVID renewal process.
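To make the AWS Lambda workflow above concrete, here is a sketch of presigning an sts:GetCallerIdentity request with Signature v4 using only the standard library. It is illustrative only: a production implementation would use the AWS SDK, read the live runtime credentials (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN), and use the current timestamp rather than the fixed values shown here.

```python
import hashlib
import hmac
from urllib.parse import urlencode

def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()

def presign_get_caller_identity(access_key, secret_key, session_token,
                                region="us-east-1",
                                amz_date="20210101T000000Z"):
    # SigV4 query-string presigning for sts:GetCallerIdentity (sketch).
    host = f"sts.{region}.amazonaws.com"
    date = amz_date[:8]
    scope = f"{date}/{region}/sts/aws4_request"
    params = {
        "Action": "GetCallerIdentity",
        "Version": "2011-06-15",
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": "60",
        "X-Amz-Security-Token": session_token,
        "X-Amz-SignedHeaders": "host",
    }
    query = urlencode(sorted(params.items()))
    canonical_request = "\n".join([
        "GET", "/", query,
        f"host:{host}\n",                  # canonical headers (trailing newline)
        "host",                            # signed headers
        hashlib.sha256(b"").hexdigest(),   # hash of the empty payload
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode()).hexdigest(),
    ])
    signing_key = _hmac(_hmac(_hmac(_hmac(
        b"AWS4" + secret_key.encode(), date), region), "sts"), "aws4_request")
    signature = hmac.new(signing_key, string_to_sign.encode(),
                         hashlib.sha256).hexdigest()
    return f"https://{host}/?{query}&X-Amz-Signature={signature}"
```

In the proposed flow, the function would send this URL to SPIRE Server, which forwards it to STS; a successful response containing the caller's ARN is what the attestor would parse to construct the SPIFFE ID.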

Serverless environment

The workload running in the serverless environment needs to be attested without a running SPIRE Agent exposing the Workload API. Instead, it calls an RPC exposed by SPIRE Server with attestation data that it retrieves from the execution runtime. As mentioned above, it would be convenient to have a library that can be consumed in the serverless environment to aid the attestation and identity issuance process. With the aim of facilitating the implementation, this proposal recommends implementing a mechanism to externally package dependencies that can be shared across multiple functions. One possible way to achieve this is to have a common interface to retrieve the identity of the "agentless" workload that can be called from the running function and is exposed through the runtime environment. For example, in the case of AWS Lambda, the "agentless" attestation functionality can be packaged in a layer. A function that needs to be attested can be configured to use this layer, so it does not need to implement the functionality itself. The layer can also be updated with fixes or improvements without the need to update the function itself.

Request for Comments

This proposal lays out the changes needed in SPIRE and possible implementation scenarios to support serverless architectures, focusing on providing a solution for AWS Lambda, Google Cloud Functions, and Microsoft Azure Functions. Any feedback on the general direction of this proposal, missing points, suggestions, or thoughts in general is greatly appreciated.

amartinezfayo avatar Sep 16 '20 17:09 amartinezfayo

Thank you @amartinezfayo and @MarcosDY for putting this together - it is a badly needed feature.

There is some prior art from Square here: https://developer.squareup.com/blog/providing-mtls-identities-to-lambdas/

Also an old(er) issue: https://github.com/spiffe/spire/issues/986

I am wondering if you have considered a "push" approach rather than a "pull" approach, e.g. by pushing SVIDs into platform-specific secret stores rather than having functions pull SVIDs from the SPIRE infrastructure. I see a few advantages to push:

  • Greatly reduces complexity/responsibility on the consumer
  • Preserves serverless performance advantages by moving SVID management out of the execution timeframe
  • Reduces the availability requirements of the SPIRE infrastructure

evan2645 avatar Sep 17 '20 15:09 evan2645

I'm not particularly well versed on the auth side of things, but I did want to comment on the approach for AWS Lambda for the sake of more complete information. The aws-iam-authenticator does something similar, and there have been some learnings about its shortcomings that I recently read. I omitted a couple of kubernetes/EKS-oriented items.

  • STS role responses do not include IAM role paths if present, so if the IAM role ARN is arn:aws:iam::111122223333:role/some-path/my-role, the STS response would be arn:aws:sts::111122223333:assumed-role/my-role/session-name. Users cannot have multiple roles of the same name with different paths at the same time
  • Since the implementation is responsible for crafting the pre-signed URL, they determine which STS endpoint is used. This can cause the authenticating webhook to execute STS requests that may use either the global or regional endpoint in a different region
  • The pre-signed URL does contain an AccessKeyId which is logged so the customer can differentiate role sessions, but this requires the customer to look up the key ID in a log and then query CloudTrail to identify which entity assumed the role for that key ID

I believe first-class alternatives will require features from IAM, so this may still need to be the path forward while accepting caveats.

Is it fair to say this also begins to address non-unique node attestation #558 ?

bigdefect avatar Sep 17 '20 20:09 bigdefect

Thank you @evan2645 for your feedback.

I am wondering if you have considered a "push" approach rather than a "pull" approach, e.g. by pushing SVIDs into platform-specific secret stores rather than having functions pull SVIDs from the SPIRE infrastructure.

We discussed this topic during the SIG-SPIRE call on 9/17/2020, but I wanted to summarize my thinking here. We considered the push approach, and we felt that it would imply a solution that wouldn't really be native support in SPIRE, since it introduces interactions with other systems like secret stores, which we felt would not be completely desirable; it seemed better to take the pull approach with an "agentless" attestation or credentials exchange mechanism. The push approach also introduces some challenges like the availability of the secrets store and extending the trust to other components.

Both approaches have certainly pros and cons, making them a good choice under certain circumstances or a bad / impossible choice in others. Since SPIRE runs in a broad range of environments, we believe that there is room for exploring both types of implementations. The ultimate goal of this RFC is to collect feedback that can tell us if the proposed approach is useful for a variety of use cases, and if it is, be able to work on an implementation based on this.

amartinezfayo avatar Sep 18 '20 22:09 amartinezfayo

Thank you @efe-selcuk for your observations.

The aws-iam-authenticator does something similar, and there have been some learnings about its shortcomings that I recently read.

We will be working on a more detailed proposal, so the points you raise are all valuable information. We may explore ways to work around them.

Is it fair to say this also begins to address non-unique node attestation #558?

Yes, I think that this proposal goes towards the direction of addressing use cases discussed in that issue.

amartinezfayo avatar Sep 18 '20 23:09 amartinezfayo

Thank you @amartinezfayo and @MarcosDY for working on this, looking forward to serverless support in SPIRE!

I worked on bringing SPIFFE certificates to Lambda for Square (@evan2645 mentioned the blog post) and I wanted to expand on some of the reasons that made us pick push over pull. Blocking on SPIRE server to issue identity is both a performance as well as an availability concern.

We asked ourselves whether there would be a strong security benefit to attesting on startup vs. issuing ahead of time and using a locked down secure storage mechanism. The conclusion we came to is that these would be equivalent and there was no upside to attestation on pull. By storing identity in secrets manager one can use IAM policies and/or SCPs to restrict access.

The reasons developers choose serverless are performance and scalability (among others). Expanding cold start time could be prohibitive for some workloads, not to mention that downtime of the SPIRE server would impact availability.

The push approach also introduces some challenges like the availability of the secrets store [...]

It's not uncommon for serverless functions to rely on a secrets store, i.e. if it is down the function might not be able to perform regardless, I don't think this would necessarily be a new dependency.

Another minor point: you listed the main platforms, but K8s has multiple serverless implementations, supporting them all could be tricky. On the other hand: writing to k8s secrets and making secrets accessible to functions could solve for all of these.

To summarize: I think the push model would make for a more reliable and performant solution, be equivalent from a security perspective and come with less code complexity.

mweissbacher avatar Oct 09 '20 18:10 mweissbacher

Thank you @mweissbacher for the feedback, it's really helpful!

We are exploring all the options, including what a push model would look like. We should be able to share an updated proposal in the upcoming days.

amartinezfayo avatar Oct 13 '20 03:10 amartinezfayo

Looking forward to reading it! Also, while we tried to be thorough in the serverless identity article, I'm happy to discuss on a call or here if you have questions.

mweissbacher avatar Oct 13 '20 18:10 mweissbacher

Hi folks, chiming in from an infrastructure and serverless perspective. I worked with @mweissbacher on our Lambda mTLS implementation and drove many of our design decisions that happened within the function itself.

The first thing I'd like to do is clear up the misconception that by using the pull model, you will be able to get away with not having some sort of "agent" run inside the function. Developers will want to wrap the logic to pull the certificate in a library or framework. No one is going to bespoke write that RPC logic in every single function they own. Additionally, with the recent release of AWS Lambda Extensions, these libraries can run out-of-process for a huge performance boost (our tests had a 30% reduction in cold start time). So even if there is a pull model, serverless developers will gravitate towards something like background processes or libraries. My team owns libraries for doing mTLS, and other things, within lambda and we actively choose solutions that reduce the amount of code we need to write to do the same thing in various programming languages. While the SPIRE developers may not develop an agent, the community will because it makes sense. Serverless applications are all about abstracting away these kinds of concerns, not adding additional boilerplate code to every single function.

I'd also like to ask some questions about the mechanics of how this pull mechanism would work.

Firstly, what identity would the SPIRE server return based on the assumed role? How is that controlled? I ask because we have each AWS "account" tie to an application identity, so lambdas within that account are treated the same when doing mTLS. Will SPIRE support multiple IAM roles being given the same identity? We encourage teams to customize the execution role for each function to adhere to the principle of least privilege. We do not want every lambda in an account to execute as the same role. We also have several hundred accounts, so mapping this by hand is a non-starter.

Secondly, this proposal sounds like it expects the functions to be running in the same VPC as the SPIRE server? Many companies have a vpc-per-account model, where networking and permissions must be explicitly setup to cross those boundaries. We have a Shared VPC, but we currently restrict lambda traffic to envoy, AWS APIs, and our internal proxy. We would need to automate the setup of networking rules to allow traffic to hit the SPIRE server, which lives in a separate account. By choosing to add the SPIRE server to the function's critical path, it complicates setup and debugging (cross account debugging is particularly painful). Can you talk more about the networking considerations you made in this design?

Lastly, a comment. When it comes to availability, I expect native cloud provider tools to have better uptime than most things developers deploy on top of a cloud provider. Explicitly adding a non-cloud-native dependency, when a cloud-native one exists and works, is not something we see a lot of. Serverless apps rely heavily on cloud-native tools already, so adding one more is not a big deal.

Like Michael, I am really excited to see this proposal and the resulting technology. Being able to perform mTLS within lambdas has been a huge win for us, so I look forward to this being easy for other organizations and projects to benefit from.

mtitolo avatar Oct 13 '20 20:10 mtitolo

Thank you @mtitolo for your comment, it's very valuable for us to get this kind of feedback.

The first thing I'd like to do is clear up the misconception that by using the pull model, you will be able to get away with not having some sort of "agent" run inside the function. Developers will want to wrap the logic to pull the certificate in a library or framework. No one is going to bespoke write that RPC logic in every single function they own.

It is the intention of this proposal to leverage the use of any mechanism that improves the performance and facilitates the development experience, like it can be done with AWS Lambda Extensions. In the pull model we contemplated the use of layers that can package the agentless attestation functionality (including caching), and I think that the use of AWS Lambda Extensions could also be very beneficial. I don't think that the pull model precludes the use of these mechanisms.

Firstly, what identity would the SPIRE server return based on the assumed role? How is that controlled? I ask because we have each AWS "account" tie to an application identity, so lambdas within that account are treated the same when doing mTLS. Will SPIRE support multiple IAM roles being given the same identity? We encourage teams to customize the execution role for each function to adhere to the principle of least privilege. We do not want every lambda in an account to execute as the same role. We also have several hundred accounts, so mapping this by hand is a non-starter.

The model for the identity issuance that we have in mind is similar to the agent attestation model, where the agent gets an identity based on the attested properties (selectors). In the case of AWS Lambda, the automatically issued identities may have a form like this: spiffe://<trust-domain>/agentless/aws_lambda/<account_id>/<iam_role>/<function_name>. This adheres to the model of having a different role for each lambda function, which would provide different identities. We are also considering that you may optionally pre-register entries so the function can get an additional identity if the function matches the defined selectors.
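The SPIFFE ID shape described above could be built as in the following sketch. The path layout is the hypothetical one suggested in this thread, not a committed SPIRE format; percent-encoding the segments is an added assumption to keep role or function names with reserved characters from breaking the path.

```python
from urllib.parse import quote

def lambda_spiffe_id(trust_domain, account_id, iam_role, function_name):
    # Hypothetical layout from this proposal:
    # spiffe://<td>/agentless/aws_lambda/<account_id>/<iam_role>/<function_name>
    segments = ("agentless", "aws_lambda", account_id, iam_role, function_name)
    path = "/".join(quote(s, safe="") for s in segments)
    return f"spiffe://{trust_domain}/{path}"
```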

Secondly, this proposal sounds like it expects the functions to be running in the same VPC as the SPIRE server?

This is a good point. We were wondering whether this would really be an issue for most users. It looks like having to add SPIRE Server to the function's critical path is in fact a real concern. We may think of ways to work around this problem, but it seems intrinsic to the pull model and must be noted as one of the cons of the model.

We are actively working on enhancing and adding more details to the original proposal, and on including the push model so we can compare the pros and cons between them.

amartinezfayo avatar Oct 15 '20 15:10 amartinezfayo

Based on all the feedback received, we explored some alternatives using a push model, including options based on external helper programs as well as options that introduce built-in support in SPIRE. We are particularly optimistic about the latter, so we created a proof of concept that adds serverless computing support (e.g. AWS Lambda) to SPIRE through SPIRE Agent plugins (SVIDStore plugins) that use entry selectors to determine which identities must be stored in the secrets management services offered by cloud providers. The stored identity can then be retrieved by the functions.

Serverless functions are registered in SPIRE the same way regular workloads are, through registration entries. The svidstore key is used to distinguish the "storable" entries, and SVIDStore plugins receive updates for those entries only, indicating that the issued SVID and key must be securely stored in a location accessible by the serverless function, such as AWS Secrets Manager. Selectors thus provide a flexible way to describe the attributes needed to store the issued SVID and key: the type of store, the name to give the secret, and any attribute specific to the service used.

  • Here is a demo of how this works: https://drive.google.com/file/d/1BsIoz60iKYFBZZJLepdnYtRGV3K0LOeu/view?usp=sharing
  • A diagram and a sample function can be found here: https://github.com/MarcosDY/lambda-poc

This is just a proof of concept of how this can be implemented. Any feedback is greatly appreciated!
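The described flow can be sketched in-memory as follows. The selector shape and the `svidstore` marker here are assumptions for illustration, and `MemorySecretsStore` stands in for a real service such as AWS Secrets Manager.

```python
class MemorySecretsStore:
    """Stand-in for a cloud secrets service (e.g. AWS Secrets Manager)."""
    def __init__(self):
        self.secrets = {}

    def put_secret(self, name, value):
        self.secrets[name] = value

def push_storable_svids(entries, svids_by_entry, store):
    """Route SVIDs of 'storable' entries to the store plugin.

    Regular entries are skipped; they keep being served through the
    Workload API as usual.
    """
    for entry in entries:
        # Hypothetical selector shape: "key:value" pairs, with an
        # "svidstore:true" selector marking the entry as storable.
        opts = dict(s.split(":", 1) for s in entry["selectors"])
        if opts.get("svidstore") != "true":
            continue
        store.put_secret(opts["name"], svids_by_entry[entry["id"]])
```

The point of the sketch is the routing decision: the agent's cache keeps rotating SVIDs as it always does, and the store plugin only observes the subset of entries whose selectors mark them as storable.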

amartinezfayo avatar Jan 04 '21 23:01 amartinezfayo

@amartinezfayo thank you for the update! Agree that the built-in version implemented as a plugin seems favorable over a stand-alone helper program. The video looks great. I'm assuming rotation of the certificates is the same as with other plugin types, at half-life? Thank you for working on this!

mweissbacher avatar Jan 08 '21 20:01 mweissbacher

I'm assuming rotation of the certificates is the same as with other plugin types, at half-life?

@mweissbacher Correct. This is designed to run on top of the cache manager implementation, so it just looks at the selectors to know which SVIDs must be pushed to an external store when they are being updated. This is completely agnostic of the SVID update logic.

amartinezfayo avatar Jan 11 '21 18:01 amartinezfayo

Thanks for putting this together @amartinezfayo and @MarcosDY !

This proposal has been reviewed by the maintainers and is pretty well received. Solving this problem is going to be an awesome boon to SPIRE adoption and flexibility. I think the general consensus at this point is to let this proposal marinate in our minds for a bit to make sure there isn't anything we're missing. Thank you for your patience.

azdagron avatar Jan 12 '21 21:01 azdagron

This is a fork of SPIRE with the POC that is being developed:

  • https://github.com/MarcosDY/spire/tree/agentless_proposal
  • Diff: https://github.com/MarcosDY/spire/pull/1/files

amartinezfayo avatar Jan 21 '21 18:01 amartinezfayo

I wonder if we should introduce a new field (e.g. export_to) in the registration entry that the agent can use to route the entry to an exporter (i.e. SVIDStore) plugin instead of relying on parsing selectors...

azdagron avatar Jan 21 '21 20:01 azdagron

Any comments on the above thought (i.e. export_to)? If we do this, it means a new field on the Entry protobuf, which will require a database migration. If we're solid on that plan, we might want to introduce the new column in 1.0.0 so that this feature can ship in 1.1.0....

@evan2645 @amartinezfayo @rturner3 @APTy

azdagron avatar Jan 26 '21 15:01 azdagron

Does the serverless architecture support still aim to provide a way for a workload to "attest directly to SPIRE Server to obtain its identity" as in the proposal at the top, or has that been dropped in favor of just using SVIDStore plugins? (I was hoping those endpoints would also give me a way to solve #1784 .)

For us, the underlying problem with relying on a secret store is that our non-cloud container orchestration first establishes container identity and then (inside the container's namespace) bootstraps secrets using that identity. So we of course wouldn't be able to reverse that and establish our secret distribution first.

JackOfMostTrades avatar Apr 12 '21 21:04 JackOfMostTrades

Does the serverless architecture support still aim to provide a way for a workload to "attest directly to SPIRE Server to obtain its identity" as in the proposal at the top, or has that been dropped in favor of just using SVIDStore plugins?

I think that we still want to do this, however community feedback has steered prioritization towards the SVIDStore solution first... so while I can say relatively confidently that the project wants the ability to (easily) attest directly to SPIRE Server, I don't know when that work might be picked up. I think we need to scope it first. We certainly want to make sure your use case is supported... any chance you or someone you know would be willing to contribute it @JackOfMostTrades?

evan2645 avatar Apr 12 '21 22:04 evan2645

I think that we still want to do this, however community feedback has steered prioritization towards the SVIDStore solution first...

Cool, definitely understand the prioritization of solutions targeting the more common cloud provider use-cases, just wanted to check if the project would still be supportive of an architecture that would solve for a direct attestation use case.

any chance you or someone you know would be willing to contribute it @JackOfMostTrades?

I'm lining some short-to-medium term tasks now, so depending on the timing it might be something we take on. :)

JackOfMostTrades avatar Apr 12 '21 23:04 JackOfMostTrades

Cool, definitely understand the prioritization of solutions targeting the more common cloud provider use-cases, just wanted to check if the project would still be supportive of an architecture that would solve for a direct attestation use case.

Yes. We are learning of other interesting direct attestation use cases too, like confidential computing.

I'm lining some short-to-medium term tasks now, so depending on the timing it might be something we take on. :)

Awesome, please do let us know, I'm happy to coordinate such efforts on the SPIFFE/SPIRE side of the house

evan2645 avatar Apr 12 '21 23:04 evan2645

@amartinezfayo @MarcosDY. While reviewing https://github.com/spiffe/spire/pull/2176 and thinking about different ways to structure the cache, I realized that I've been under the impression that everything needed to store a single identity would be represented in a single selector, but that maybe the proposal was advocating for something else?

Can you shed some light on the proposed shape of the selectors? I think there are some clear implementation wins if a selector is self contained but want to make sure I'm not missing something.

azdagron avatar Apr 14 '21 19:04 azdagron

My initial idea was to rely on multiple selectors, which can be useful in case we want to provide something more than a name. A possible example is to have something like:

{
   ParentID: "spiffe://example.org/agent",
   SpiffeID: "spiffe://example.org/awsidentity",
   Selectors: []string {
         "name:secret1",
         "kmskeyid:SOME_ID",
         "region:SOME_REGION"
    }
}

With something like that we can 'configure' the secret when creating it, and create it in a specific region instead of all the regions configured on the AWS plugin.

At the same time, each platform has different configurations that can be useful, and allowing multiple selectors allows more customization.

However, we can of course put all that information into a single selector and separate the values with `:`; that will make the implementation easier.
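The two encodings being weighed can be sketched as follows (hypothetical helper names; the selector strings are the ones from the example above):

```python
def parse_selector_list(selectors):
    # Multiple-selector option: each selector carries one "key:value" pair.
    return dict(s.split(":", 1) for s in selectors)

def parse_single_selector(selector):
    # Single-selector option: one value packs alternating key/value tokens.
    parts = selector.split(":")
    return dict(zip(parts[0::2], parts[1::2]))
```

The single-selector variant also hints at a cost beyond filtering: any value containing `:` would need escaping, whereas the per-attribute form sidesteps that entirely.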

MarcosDY avatar Apr 14 '21 20:04 MarcosDY

But what is better/easier for a user? If we put everything in a single selector, how can users filter by selector? Setting multiple values allows easier filtering, for example searching for all entries that use kmskeyid=1234.

MarcosDY avatar Apr 14 '21 20:04 MarcosDY

I am very excited to see the serverless support for SPIRE. Do you have an estimated timeline for when this feature will be ready?

nzxdrexel avatar Jun 16 '21 20:06 nzxdrexel

Hi @nzxdrexel! We plan to be able to include this feature in SPIRE 1.1.0. We are close to release 1.0.0 and after that we should be able to start merging the different pieces of this work.

amartinezfayo avatar Jun 16 '21 22:06 amartinezfayo

In the POC we initially started using X509SVIDResponse and stored it as a proto binary. This has the downside that clients of those secrets must understand proto messages and parse them in order to get the DER certificates and keys.

Trying to make users' lives easier, I'm thinking of two ways to simplify it.

X509SVIDResponse as JSON: serialize the existing X509SVIDResponse as JSON and then store it.

PROS

  • We already have a proto that we can use.
  • Using JSON, users can extract certificates and keys with jq
  • Single []byte

CONS

  • X509SVIDResponse provides DER certificates/keys; users may want PEM and will need to convert them

New JSON: create a new JSON schema that we can use to provide identities in PEM format.

PROS

  • we can provide certificates/keys in PEM format
  • Using JSON, users can extract certificates and keys with jq
  • Single []byte

CONS

  • A new schema that we must maintain

Proposal

{
   # List of X.509 SVIDs, each with its private key and the bundle for its trust domain
   svids: [
      {
         # The SPIFFE ID that identifies this SVID
         spiffeID: "spiffe://example.org/workload",
         # PEM encoded certificate chain. MAY include intermediates;
         # the leaf certificate (the SVID itself) MUST come first
         x509SVID: "CERT_PEM",
         # PEM encoded PKCS#8 private key
         x509SvidKey: "KEY_PEM",
         # PEM encoded X.509 bundle for the trust domain
         bundle: "BUNDLE_PEM"
      }
   ],
   # CA certificate bundles belonging to foreign trust domains that the workload should trust,
   # keyed by trust domain. Bundles are encoded in PEM format.
   federatedBundles: {
      "spiffe://federated.test": "PEM_CERT",
      "spiffe://another.test": "PEM_CERT"
   }
}

MarcosDY avatar Aug 23 '21 15:08 MarcosDY

Updated JSON proposal:

  • allow a single SVID per JSON document
  • keep the trust domain bundle separate from the federated bundles.
{
   # The SPIFFE ID that identifies this SVID
   spiffeID: "spiffe://example.org/workload",
   
   # PEM encoded certificate chain. MAY include intermediates;
   # the leaf certificate (the SVID itself) MUST come first
   x509SVID: "CERT_PEM",
   
   # PEM encoded PKCS#8 private key
   x509SvidKey: "KEY_PEM",
   
   # PEM encoded X.509 bundle for the trust domain
   bundle: "BUNDLE_PEM",

   # CA certificate bundles belonging to foreign trust domains that the workload should trust,
   # keyed by trust domain. Bundles are encoded in PEM format.
   federatedBundles: {
      "spiffe://federated.test": "PEM_CERT",
      "spiffe://another.test": "PEM_CERT"
   }
}
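A consumer of the stored secret would then parse it as plain JSON. A sketch under the schema above (the field names mirror this proposal and are not a final format):

```python
import json

# Hypothetical secret payload following the schema sketched above.
raw_secret = json.dumps({
    "spiffeID": "spiffe://example.org/workload",
    "x509SVID": "CERT_PEM",
    "x509SvidKey": "KEY_PEM",
    "bundle": "BUNDLE_PEM",
    "federatedBundles": {"spiffe://federated.test": "PEM_CERT"},
})

def load_identity(raw):
    """Parse the stored JSON secret into (cert chain, key, bundle) PEM strings."""
    doc = json.loads(raw)
    return doc["x509SVID"], doc["x509SvidKey"], doc["bundle"]
```

Because the values are already PEM, the function can hand them straight to its TLS stack with no proto parsing, which is the point of the second option.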

MarcosDY avatar Aug 24 '21 19:08 MarcosDY

Hi @MarcosDY, thank you for your effort on this.

I would say the update in the JSON proposal is even better and helpful for users. I think it's ok to assume that if you want another SVID, you can push another secret.

About the constraints for the Agent's initialization, today at least one workload attestor plugin is needed, right? But now that we will have the storeSVID plugins...

Would this be a good time to change the restriction to something like "at least one workload attestor plugin or one storeSVID plugin"?

SilvaMatteus avatar Aug 27 '21 14:08 SilvaMatteus

Has there been any thought to supporting JWT SVID for SVIDStore, in addition to x509? It seems doable, but I didn't see any mention in the discussion surrounding SVIDStore.

A difference here is that X.509 SVIDs are issued immediately upon creating the workload entry at the server, whereas JWT SVIDs are not issued until a workload fetches one from the Agent. This allows the workload to request any "aud" claim, but does not provision the SVID in advance. A workaround would be to do a "fetch" for any audience the workload will require after the entry is created, causing the SVID(s) to be stored for that workload. A better solution would be to be able to specify an "-audiences" list via a command line option on "entry create" to force those JWT SVIDs to be provisioned at that time. Does this seem doable?

If there is a better place to inquire on this pls let me know.

hellerda avatar Jan 19 '22 14:01 hellerda

I was also wondering whether any thought has been given to adding a NodeAttestor plugin based on the IAM role auth / signed GetCallerIdentity request mentioned here and in the surrounding issues (#558, #1784). It would use the method developed by Vault (https://www.vaultproject.io/docs/auth/aws), which is also used by Kubernetes (https://github.com/kubernetes-sigs/aws-iam-authenticator).

I know this was discussed in the context of Serverless Pull, but it seems valuable to have as a simple NodeAttestor as well. It would support Agents running in any AWS runtime, not just EC2. It could also support on-prem nodes, as long as they have access to AWS API and have some AWS Identity configured to start.

There are some pitfalls mentioned here (https://googleprojectzero.blogspot.com/2020/10/enter-the-vault-auth-issues-hashicorp-vault.html), along with a couple of CVEs, but it looks like these have been solved. Still, it points out that the server-side code would have to be written carefully to avoid any vulnerability.

Does this seem like a worthwhile endeavor? Or is there sufficient functionality in the existing NodeAttestors to support agents running in Amazon ECS or Lambda?

hellerda avatar Jan 20 '22 15:01 hellerda