osm
[Research] Explore SPIFFE/SPIRE Integration
Please describe the Improvement and/or Feature Request
SPIRE is a CNCF incubating project for platform-agnostic workload identity. At a very basic level, SPIRE could be used as a CA integration for OSM. However, SPIRE also provides the opportunity for OSM to leverage the SPIFFE protocol to unlock heterogeneous workloads (e.g. VMs, serverless, etc.)
Scope (please mark with X where applicable)
- New Functionality [X]
- Certificate Management [X]
Possible Deliverables
- Pros and cons comparing SPIFFE to the current service identity mechanism in OSM. In particular, explore the scenario where they are used in a multicluster environment.
- Rough task list if we migrate from current service identity component to SPIFFE.
- (optional) POC using SPIFFE in OSM
Added default label size/needed. Please consider re-labeling this issue appropriately.
Do we want to add this to v1.3? If we plan to adopt SPIFFE and SPIRE in the multicluster design, this item can be a prerequisite of it.
Sure, it's probably worth exploring the extent to which we may want to utilize SPIFFE/SPIRE. @trstringer your thoughts?
I suggest a deliverable for this issue:
- Pros and cons comparing SPIFFE to the current service identity mechanism in OSM. In particular, explore the scenario where they are used in a multicluster environment.
- Rough task list if we migrate from current service identity component to SPIFFE.
- (optional) POC using SPIFFE in OSM
Good idea! I added it to the description
Yeah this works for me! Adding it to v1.3.
/assign
I've got my raw notes in this gist. I will summarize the outcomes here:
spiffe integration in OSM
Pros and cons comparing SPIFFE to the current service identity mechanism in OSM.
The aspects of SPIFFE that we are primarily interested in are the x509 SVIDs and the SPIFFE ID. In the future we may be interested in a few of the other APIs in the specification, but they are not required initially.
The SPIFFE specs don't specify how you use this identity, just how it is encoded in the x509 certificate. In our case we have an existing abstraction around identity. We don't really need to change our identity mechanism but instead make it compatible with the SPIFFE specs. For more information on OSM identity see identity PR
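To make the encoding concrete, here is a minimal Go sketch of how a SPIFFE ID sits in an x509 certificate as a URI SAN and how it can be read back. It uses only the standard library. The function names, the `example.test` trust domain, and the `/ns/<namespace>/sa/<service-account>` path layout are assumptions for illustration, not OSM's actual naming schema, and the certificate is self-signed only to keep the example short (a real SVID would be signed by the mesh CA).

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"errors"
	"fmt"
	"math/big"
	"net/url"
	"time"
)

// newSVIDLikeCert is a hypothetical helper that self-signs a leaf certificate
// whose only URI SAN is the given SPIFFE ID.
func newSVIDLikeCert(spiffeID string) (*x509.Certificate, error) {
	id, err := url.Parse(spiffeID)
	if err != nil {
		return nil, err
	}
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{Organization: []string{"example"}},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		URIs:         []*url.URL{id}, // the SPIFFE ID is carried as a URI SAN
	}
	// Self-signed for brevity: the template is used as its own parent.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// spiffeIDFromCert returns the SPIFFE ID from a certificate, requiring
// exactly one URI SAN with the "spiffe" scheme and a non-empty trust domain.
func spiffeIDFromCert(cert *x509.Certificate) (string, error) {
	if len(cert.URIs) != 1 {
		return "", errors.New("expected exactly one URI SAN")
	}
	u := cert.URIs[0]
	if u.Scheme != "spiffe" || u.Host == "" {
		return "", errors.New("URI SAN is not a SPIFFE ID")
	}
	return u.String(), nil
}

func main() {
	cert, err := newSVIDLikeCert("spiffe://example.test/ns/bookstore/sa/bookstore-v1")
	if err != nil {
		panic(err)
	}
	id, err := spiffeIDFromCert(cert)
	if err != nil {
		panic(err)
	}
	fmt.Println("SPIFFE ID:", id)
}
```

Nothing here changes how identity is used for policy decisions; only the place where the identity is written into the certificate differs, which is why the existing identity abstraction could stay.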
Some of the benefits I see for this are:
- Adhering to the specification enables integrations with the rest of the ecosystem (see "Who uses SPIFFE?" at https://spiffe.io/). Once we follow the specification, things like OPA integrations and OIDC integrations become options.
- Opens the door for integrations with SPIRE. SPIRE has a few advantages that would likely be worth exploring further, like workload attestation plugins, federation, and certificate rotation. Attestation could be a big draw here.
- Potential for a way to reason about identity across multiple types of workloads beyond Kubernetes, though the naming schema is still left up to the implementors. Our current schema might work but needs some additional thought.
In particular, explore the scenario where they are used in a multi-cluster environment
At this point I believe SPIFFE isn't a requirement for multi-cluster. It could provide a way to simplify federation and allow for attestation across different platforms. I would need to dive into the multi-cluster world more to understand what is really needed. Federation could be something really useful here.
tasks list
Rough task list if we migrate from current service identity component to SPIFFE.
- [ ] document current osm Identity solution #5004
- [ ] Switch from MatchSubjectAltNames to MatchTypedSubjectAltNames in Envoy config #5019
- [ ] Add feature flag that turns spiffe id generation on
- [ ] Add feature flag that turns on spiffe id validation (this can't be turned on at the same time as generation in a backwards-compatible way without re-issuing all the Service Certificates). It might be possible to automate this, but it may be just as easy to wait a given time frame before flipping the switch, or to start a new deployment with this enabled.
- [ ] Implement spiffe ID generation in x509 certificates
- [ ] Clean up our IssueCertificate interface. Right now it uses prefixes, and it could benefit from more structure to make parsing decisions easier later
- [ ]
- [ ] Implement spiffeid x509 validation for RBAC controls
- [ ] Implement using spiffeid to make RBAC decisions
- [ ] Enable control plane certificates to be issued as x509 SVIDs (optional)
- [ ] Implement bundle API for federation with other systems (optional)
- [ ] Integrate envoy certificate validator (optional)
- [ ] Add support in the OSM debugger to view these additional certificate fields
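The reason generation and validation get separate feature flags can be sketched in Go. This is only an illustration under stated assumptions: the `typedSAN` type, the `expectedSANs` and `matches` helpers, the `<sa>.<ns>.<trust domain>` DNS form, and the `/ns/<ns>/sa/<sa>` SPIFFE path are all hypothetical and are not taken from OSM or Envoy code. The typed matching mirrors the idea behind the typed SAN matcher in the task list: a SAN only matches if both its type and its value agree.

```go
package main

import "fmt"

// sanType distinguishes the SAN kinds a typed matcher cares about.
type sanType int

const (
	sanDNS sanType = iota
	sanURI
)

// typedSAN pairs a SAN value with its type (hypothetical type for this sketch).
type typedSAN struct {
	Type  sanType
	Value string
}

// expectedSANs lists the SANs a certificate would carry for an identity.
// The DNS SAN is always present; the SPIFFE ID URI SAN is added only when
// the (hypothetical) generation flag is on.
func expectedSANs(serviceAccount, namespace, trustDomain string, enableSPIFFE bool) []typedSAN {
	sans := []typedSAN{{
		Type:  sanDNS,
		Value: fmt.Sprintf("%s.%s.%s", serviceAccount, namespace, trustDomain),
	}}
	if enableSPIFFE {
		sans = append(sans, typedSAN{
			Type:  sanURI,
			Value: fmt.Sprintf("spiffe://%s/ns/%s/sa/%s", trustDomain, namespace, serviceAccount),
		})
	}
	return sans
}

// matches reports whether any presented SAN equals an allowed SAN of the
// same type. A DNS value never matches a URI entry, even if the strings agree.
func matches(allowed, presented []typedSAN) bool {
	for _, a := range allowed {
		for _, p := range presented {
			if a.Type == p.Type && a.Value == p.Value {
				return true
			}
		}
	}
	return false
}

func main() {
	// Certificate issued before the generation flag was turned on.
	oldCert := expectedSANs("bookstore-v1", "bookstore", "cluster.local", false)
	// Certificate issued after the generation flag was turned on.
	newCert := expectedSANs("bookstore-v1", "bookstore", "cluster.local", true)

	// Validation that accepts only the SPIFFE ID.
	spiffeOnly := []typedSAN{newCert[1]}

	fmt.Println("old cert passes SPIFFE-only validation:", matches(spiffeOnly, oldCert))
	fmt.Println("new cert passes SPIFFE-only validation:", matches(spiffeOnly, newCert))
	// Validation that still accepts the DNS SAN works for both certificates.
	fmt.Println("new cert passes DNS validation:", matches(oldCert, newCert))
}
```

In this sketch a certificate issued before generation was enabled fails SPIFFE-only validation, which is why validation has to wait until every service certificate has been re-issued.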
Spire integration
There are several stages that could be implemented for the spire integration. This would need to be completed after
- using SPIRE for the Pod to Pod service certificates
  - model this integration after Istio integration (https://blog.spiffe.io/hardening-istio-security-with-spire-d2f4f98f7a63) by using SPIRE's SDS server since Envoy has support
- using SPIRE as OSM Certificate Provider. This could be done in two ways:
  - Writing our own certificate provider that knows how to call SPIRE.
  - might be able to leverage https://github.com/spiffe/spire/blob/v1.4.0/doc/plugin_server_upstreamauthority_cert_manager.md
To be done for SPIRE integrations:
- [ ] Prototype of SPIRE using SDS server
- [ ] prototype of using SPIRE as certificate provider
prototype
(optional) POC using SPIFFE in OSM
I've got a POC of using SPIFFE IDs and x509 SVIDs at https://github.com/openservicemesh/osm/compare/main...jsturtevant:osm:spiffeid?expand=1. This could be used as the basis for a SPIRE prototype.
I've taken comment https://github.com/openservicemesh/osm/issues/4750#issuecomment-1224943168 and split it out into separate issues so work can be broken down. See mentioned issues for future work.
#5030 #5031