spire
spire copied to clipboard
Workload API should return SVIDs with a consistent order.
Currently SPIRE returns multiple SVIDs to workloads in random ordering, which conflicts with the spirit of the SPIFFE specification that the first SVID should be the default.
We should fix the Workload API implementation to return a consistent ordering. This could happen as a stopgap before #744, or as part of implementing #744.
UPDATE: Concensus below is to sort by creation date, then entry ID. The creation date is not yet plumbed down through the API.
What about sorting by the registration entry's created_at
time from oldest to newest? This way, entries created later would not override the current default.
At some point, I think we'd like a way to have some control over this. In particular, I'd like to always issue some kind of "Default" service identity, but for special-cases we may want to add additional registration entries and we'd want to make sure those aren't the "Default".
What about sorting by the registration entry's
created_at
time from oldest to newest? This way, entries created later would not override the current default.
this may be a bit more complex since the handing out of identity is removed from any DB calls (the common
RegEntry vs the data model
RegEntry)
https://github.com/spiffe/spire/blob/master/pkg/agent/manager/cache/cache.go#L499 https://github.com/spiffe/spire/blob/master/pkg/agent/manager/cache/cache.go#L613
This is a priority for us.
SIG-SPEC kicked this issue back to SIG-SPIRE, while retaining ownership of #744 spire-agent is responsible for issuing SVIDs to workloads, so it would seem composeX509SVIDResponse and composeJWTSVIDResponse would be the impacted functions for any ordering, as indicated in pkg/agent/manager/cache/cache.go:603 // TODO: figure out how to determine the "default" identity and func sortIdenties at line 715
It occurs to me that the simplest SPIFFE specification compliant solution would be to replace EntryId
with SpiffeId
in sortIdenties for default identity and let the SIG-SPEC proposal for #744 work its way through for control plane administration.
Sorting on EntryId
was a quick band-aid to get consistent results between two successive results streamed off the Workload API (it was random before that). It was not intended to solve operator controlled ordering and indeed punted that issue down the road. We didn't have a solid idea of how we wanted operator controlled ordering to work and none of the solutions that were spit-balled at the time seemed adequate.
Sorting by SpiffeId
seems marginally better in that it provides some way for operators to influence the ordering but in my opinion is actually much, much worse. It feels 1) inflexible, 2) non-discoverable, 3) non-intuitive. I dread answering the question of a new user, when asked how to influence the ordering, by telling them to sort their IDs lexicographically.
Right now, operators can't reasonably influence the ordering, so they can't take a dependency on the ordering. By introducing an incomplete solution like SpiffeId
-based ordering, they can now start to take a dependency on something that we will certainly want to improve on. When the time comes to move off of it in lieu of a more complete solution, it has huge potential to break existing deployments that have come to rely on it.
If we're going to solve ordering, I would really advocate we come up with something that has good UX that we feel comfortable we can live with for a good long time.
We discussed this issue at length today. The consensus was to keep the issues of consistent ordering (this issue) and administrative control (#744) separate. For administrative control, we think that the idea being floated around SIG-SPEC that provides an opaque label to SVIDs returned from the Workload API is sufficient to give operators a way to enable educated workloads the ability to cherry-pick SVIDs.
For this issue, we agree that the current method (sort by EntryId
) is only a partial solution, and is insufficient in the face of registration changes. However, the SpiffeId
proposal above conflates consistent ordering and administrative ordering and has some UX issues as pointed out above. We propose that the sorting criteria be changed and documented as follows:
- Sort first by entry create timestamp
- Sort second by entry ID
This sorting criteria provides a much more stable ordering that is more resilient to entry creation, and if necessary, can still be gamed through CRUD gymnastics without imposing other strange restrictions on entry contents (like the shape of the SpiffeId
).
The creation date is not available on the entries returned from the APIs today. This solution would require plumbing that value out of the DataStore layer.
Ideally, the CLI, Workload API, and even the Entry API would provide consistently sorted entries according to this criteria. In the face of paging on the Entry API, this is most certainly going to require some non-trivial work in the DataStore layer.
Hmmmm. My reading of #744 was an ask for administrative control over the default SVID. I'm not sure how an opaque field helps any runtime select the proper default to list first in responses to workloads. The opaque field is definitely helpful for workloads that are expecting more than one SVID to glean guidance on how to use them; however, this does not appear to address the original issue of administrative control over the default.
@azdagron is it safe to presume Uber was represented in that discussion and the above solution works for their needs?
@guilhermocc Are you also intending to make the agent-side changes to sort the SVIDs returned on the Workload API by the created_at time?