spire icon indicating copy to clipboard operation
spire copied to clipboard

Workload API should return SVIDs with a consistent order.

Open azdagron opened this issue 6 years ago • 12 comments

Currently SPIRE returns multiple SVIDs to workloads in random ordering, which conflicts with the spirit of the SPIFFE specification that the first SVID should be the default.

We should fix the Workload API implementation to return a consistent ordering. This could happen as a stopgap before #744, or as part of implementing #744.

UPDATE: Concensus below is to sort by creation date, then entry ID. The creation date is not yet plumbed down through the API.

azdagron avatar Feb 14 '19 21:02 azdagron

What about sorting by the registration entry's created_at time from oldest to newest? This way, entries created later would not override the current default.

APTy avatar Jul 01 '19 14:07 APTy

At some point, I think we'd like a way to have some control over this. In particular, I'd like to always issue some kind of "Default" service identity, but for special-cases we may want to add additional registration entries and we'd want to make sure those aren't the "Default".

mcpherrinm avatar Jul 13 '19 00:07 mcpherrinm

What about sorting by the registration entry's created_at time from oldest to newest? This way, entries created later would not override the current default.

this may be a bit more complex since the handing out of identity is removed from any DB calls (the common RegEntry vs the data model RegEntry)

https://github.com/spiffe/spire/blob/master/pkg/agent/manager/cache/cache.go#L499 https://github.com/spiffe/spire/blob/master/pkg/agent/manager/cache/cache.go#L613

amoore877 avatar Sep 11 '19 15:09 amoore877

This is a priority for us.

esweiss avatar Apr 07 '21 22:04 esweiss

SIG-SPEC kicked this issue back to SIG-SPIRE, while retaining ownership of #744 spire-agent is responsible for issuing SVIDs to workloads, so it would seem composeX509SVIDResponse and composeJWTSVIDResponse would be the impacted functions for any ordering, as indicated in pkg/agent/manager/cache/cache.go:603 // TODO: figure out how to determine the "default" identity and func sortIdenties at line 715

bri365 avatar Apr 09 '21 05:04 bri365

It occurs to me that the simplest SPIFFE specification compliant solution would be to replace EntryId with SpiffeId in sortIdenties for default identity and let the SIG-SPEC proposal for #744 work its way through for control plane administration.

bri365 avatar Apr 09 '21 06:04 bri365

Sorting on EntryId was a quick band-aid to get consistent results between two successive results streamed off the Workload API (it was random before that). It was not intended to solve operator controlled ordering and indeed punted that issue down the road. We didn't have a solid idea of how we wanted operator controlled ordering to work and none of the solutions that were spit-balled at the time seemed adequate.

Sorting by SpiffeId seems marginally better in that it provides some way for operators to influence the ordering but in my opinion is actually much, much worse. It feels 1) inflexible, 2) non-discoverable, 3) non-intuitive. I dread answering the question of a new user, when asked how to influence the ordering, by telling them to sort their IDs lexicographically.

Right now, operators can't reasonably influence the ordering, so they can't take a dependency on the ordering. By introducing an incomplete solution like SpiffeId-based ordering, they can now start to take a dependency on something that we will certainly want to improve on. When the time comes to move off of it in lieu of a more complete solution, it has huge potential to break existing deployments that have come to rely on it.

If we're going to solve ordering, I would really advocate we come up with something that has good UX that we feel comfortable we can live with for a good long time.

azdagron avatar Apr 12 '21 14:04 azdagron

We discussed this issue at length today. The consensus was to keep the issues of consistent ordering (this issue) and administrative control (#744) separate. For administrative control, we think that the idea being floated around SIG-SPEC that provides an opaque label to SVIDs returned from the Workload API is sufficient to give operators a way to enable educated workloads the ability to cherry-pick SVIDs.

For this issue, we agree that the current method (sort by EntryId) is only a partial solution, and is insufficient in the face of registration changes. However, the SpiffeId proposal above conflates consistent ordering and administrative ordering and has some UX issues as pointed out above. We propose that the sorting criteria be changed and documented as follows:

  1. Sort first by entry create timestamp
  2. Sort second by entry ID

This sorting criteria provides a much more stable ordering that is more resilient to entry creation, and if necessary, can still be gamed through CRUD gymnastics without imposing other strange restrictions on entry contents (like the shape of the SpiffeId).

The creation date is not available on the entries returned from the APIs today. This solution would require plumbing that value out of the DataStore layer.

Ideally, the CLI, Workload API, and even the Entry API would provide consistently sorted entries according to this criteria. In the face of paging on the Entry API, this is most certainly going to require some non-trivial work in the DataStore layer.

azdagron avatar Apr 13 '21 19:04 azdagron

Hmmmm. My reading of #744 was an ask for administrative control over the default SVID. I'm not sure how an opaque field helps any runtime select the proper default to list first in responses to workloads. The opaque field is definitely helpful for workloads that are expecting more than one SVID to glean guidance on how to use them; however, this does not appear to address the original issue of administrative control over the default.

bri365 avatar Apr 13 '21 23:04 bri365

@azdagron is it safe to presume Uber was represented in that discussion and the above solution works for their needs?

bri365 avatar Apr 13 '21 23:04 bri365

@guilhermocc Are you also intending to make the agent-side changes to sort the SVIDs returned on the Workload API by the created_at time?

rturner3 avatar Jun 05 '23 19:06 rturner3