spire
spire copied to clipboard
SPIRE Agent now respects entry's TTL when issuing JWT-SVIDs
Agents used to issue JWT-SVIDs with a TTL hardcoded to 5 minutes. When we refactored the Server APIs, the agents were changed to consume the new Server APIs. The NewJWTSVID RPC from the SVID API is now called by agents, specifying the entry ID. The server then mints the JWT with the TTL specified in the entry. This "new" behavior can be observed since SPIRE v0.11.0.
An example of how agents used to issue JWT-SVIDs can be seen here: https://github.com/spiffe/spire/blob/v0.10.2/pkg/agent/manager/manager.go#L168
Current implementation: https://github.com/spiffe/spire/blob/v1.1.3/pkg/agent/manager/manager.go#L197
We had some discussion in the past about how JWT-SVIDs TTL should be handled (#1438).
Having a hardcoded lifetime for JWT-SVIDs is decidedly not the greatest thing, but was the "less bad" option that we found considering the inherent replay-ability of JWT-SVIDs and the restrictions that we have in the current database schema. Using an aggressive value for the exp claim is an appropriate measure to try to mitigate possible replay attacks.
The change in the behavior to respect entry's TTL was not really intended. As such, we need to find what's the best course of action at this point, considering that (1) this change has been there for quite some time, (2) we had requests in the past to be more flexible than just having a hardcoded TTL, (3) we want to discourage the issuance of JWT-SVIDs with large TTLs, and (4) changes in the behavior need to be thought carefully to consider backwards compatibility.
Since an updated token requires a round trip to the server, being able to configure this setting as appropriate seems to be a good way to alleviate extra load off of the opa servers.
We have a need to set longer JWT-SVID expiration times in some of our Trust Domains. While it's nice that we can currently set the TTL in the entry registration record, because it appears to be more of an accidental ability, we don't feel comfortable relying on it. Also, in cases where we're using the k8s-workload-registrar, that value is not set because the registrar doesn't support it.
SPIRE Server defaulting JWT-SVIDs to a 5 minute lifetime makes lots of sense, but it would be very helpful to us if there was an optional configuration value specifically for JWT-SVIDs.
default_svid_ttl <--- existing config applicable to X509-SVIDs
default_jwtsvid_ttl <--- proposed new config applicable to JWT-SVIDs, default to 5m
max_jwtsvid_ttl <--- proposed new config applicable to JWT-SVIDs, default to value of default_jwtsvid_ttl
default_jwtsvid_ttl
allows folks to set a default TTL for JWT-SVIDs.
max_jwtsvid_ttl
allows folks to set an absolute max TTL for JWT-SVIDs.
When generating a new JWT-SVID, if there is a set ttl
on the registration then the JWT-SVID lifetime will be min(registration_ttl, max_jwtsvid_ttl)
. If there is no set ttl
on the registration then the JWT-SVID lifetime will be default_jwtsvid_ttl
.
This approach allows the project to set reasonable defaults while also allowing folks to override those defaults if their specific use-case necessitates it. Documentation for these configuration values can clearly state that extended JWT lifetimes increase the risk of replay attacks but I think it's appropriate to let users override the defaults when needed.
If the community decides this is a reasonable path, I'd be happy to implement it.
If we want to stay completely backward compatible with existing behavior, then the default value for max_jwtsvid_ttl
can be set to the max(largest TTL in the registration table, default_jwtsvid_ttl)
.
Such a setup would allow users with already longer JWT-SVID lifetimes to upgrade to newer versions of SPIRE without having to explicitly set a value for max_jwtsvid_ttl
. I don't think this is strictly necessary, but an approach to consider.
Hey @dennisgove ! That is an interesting proposal and something similar has come up before. We're still digesting this all and trying to decide on a direction we're all comfortable with. We'll try and have something actionable soon.
The maintainers discussed this issue and decided we would be in favor of a solution that allows users to configure TTLs of X.509-SVIDs and JWT-SVIDs at the entry level with new, separate properties on the Entry type. This would provide the most flexibility for defining JWT-SVID TTLs separately from X.509-SVID TTLs while providing a consistent user experience for how this configuration is provided across SVID types. Concretely, this would look something like:
- New default X.509-SVID and JWT-SVID TTL configuration properties in Server
- New TTL properties in the Entry type (DB, API definitions) for X.509-SVID and JWT-SVID that clearly distinguish their purpose, which is not clear from the existing
ttl
field in the Entry - Backwards-compatible behavior that respects the existing
ttl
field in an entry and existingdefault_svid_ttl
Server configuration property as having precedence over the new entry + default TTL properties
Carrying out this solution would be a bit more involved than your original proposal, @dennisgove , since it will introduce a database migration and API changes, which need to be staggered across minor releases to guarantee N Agent -> N+1 Server upgrade compatibility. Given the increased scope, I wanted to check in to see if you'd be willing/able to contribute this?
Without a doubt. I'm happy to get started on this. Thanks!
Hi @dennisgove, so assuming we can start this right away, a rough sequence and timeline for the changes would be:
- Add database changes to RegisteredEntry model (v1.3.x)
- Add new TTL fields to Entry type in API (v1.5.0)
- API implementation persisting new fields into database (v1.5.0)
- Add Server configuration properties for default X.509-SVID and JWT-SVID TTLs (v1.5.0)
Because we support manual database migrations, whenever we make database changes, we have to wait two minor releases before we can start using those new fields to give users one minor release to perform the migration. Our minor release cadence is every 3 months, so unfortunately that means the implementation changes can only be merged 4-6 months after the database changes are released, depending on the point release in which the database changes are introduced. If we can get the database PR merged in the next 3 weeks, that will make it into v1.3.2, and the subsequent changes would come as part of v1.5.0.
Just wanted to double check that you are ok with pushing the changes over that timeline since it goes a bit out into the future!
Hi @rturner3. I started looking at the required changes earlier today and came up with basically the same schedule, though was thinking v1.4 as I wasn't aware of the 2 minor requirement for db changes.
Schedule looks great, and I'll have a PR with the db changes in a few days.
Just so I'm clear, after the db changes in 1.3.2, all of the other code changes can go into 1.5.0? Or will anything need to spill into 1.6? Maybe removal of the old 'ttl' config option?
Hey @dennisgove , that is a great question. Yes, I think we will also want to remove the old default_svid_ttl
Server configuration parameter in v1.6.0. And yes, assuming the DB changes make it into v1.3.2., all the other code changes can go into v1.5.0.
I'm making steady progress on the implementation of this support, but there are a lot of places that need updates (I'm finding). If curious you can view ongoing changes here.
I've submitted two PRs for this change.
- https://github.com/spiffe/spire/pull/3445
- https://github.com/spiffe/spire-api-sdk/pull/29