IdentityServer
Research automatic key management and removing duplicate keys created at the same time
Brainstorming:
When new keys are created, if we notice that several were created within a short window of one another, delete all of those newer than the oldest. All nodes can run this same logic, so one will win and the rest will try and silently fail. This would only be done long before the key rotation actually happens.
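To make the idea concrete, here is a minimal sketch of that cleanup heuristic. The StoredKey and IKeyStore types and the five-minute duplicate window are illustrative assumptions, not IdentityServer's actual signing key store interfaces.

```csharp
using System;
using System.Linq;
using System.Collections.Generic;
using System.Threading.Tasks;

// Illustrative types only; the real IdentityServer key store looks different.
public record StoredKey(string Id, DateTime Created);

public interface IKeyStore
{
    Task<IReadOnlyList<StoredKey>> LoadKeysAsync();
    Task DeleteKeyAsync(string id);
}

public class DuplicateKeyCleanup
{
    // Keys created within this window of one we are keeping are treated as
    // duplicates produced by other nodes racing to rotate at the same time.
    private static readonly TimeSpan DuplicateWindow = TimeSpan.FromMinutes(5);

    private readonly IKeyStore _store;

    public DuplicateKeyCleanup(IKeyStore store) => _store = store;

    public async Task CleanupAsync()
    {
        var keys = (await _store.LoadKeysAsync())
            .OrderBy(k => k.Created)
            .ToList();
        if (keys.Count < 2) return;

        var lastKept = keys[0];
        foreach (var key in keys.Skip(1))
        {
            if (key.Created - lastKept.Created < DuplicateWindow)
            {
                // Created shortly after a key we are keeping: delete it.
                try { await _store.DeleteKeyAsync(key.Id); }
                catch { /* another node may already have deleted it; fail silently */ }
            }
            else
            {
                lastKept = key;
            }
        }
    }
}
```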
@brockallen Can we do something that will check the existing keys' expiration or creation date before creating a new one?
What makes that hard is that there could be multiple load-balanced instances of IdentityServer all simultaneously attempting to create keys. Since our store interface is very much abstracted from the underlying data store, and IdentityServer as a framework doesn't know anything about the architecture/deployment/scaling, we've avoided attempts to create some kind of distributed lock in the past. Instead, we have some heuristics that have given us good results: we always use the oldest applicable key for signing, and we update the cache more frequently when we're initializing keys. In practice this tends to give usable results, but we think cleaning up the keys that won't ever be used will make operations easier, because there won't be these spurious keys hanging around.
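For context, the "oldest applicable key" heuristic can be sketched roughly as follows, again with illustrative types; the real selection logic also has to deal with key retirement and expiration.

```csharp
using System;
using System.Linq;
using System.Collections.Generic;

public record StoredKey(string Id, DateTime Created);

public static class SigningKeySelection
{
    // Every node signs with the oldest key that has finished propagating, so
    // load-balanced instances converge on the same key even if the race left
    // a few extra, newer keys in the store.
    public static StoredKey SelectSigningKey(
        IEnumerable<StoredKey> keys, TimeSpan propagationTime, DateTime now)
    {
        return keys
            .Where(k => now - k.Created >= propagationTime) // old enough for all caches to have seen it
            .OrderBy(k => k.Created)                        // oldest applicable wins
            .First();                                       // assumes at least one propagated key exists
    }
}
```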
@brockallen @josephdecock Do you have any update on this topic? This indeed leads to a race condition. We are currently checking whether we can switch to the provided automatic key management, but noticed this exact behavior: when multiple instances try to create a new key for rotation purposes, the instances end up with different states and also announce different sets of keys via openid-configuration/jwks.
What would be your suggestion for handling this in the short term? My current approach is to provide a custom IConcurrencyLock<KeyManager>. That also does not feel ideal, however, since Unlock is not async while Lock is, and since the interface lives in the Internal namespace.
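For anyone weighing a similar workaround, a hypothetical sketch of such a lock backed by Redis is below. The IConcurrencyLock<KeyManager> shape shown (an async LockAsync with a timeout, a synchronous Unlock) follows the description in this thread; the exact namespaces and signatures may differ between IdentityServer versions, and the Redis key name, expiry, and retry delay are made up for illustration.

```csharp
using System;
using System.Threading.Tasks;
using Duende.IdentityServer.Internal;                // IConcurrencyLock<T>; namespace may vary by version
using Duende.IdentityServer.Services.KeyManagement;  // KeyManager; namespace may vary by version
using StackExchange.Redis;

// Cross-instance lock so only one node creates a new signing key at a time.
// Simplified sketch: no in-process re-entrancy handling, fixed expiry and retry delay.
public class RedisKeyManagerLock : IConcurrencyLock<KeyManager>
{
    private const string LockKey = "identityserver:keymanagement:lock"; // illustrative key name
    private readonly IDatabase _db;
    private readonly string _token = Guid.NewGuid().ToString();

    public RedisKeyManagerLock(IConnectionMultiplexer redis) => _db = redis.GetDatabase();

    public async Task<bool> LockAsync(int millisecondsTimeout)
    {
        var deadline = DateTime.UtcNow.AddMilliseconds(millisecondsTimeout);
        do
        {
            // Atomic take; the expiry guards against a crashed node holding the lock forever.
            if (await _db.LockTakeAsync(LockKey, _token, TimeSpan.FromSeconds(30)))
            {
                return true;
            }
            await Task.Delay(200);
        } while (DateTime.UtcNow < deadline);

        return false;
    }

    public void Unlock()
    {
        // Synchronous, matching the interface as described above; only releases our own token.
        _db.LockRelease(LockKey, _token);
    }
}
```

Registering it would be something like services.AddSingleton<IConcurrencyLock<KeyManager>, RedisKeyManagerLock>() after AddIdentityServer, replacing the built-in default, though as noted this leans on internal types that are not a supported extension point.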
This conversation continued over a different support channel, but in case anyone is interested in this thread, in brief:
We think that it is usually fine to have some eventual consistency in the nodes and their jwks output, since new keys aren't used until after the propagation delay of two weeks. Two weeks is enough time for the nodes to get to a consistent state and for clients and APIs to update their cached jwks.
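For reference, these timings are configurable on the key management options; something along these lines (the 90-day rotation interval shown is illustrative, and the option names should be checked against the IdentityServer version in use):

```csharp
builder.Services.AddIdentityServer(options =>
{
    options.KeyManagement.Enabled = true;

    // A new key is announced via discovery/jwks for this long before it is used
    // for signing; the two weeks referenced above gives every node, client, and
    // API time to refresh its cached key set.
    options.KeyManagement.PropagationTime = TimeSpan.FromDays(14);

    // How long a key is used for signing before rotation (illustrative value).
    options.KeyManagement.RotationInterval = TimeSpan.FromDays(90);

    // Retired keys stay published so older tokens can still be validated.
    options.KeyManagement.RetentionDuration = TimeSpan.FromDays(14);
});
```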