CLI Option to delete SPIRE agent directory/reset existing creds

Open faali1 opened this issue 1 year ago • 3 comments

We use SPIRE agents in our k8s clusters that connect to SPIRE servers. We have multiple trust domains and, sometimes, users create a cluster with the wrong trust domain (by accident or because they were mistaken about which one to use). To fix this, we have to perform multiple steps:

  • Change the SPIRE agent config to point to the correct server and bundle. This is easy enough.
  • If the SPIRE agent has already attested to the first SPIRE server and received an SVID, we need to do a node scale down so the SPIRE agent loses its original SVID. Otherwise, we see this error: "x509svid: could not verify leaf certificate: x509: certificate signed by unknown authority"

The current way we solve this is by doing a node scale down and then up. This resets the data for the SPIRE agent. I propose adding an option to the SPIRE agent CLI that resets the agent's data directory so the agent can re-attest and connect to the correct server.
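To make the idea concrete, here is a minimal Python sketch of what such a reset could do. It is not an existing SPIRE command or API: `reset_agent_data_dir` and its `confirm` parameter are hypothetical names, and it assumes all of the agent's persisted state lives under the directory configured as `data_dir`, and that the agent is stopped before and restarted after the wipe.

```python
import shutil
from pathlib import Path


def reset_agent_data_dir(data_dir: str, confirm: bool = False) -> list[str]:
    """Delete everything inside data_dir but keep the directory itself.

    Hypothetical helper, not part of SPIRE. Returns the names of the
    removed entries, sorted. Refuses to run unless confirm=True.
    """
    root = Path(data_dir)
    if not confirm:
        # Explicit opt-in so the wipe cannot happen by accident.
        raise PermissionError("refusing to wipe agent data without confirm=True")
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")

    removed = []
    for entry in sorted(root.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # real subdirectory: remove recursively
        else:
            entry.unlink()  # regular file or symlink: remove the entry only
        removed.append(entry.name)
    return removed
```

The directory itself is left in place because it may be a volume mount point, and the explicit `confirm` flag is one way to reduce the risk of the reset being invoked accidentally.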

Another benefit is that we started off by keeping our keys on disk. If/when we move to KMS to manage our keys, our root signing key will change and we will have to deal with the above error. It's a lot nicer to ask teams to run a CLI command (while keeping their other services running) than to do a full node scale down/up on all clusters.

  • Version: NA
  • Platform: K8s
  • Subsystem: NA

faali1 avatar Aug 29 '24 23:08 faali1

https://spiffe.slack.com/archives/CBNCC2V17/p1724959960812299

Some context: this is already possible for re-attestable attestors like k8s_psat using the new emptyDir config in the hardened helm charts. However, for non-re-attestable attestors like aws_iid, this is not possible, as the spire-agent data needs to be persistent.

faali1 avatar Aug 30 '24 00:08 faali1

I agree that we need to provide at least documentation on the best way to wipe agent state in Kubernetes. If that becomes too hard, we'll consider adding a command as a last resort (we're worried about that command being invoked accidentally).

azdagron avatar Sep 03 '24 18:09 azdagron

This issue is stale because it has been open for 365 days with no activity.

github-actions[bot] avatar Sep 03 '25 22:09 github-actions[bot]