CLI Option to delete SPIRE agent directory/reset existing creds
We use SPIRE agents in our k8s clusters that connect to SPIRE servers. We have multiple trust domains, and sometimes users create a cluster with the wrong trust domain (whether by accident or because they misunderstood which one to use). To fix this, we have to perform multiple steps:
- Change the SPIRE agent config to point to the correct server and bundle. This is easy enough.
- If the SPIRE agent has already attested to the first SPIRE server and obtained an SVID, we need to do a node scale-down so the SPIRE agent loses its original SVID. Otherwise, we see this error:
x509svid: could not verify leaf certificate: x509: certificate signed by unknown authority
Currently we solve this by scaling the node down and then back up, which resets the SPIRE agent's data. I propose adding an option to the SPIRE agent CLI that resets the agent's data directory (and with it, its existing credentials) so the agent can attest to the correct server.
Another benefit: we started off by keeping our keys on disk. If/when we move to KMS to manage our keys, our root signing key will change and we will run into the same error. It's much nicer to ask teams to run a CLI command (while keeping their other services running) than to do a full node scale-down/up on all clusters.
- Version: NA
- Platform: K8s
- Subsystem: NA
https://spiffe.slack.com/archives/CBNCC2V17/p1724959960812299
For some context: this is already possible for re-attestable attestors like k8s_psat, using the new emptyDir config in the hardened Helm charts. However, it is not possible for non-re-attestable attestors like aws_iid, because the spire-agent's data must be persistent.
I agree that we need to provide at least documentation on the best way to wipe agent state in Kubernetes. If that becomes too hard, we'll consider adding a command as a last resort (we're worried about that command being invoked accidentally).
This issue is stale because it has been open for 365 days with no activity.