spire
Fix race causing agents to fail attestation if communication is interrupted
When an agent starts for the first time, it generates a key locally, performs node attestation, receives a certificate for that key, and then persists the certificate. Suppose communication is interrupted after the server has successfully attested the agent but before the agent has persisted the new certificate. If the node attestor in use is a TOFU (trust on first use) attestor, the agent can then never authenticate successfully: it has no persisted certificate to present, and re-attesting fails because the server has already recorded a successful attestation for that node.
The same race also existed in agent SVID rotation, which is likewise a do-once operation. It was fixed in #1128, which introduced a two-step process for committing a successful agent SVID rotation.
Fix the race, possibly by taking the same approach we took for agent SVID rotation (which will involve a migration).
This issue is stale because it has been open for 365 days with no activity.
Still relevant.