Agent soft-restart for re-attestation

Open azdagron opened this issue 4 years ago • 1 comments

When an agent SVID expires or is otherwise invalid (e.g. the agent has been evicted), the agent needs to re-attest. Currently, the only way for this to happen is a full restart of the agent. This behavior has several disadvantages: the workload SVID cache is purged, (future) debug/health endpoints become unavailable, etc.

Instead of requiring a full restart, the agent should be refactored so that it can soft-restart only the subsystems that are affected by the agent SVID being invalid.

In order to implement this safely, we need to decide under what conditions the agent should stop, if any, and when the SVID cache should be purged. For example, the cache should probably be purged if the agent is banned or evicted but not when the agent SVID expires.
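To make the purge question concrete, here is a minimal sketch of how that decision could be expressed. Every name in it (`InvalidReason`, `shouldPurgeCache`, and the constants) is hypothetical and not part of the SPIRE codebase. It encodes only the example given above: purge on ban or eviction, keep the cache on expiry. Treating an unknown reason as "purge" is an extra assumption, chosen to be conservative.

```go
package main

import "fmt"

// InvalidReason is a hypothetical enumeration of why the agent SVID
// is no longer usable. It does not exist in SPIRE.
type InvalidReason int

const (
	SVIDExpired InvalidReason = iota
	AgentEvicted
	AgentBanned
)

// shouldPurgeCache reports whether the workload SVID cache should be
// dropped before the soft restart. Expiry keeps the cache; eviction and
// banning purge it. Unknown reasons purge as well (an assumption).
func shouldPurgeCache(reason InvalidReason) bool {
	switch reason {
	case SVIDExpired:
		return false
	case AgentEvicted, AgentBanned:
		return true
	default:
		return true
	}
}

func main() {
	fmt.Println("expired:", shouldPurgeCache(SVIDExpired))
	fmt.Println("evicted:", shouldPurgeCache(AgentEvicted))
	fmt.Println("banned:", shouldPurgeCache(AgentBanned))
}
```

Whether the agent should stop entirely in some of these cases is left open here, as it is in the issue.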

azdagron, Sep 18 '20 17:09

Another question: What should the agent do if re-attestation fails? Is there a point where it gives up and crashes, or does it keep attempting (with backoff)?

azdagron, Sep 18 '20 17:09

> Another question: What should the agent do if re-attestation fails? Is there a point where it gives up and crashes, or does it keep attempting (with backoff)?

I think continuing to retry (with backoff) would be the better default, as it would provide more graceful recovery after a prolonged network partition from the SPIRE server. However, if the attestation attempts are reaching the server and being rejected, perhaps the agent should give up after some number of retries. Could we detect a network partition and keep retrying until some (configurable?) number of attestation attempts have reached the server and been rejected?

zmt, Nov 16 '22 20:11

This issue is stale because it has been open for 365 days with no activity.

github-actions[bot], Nov 16 '23 22:11

This issue was closed because it has been inactive for 30 days since being marked as stale.

github-actions[bot], Dec 17 '23 22:12