vault-operator
pkg/operator: don't erroneously "update" (kill) unhealthy active node
Previously, any node whose health couldn't be queried by
`Vaults.updateLocalVaultCRStatus()` would be removed from the standby, sealed,
and updated lists of nodes (so long as at least one other node could be reached
and was healthy, i.e. `changed == true`).

Thus, if the active node could not be reached and determined healthy, it would
be removed from `VaultServiceStatus.UpdatedNodes` but would remain
`VaultServiceStatus.VaultStatus.Active`.

Later, this would cause `Vaults.syncUpgrade()` to determine that the active
node was the only non-updated node, and then kill it to "complete" the update
it assumed was in progress.

To prevent this, keep track of which nodes have actually been updated,
irrespective of whether they're reachable and healthy.
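
The failure mode and the fix can be illustrated with a small, self-contained sketch. It is not the operator's actual code: the `status` and `node` types and the `updatedOld`, `updatedNew`, and `shouldKillActive` helpers are hypothetical simplifications invented for illustration, and the upgrade check is modelled only as far as the description above states it (the active node is killed when it is the only node missing from the updated list).

```go
package main

import "fmt"

// status is a hypothetical, simplified stand-in for VaultServiceStatus.
type status struct {
	Active       string   // name of the active node
	UpdatedNodes []string // nodes believed to run the desired version
}

// node is a hypothetical view of a single Vault node.
type node struct {
	Name      string
	Updated   bool // node already runs the desired version
	Reachable bool // health query succeeded
}

// updatedOld mimics the previous behaviour: a node whose health
// cannot be queried is dropped from the updated list.
func updatedOld(nodes []node) []string {
	var out []string
	for _, n := range nodes {
		if n.Reachable && n.Updated {
			out = append(out, n.Name)
		}
	}
	return out
}

// updatedNew mimics the fixed behaviour: a node is recorded as
// updated regardless of whether its health could be queried.
func updatedNew(nodes []node) []string {
	var out []string
	for _, n := range nodes {
		if n.Updated {
			out = append(out, n.Name)
		}
	}
	return out
}

// shouldKillActive models the upgrade check described above: the
// active node is killed only when it is the sole non-updated node.
func shouldKillActive(s status, total int) bool {
	if s.Active == "" {
		return false
	}
	for _, name := range s.UpdatedNodes {
		if name == s.Active {
			return false // active node is already updated
		}
	}
	return len(s.UpdatedNodes) == total-1
}

func main() {
	// All nodes are updated, but the active one is temporarily unreachable.
	nodes := []node{
		{Name: "vault-0", Updated: true, Reachable: false}, // active
		{Name: "vault-1", Updated: true, Reachable: true},
		{Name: "vault-2", Updated: true, Reachable: true},
	}

	old := status{Active: "vault-0", UpdatedNodes: updatedOld(nodes)}
	fixed := status{Active: "vault-0", UpdatedNodes: updatedNew(nodes)}

	fmt.Println("old logic kills active node:", shouldKillActive(old, len(nodes)))
	fmt.Println("new logic kills active node:", shouldKillActive(fixed, len(nodes)))
}
```

In this sketch the old bookkeeping reports the unreachable active node as the only non-updated node, so the upgrade check would kill it; with the updated list decoupled from reachability, the check leaves it alone.
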
Fixes #344