vault-operator

pkg/operator: don't erroneously "update" (kill) unhealthy active node

Open cpick opened this issue 6 years ago • 0 comments

Previously, any node whose health couldn't be queried by Vaults.updateLocalVaultCRStatus() would be removed from the standby, sealed, and updated lists of nodes (so long as at least one other node could be reached and was healthy, i.e. changed == true).

Thus, if the active node could not be reached and confirmed healthy, it would be removed from VaultServiceStatus.UpdatedNodes but would remain VaultServiceStatus.VaultStatus.Active.

Later, this would cause Vaults.syncUpgrade() to determine that the active node was the only non-updated node and then kill it to "complete" the update it assumed was in progress.

Keep track of which nodes have actually been updated, irrespective of whether they're reachable and healthy, to prevent this issue.
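
To make the failure mode concrete, here is a minimal, self-contained sketch of the logic described above. It is not the operator's actual code: the `status` and `pod` types and the `updatedOld`, `updatedFixed`, and `activeToKill` functions are hypothetical stand-ins for VaultServiceStatus, the node list, and the relevant parts of updateLocalVaultCRStatus() and syncUpgrade().

```go
package main

import "fmt"

// status is a simplified, hypothetical stand-in for VaultServiceStatus.
type status struct {
	Active  string   // name of the active node
	Updated []string // names of nodes believed to be updated
}

// pod is a hypothetical view of one Vault node.
type pod struct {
	Name      string
	Reachable bool // whether the health query succeeded
	Updated   bool // whether the node already runs the desired version
}

// updatedOld mimics the previous behaviour: a node whose health cannot be
// queried is skipped entirely, so it drops out of the updated list.
func updatedOld(pods []pod) []string {
	var out []string
	for _, p := range pods {
		if !p.Reachable {
			continue
		}
		if p.Updated {
			out = append(out, p.Name)
		}
	}
	return out
}

// updatedFixed mimics the fix: updated nodes are recorded whether or not
// they are currently reachable and healthy.
func updatedFixed(pods []pod) []string {
	var out []string
	for _, p := range pods {
		if p.Updated {
			out = append(out, p.Name)
		}
	}
	return out
}

// activeToKill mimics the upgrade decision described above: if the active
// node is the only node missing from the updated list, it is selected to be
// killed. It returns "" when no node should be killed.
func activeToKill(s status, pods []pod) string {
	updated := make(map[string]bool)
	for _, name := range s.Updated {
		updated[name] = true
	}
	for _, p := range pods {
		if p.Name != s.Active && !updated[p.Name] {
			return "" // a non-active node still needs updating first
		}
	}
	if updated[s.Active] {
		return "" // nothing left to update
	}
	return s.Active
}

func main() {
	// The active node is already updated but temporarily unreachable.
	pods := []pod{
		{Name: "vault-0", Reachable: false, Updated: true},
		{Name: "vault-1", Reachable: true, Updated: true},
	}
	before := status{Active: "vault-0", Updated: updatedOld(pods)}
	after := status{Active: "vault-0", Updated: updatedFixed(pods)}
	fmt.Printf("old: kill %q\n", activeToKill(before, pods))
	fmt.Printf("fixed: kill %q\n", activeToKill(after, pods))
}
```

With the old bookkeeping, the unreachable active node is selected to be killed even though it is already up to date; with the fixed bookkeeping, no node is selected.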

Fixes #344

cpick avatar Sep 14 '18 16:09 cpick