Josh Black
Setting `dead-server-last-contact-threshold` to 10s is likely the problem here. Setting that value too low will cause autopilot to prune your servers before they're even able to join the cluster, and...
3m still seems a bit low to me for the dead server last contact threshold. I wouldn't set that lower than 5m. > This time, on further investigation we noticed...
The dead server last contact threshold cannot be reset, which is why we recommend setting it to a high value. After that amount of time has elapsed, that node...
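To make the numbers in the comments above concrete (10s, 3m, and the suggested 5m floor), here is a minimal sketch of a pre-flight check you could run on a proposed threshold before applying it. The helper names and the `SUGGESTED_FLOOR_SECONDS` constant are hypothetical; the 5-minute floor is only the recommendation given in this thread, not a limit that Vault enforces, and the parser handles only simple `s`/`m`/`h` durations.

```python
import re

# Assumption: 5 minutes is the floor suggested in this thread,
# not a value that Vault itself enforces.
SUGGESTED_FLOOR_SECONDS = 5 * 60

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def duration_to_seconds(value: str) -> int:
    """Parse a simple duration such as '10s', '3m', or '1h30m' into seconds."""
    parts = re.findall(r"(\d+)([smh])", value)
    # Reject anything that is not made up entirely of <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != value:
        raise ValueError(f"unsupported duration: {value!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in parts)


def below_suggested_floor(value: str) -> bool:
    """Return True if the duration is shorter than the suggested floor."""
    return duration_to_seconds(value) < SUGGESTED_FLOOR_SECONDS


if __name__ == "__main__":
    for candidate in ("10s", "3m", "5m", "10m"):
        verdict = "too low" if below_suggested_floor(candidate) else "ok"
        print(f"{candidate}: {verdict}")
```

A check like this only flags values that fall under the recommendation; it does not talk to Vault or change any configuration.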
> Since the node that was unable to join the cluster was still showing up in list-peers and autopilot state hours after we would have expected it to be reaped,...
I can tell from the screenshots that node 02 is having trouble - autopilot says it's not healthy, the last term and index don't match what the other nodes are...
Hmmm. If Vault was unable to unseal, then I doubt it. Vault can't really do anything in a sealed state (by design) other than report on its status and unseal.
The last contact time can only be tracked if Vault is unsealed and heartbeating to the other nodes in the cluster. If Vault is sealed, then it can't heartbeat, and...
I'm keenly interested in this as well, and happy to help implement it.
@ehaselwanter Sadly, no. At this point in time, this work isn't a priority. That doesn't mean it won't get done, just that it will likely be a while before we're able...
@steakknife Everything you mentioned needs to get done, for sure. The biggest problem right now is too many things to do and not enough staff to do them.