neonKUBE
ETCD backup/restore & cluster upgrade
Some thoughts and links for these topics.
I was out on a drive yesterday and pulled over to do some research on my phone, looking into ETCD backup/restore options that would make a cluster with just a single control-plane node more resilient in the cloud. This looks very doable with the etcdctl CLI. We could take a full snapshot and upload it to S3 (or similar object storage) every hour and log transactions in between, so the copy in S3 should be very close to up to date at all times.
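Here is a minimal sketch of what the hourly snapshot step could look like, written in Python around `etcdctl snapshot save`. The certificate paths are the usual kubeadm defaults and are only an assumption about our nodes, and the helper names (`snapshot_path`, `snapshot_save_command`, `take_snapshot`) are made up for illustration. The upload to S3 and the transaction logging between snapshots are left out.

```python
import os
import subprocess
from datetime import datetime, timezone

# Assumption: kubeadm-style certificate locations; adjust for the real cluster.
ETCD_CERTS = {
    "cacert": "/etc/kubernetes/pki/etcd/ca.crt",
    "cert": "/etc/kubernetes/pki/etcd/server.crt",
    "key": "/etc/kubernetes/pki/etcd/server.key",
}


def snapshot_path(backup_dir, now=None):
    """Return a timestamped snapshot file name inside backup_dir (UTC)."""
    now = now or datetime.now(timezone.utc)
    return f"{backup_dir}/{now.strftime('etcd-%Y%m%dT%H%M%SZ.db')}"


def snapshot_save_command(path, endpoint="https://127.0.0.1:2379"):
    """Build the etcdctl command line that saves a snapshot to path."""
    return [
        "etcdctl",
        f"--endpoints={endpoint}",
        f"--cacert={ETCD_CERTS['cacert']}",
        f"--cert={ETCD_CERTS['cert']}",
        f"--key={ETCD_CERTS['key']}",
        "snapshot", "save", path,
    ]


def take_snapshot(backup_dir):
    """Run etcdctl on the control-plane node and return the snapshot path."""
    path = snapshot_path(backup_dir)
    env = dict(os.environ, ETCDCTL_API="3")  # force the v3 API on older etcdctl
    subprocess.run(snapshot_save_command(path), env=env, check=True)
    return path
```

Something like `take_snapshot("/var/backups/etcd")` could then be run from an hourly timer or cron job on the control-plane node, with the returned file handed to whatever does the upload.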
Then if the cloud relocates the VM to a new host and there's a problem with the ETCD data (or it gets corrupted some other way), we could reload the ETCD data from the backup. We'd need to stop ETCD (and probably the API server) while we do this and start them again afterwards, but that should only take a minute or two. Whatever is already running on the cluster will keep running, so most user-facing services shouldn't see much impact.
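A rough sketch of the reload step, assuming ETCD has already been stopped and that its data lives in `/var/lib/etcd` (the kubeadm default, which may not match our setup). It moves the suspect data directory aside, because `snapshot restore` refuses to write into an existing directory, and then runs `etcdctl snapshot restore`. The function names are hypothetical. I believe newer ETCD releases prefer `etcdutl snapshot restore` for this, so the binary name may need to change.

```python
import os
import subprocess
import time


def restore_command(snapshot_file, data_dir):
    """Build the etcdctl command line that restores snapshot_file into data_dir."""
    return ["etcdctl", "snapshot", "restore", snapshot_file, f"--data-dir={data_dir}"]


def restore_snapshot(snapshot_file, data_dir="/var/lib/etcd"):
    """Restore a snapshot into data_dir. ETCD must be stopped before calling this."""
    if os.path.exists(data_dir):
        # Keep the old data around for inspection instead of deleting it.
        os.rename(data_dir, f"{data_dir}.bad-{int(time.time())}")
    env = dict(os.environ, ETCDCTL_API="3")  # force the v3 API on older etcdctl
    subprocess.run(restore_command(snapshot_file, data_dir), env=env, check=True)
```

Stopping and restarting ETCD and the API server is deliberately not shown, since how that is done depends on whether they run as static pods or as services.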
We might need to do something similar when we need to upgrade ETCD in the future. I did some reading about that too. ETCD does support upgrades, but you need to install every intermediate version of ETCD between what you have and where you want to end up, so that's a pain. The best approach might be to:
- shut down the API servers on all masters
- back up ETCD on each of the masters
- upgrade ETCD with no data
- restore the backup
- restart the API servers
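The ordering matters here, so a small driver that runs the steps in sequence and stops at the first failure might be useful. This is only a sketch: the five actions are passed in as plain callables because how each one is actually performed on our nodes is still open, and the names `upgrade_plan` and `run_offline_upgrade` are made up.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def upgrade_plan(stop_api, backup, upgrade, restore, start_api) -> List[Step]:
    """Pair each caller-supplied action with its place in the offline upgrade."""
    return [
        ("shut down API servers", stop_api),
        ("back up ETCD", backup),
        ("upgrade ETCD with no data", upgrade),
        ("restore the backup", restore),
        ("restart API servers", start_api),
    ]


def run_offline_upgrade(steps: List[Step]) -> List[str]:
    """Run the steps in order and return the names of the completed ones.

    Stops at the first failing step so later steps never run on a
    half-finished upgrade; the error names the step that failed.
    """
    done: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            raise RuntimeError(f"step '{name}' failed; completed so far: {done}") from exc
        done.append(name)
    return done
```

Stopping on the first error means a failed backup never leads to ETCD being replaced with an empty data set, at the cost of leaving the API servers down until someone looks at it.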
Here are some links discussing this:
https://goteleport.com/blog/kubernetes-and-offline-etcd-upgrades/
https://github.com/etcd-io/etcd/blob/main/etcdctl/README.md