Upgrading Kubernetes

guiocavalcanti opened this issue on Sep 29, 2016 · 20 comments

Do you have plans on using k8s 1.4.0? If not, how can I upgrade my version?

guiocavalcanti avatar Sep 29 '16 13:09 guiocavalcanti

Yes. Will release update later today.

wellsie avatar Sep 29 '16 13:09 wellsie

Will we be able to update an existing cluster?

owenmorgan avatar Sep 29 '16 14:09 owenmorgan

@owenmorgan upgrading the k8s version requires deleting the etcd cluster, where all the Kubernetes state is stored on ephemeral disk. I have a forked version of tack where etcd state is persisted on an EBS volume, and that works beautifully. Would anybody be interested in a PR to contribute that back to tack (@wellsie)? Let me know and I will clean it up and submit.

In the meantime, you can use a simple workaround to recover from losing the etcd cluster. Before upgrading, you would run a backup along the lines of the snippet below.
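
The exact snippet depends on your setup; a minimal sketch (the resource list is illustrative, so add any other types you use):

# Dump every API object to disk so it can be re-applied after the upgrade
for kind in namespaces deployments services secrets configmaps \
            persistentvolumes persistentvolumeclaims; do
  kubectl get "$kind" --all-namespaces -o yaml > "backup-${kind}.yaml"
done
# After the new cluster is up: for f in backup-*.yaml; do kubectl apply -f "$f"; done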

This will allow you to recover all cluster state (including PVs). ELBs will be regenerated, so update any DNS records accordingly.

adambom avatar Oct 01 '16 19:10 adambom

Thanks @adambom. How are we looking on an update, @wellsie?

owenmorgan avatar Oct 04 '16 13:10 owenmorgan

@owenmorgan looks like it was patched in 8f2a62eaa4d3ade4930dcb5d80e8f2d3a072a8a7

adambom avatar Oct 04 '16 22:10 adambom

Oh, one other thing you'll need to do when you upgrade: taint the bucket object (in Terraform) or manually update the S3 bucket, so that the files in manifests/etc.tar point to the version of k8s you want to use. Otherwise the update won't actually take.
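
For the Terraform route, a sketch (the resource address is hypothetical; check terraform state list for the real one):

# Bump the version first, then taint so the next apply re-uploads etc.tar
terraform taint aws_s3_bucket_object.etc   # hypothetical resource address
terraform apply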

adambom avatar Oct 04 '16 23:10 adambom

Great, I'll give it a shot. Thanks @wellsie @adambom

owenmorgan avatar Oct 04 '16 23:10 owenmorgan

Is the backup/restore still necessary, @adambom?

owenmorgan avatar Oct 04 '16 23:10 owenmorgan

I recommend upgrading the cluster manually. I will write up the procedure later this week; in the meantime, here is the basic process:

Update kubelet.service on worker nodes

  • SSH into each node and update KUBELET_VERSION in /etc/systemd/system/kubelet.service, as sketched below.

make instances (new with #77) will dump the IPs of all nodes, both masters (etcd/apiserver) and workers. Do make ssh-bastion and then, from there, SSH into each box one at a time.
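On each node, the edit plus restart looks roughly like this (assuming KUBELET_VERSION appears in the unit file, and picking v1.4.0_coreos.0 as an example target):

# Bump the kubelet version in the unit file (pattern and tag are examples)
sudo sed -i 's/KUBELET_VERSION=v[0-9][0-9a-z._]*/KUBELET_VERSION=v1.4.0_coreos.0/' \
  /etc/systemd/system/kubelet.service
sudo systemctl daemon-reload      # pick up the edited unit
sudo systemctl restart kubelet    # run kubelet at the new version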

Update kubelet.service on etcd/apiserver nodes

Repeat the above procedure for the master (etcd/apiserver) nodes.

Update the version in the Kubernetes manifests on etcd/apiserver nodes

grep 1.4 /etc/kubernetes/manifests/*
/etc/kubernetes/manifests/kube-apiserver.yml:    image: quay.io/coreos/hyperkube:v1.4.0_coreos.0
/etc/kubernetes/manifests/kube-controller-manager.yml:    image: quay.io/coreos/hyperkube:v1.4.0_coreos.0
/etc/kubernetes/manifests/kube-proxy.yml:    image: quay.io/coreos/hyperkube:v1.4.0_coreos.0
/etc/kubernetes/manifests/kube-scheduler.yml:    image: quay.io/coreos/hyperkube:v1.4.0_coreos.0
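
One way to make that edit in place (the sed pattern and target tag are examples; double-check the result with the grep above):

# Point every static manifest at the new hyperkube image tag
sudo sed -i 's#hyperkube:v[0-9][0-9a-z._]*#hyperkube:v1.4.0_coreos.0#g' \
  /etc/kubernetes/manifests/*.yml
# kubelet watches this directory and restarts the control-plane pods itself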

I'm looking into ways to automate this. It hasn't been a priority since the procedure is fairly straightforward. Note that running pods should continue to run during this procedure.

wellsie avatar Oct 04 '16 23:10 wellsie

would this procedure work: https://github.com/coreos/coreos-baremetal/blob/master/Documentation/bootkube-upgrades.md ?

nkhine avatar Oct 21 '16 09:10 nkhine

@wellsie any update on the automated Kubernetes upgrade? It is fine to run those ^^^ commands manually if you have a small cluster, but with a big one it would be a headache :)

rimusz avatar Oct 26 '16 16:10 rimusz

OK, I've checked it out: after updating /etc/systemd/system/kubelet.service with the newer k8s version, the change does not survive a reboot. :(

rimusz avatar Oct 26 '16 16:10 rimusz

@rimusz, that is because tack uses user-data, which runs every time the machine powers up. You can stop the instance, edit the version in the user-data, and then start the instance again.

I replaced user-data with cloud-init in my environment; if everything works fine I will submit a PR.

yagonobre avatar Nov 04 '16 20:11 yagonobre

You can use this procedure:

Update worker nodes

  1. Create a new launch configuration; you can clone the existing LC and edit the Kubernetes version in its user-data (it occurs in 2 places).
  2. Terminate all instances and create new ones with the new LC (be sure you have no persistent volumes, e.g. databases, and that your pods are replicated).
    1. Detach the instance from the ASG, checking the box to create a replacement instance; verify with kubectl get nodes that the new node is running, then terminate the node you detached (see the CLI sketch after this list).
    2. Do it for all nodes.
  • Updating the user-data on each instance is an alternative.
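
A sketch of step 2.1 with the AWS CLI (the instance ID and ASG name are placeholders):

aws autoscaling detach-instances \
  --instance-ids i-0123456789abcdef0 \
  --auto-scaling-group-name k8s-worker \
  --no-should-decrement-desired-capacity   # ASG launches a replacement
kubectl get nodes                          # wait until the new node is Ready
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0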

Update master nodes

  1. Update the Kubernetes manifests in the S3 bucket
    1. Download the tar file

      aws s3 cp s3://[BUCKET-URL]/manifests/etc.tar .
      tar -xvf etc.tar
      
    2. Edit the k8s version in all files (a sed one-liner is sketched after this list)

      grep 1.4 *.yml
      kube-apiserver.yml:    image: quay.io/coreos/hyperkube:v1.4.0_coreos.0
      kube-controller-manager.yml:    image: quay.io/coreos/hyperkube:v1.4.0_coreos.0
      kube-proxy.yml:    image: quay.io/coreos/hyperkube:v1.4.0_coreos.0
      kube-scheduler.yml:    image: quay.io/coreos/hyperkube:v1.4.0_coreos.0
      
    3. Compress and upload the file back to S3

      tar -cvf etc.tar *.yml
      aws s3 cp etc.tar s3://[BUCKET-URL]/manifests/etc.tar
      
  2. Update the user-data for each node
    1. You need to stop the instances one at a time to edit the k8s version in the user-data (be sure not to stop more than one instance at a time).
    2. Start the instance.
    3. Check the health of the etcd cluster with etcdctl cluster-health; if all nodes are healthy, move on to the next instance.
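
For step 1.2, a sketch assuming the hyperkube image tag is the only place the version appears (NEW_VERSION is a placeholder):

NEW_VERSION=v1.4.0_coreos.0   # placeholder: set to the release you are targeting
sed -i "s#hyperkube:v[0-9][0-9a-z._]*#hyperkube:${NEW_VERSION}#g" *.yml
grep hyperkube *.yml          # verify every manifest now uses the new tag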

@wellsie, please validate this.

yagonobre avatar Nov 04 '16 22:11 yagonobre

@yagonobre thanks for your solution. It looks good, but it involves way too much manual fiddling, especially with the user-data for each instance; that is too much hassle for production clusters. I found that using global fleet units for the k8s services is a much better way to do k8s upgrades.

rimusz avatar Nov 07 '16 11:11 rimusz

Why is it not possible to replace an etcd node and let it re-sync with the cluster?

rokka-n avatar Jan 14 '17 03:01 rokka-n

@rokka-n I do it
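
Roughly, replacing a member in an etcd2 cluster looks like this (the member ID and peer URL are placeholders):

etcdctl member list                               # find the member to replace
etcdctl member remove 6e3bd23ae5f1eae0            # drop it from the cluster
etcdctl member add node3 http://10.0.0.13:2380    # register the replacement
# then start etcd on the new node with ETCD_INITIAL_CLUSTER_STATE=existing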

yagonobre avatar Jan 16 '17 15:01 yagonobre

Are you open to incorporating automated Kubernetes upgrades? If not is the purpose of this project a one-time setup and then you don't need this project anymore?

fearphage avatar Feb 03 '17 14:02 fearphage

Yes open to automated upgrades 👍

wellsie avatar Feb 03 '17 14:02 wellsie

Yes open to automated upgrades

Excellent! However, without automated upgrades, is this intended to be a single-use project?

fearphage avatar Feb 03 '17 14:02 fearphage