cloud-on-k8s
Support smoother k8s nodes rotation when using local volumes
When using local volumes, it can be quite complicated to handle Kubernetes node upgrades. One common way to upgrade a k8s node is to take it out of the cluster and replace it with a fresh new one, in which case the local volume is lost and the corresponding Elasticsearch Pod stays Pending forever.
When that happens, the only way out is to manually remove both Pod and PVC, so a new Pod gets created with a new volume.
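For reference, the manual workaround looks roughly like this (the Pod and PVC names are hypothetical, following the default ECK naming scheme):

```sh
# Hypothetical names for a cluster named "quickstart" with a "default" nodeSet.
# Delete the PVC first (it stays Terminating while the Pod still uses it),
# then delete the Pending Pod; the StatefulSet recreates both Pod and PVC.
kubectl delete pvc elasticsearch-data-quickstart-es-default-0 --wait=false
kubectl delete pod quickstart-es-default-0
```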
Ideally, to simplify this, we would like to:
- migrate data away from the ES node that will be removed, e.g. with shard allocation filtering as sketched after this list (the k8s node is probably being drained at the k8s level already)
- once that node is removed, and the corresponding Pod becomes Pending, ECK would delete both Pod and PVC so they are recreated elsewhere
- this is a mode of operation the user would probably have to indicate somewhere (in the Elasticsearch spec?). Doing it automatically feels complicated (how long should we wait? will the node come back?) and dangerous.
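A minimal sketch of the data migration step, assuming shard allocation filtering is used and the Elasticsearch HTTP endpoint is reachable locally (the node name and credentials are illustrative):

```sh
# Exclude the ES node running on the k8s node about to be removed, so that
# shards are moved to the remaining nodes before the local volume disappears.
curl -k -u "elastic:$PASSWORD" -X PUT "https://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d '
{
  "persistent": {
    "cluster.routing.allocation.exclude._name": "cluster-es-default-0"
  }
}'
```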
Related discuss issue: https://discuss.elastic.co/t/does-eck-support-local-persistent-disks-and-is-it-a-good-idea/223515/3
I am not sure whether this is helpful at all because it's at such a high level, but I think the ECK operator could watch the ES data nodes and their corresponding Kubernetes nodes.
The moment, say, data-node-0 is no longer scheduled to k8s-node-abc but to another node (for whatever reason), you can assume that this Elastic node has lost its data. If that is the case, the ECK operator can delete/recreate the PVC so that the Pod is no longer Pending.
Does that make sense or am I missing something?
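A rough sketch of that check, done outside the operator with kubectl (the cluster label value, the volume name, and the local PV nodeAffinity layout are assumptions based on a typical ECK/local-volume setup):

```sh
# For each Pending ES Pod, look up the k8s node its local PV is pinned to;
# if that node no longer exists, the data is gone and the PVC could be deleted.
for pod in $(kubectl get pods -l elasticsearch.k8s.elastic.co/cluster-name=cluster \
    --field-selector=status.phase=Pending -o name); do
  pvc=$(kubectl get "$pod" -o jsonpath='{.spec.volumes[?(@.name=="elasticsearch-data")].persistentVolumeClaim.claimName}')
  pv=$(kubectl get pvc "$pvc" -o jsonpath='{.spec.volumeName}')
  node=$(kubectl get pv "$pv" -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]}')
  kubectl get node "$node" >/dev/null 2>&1 || echo "node $node is gone: PVC $pvc can be deleted"
done
```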
this simple script seems to work, but it has not been proven in prod:
# stop new Pods from being scheduled on the node
kubectl cordon k8s-node-abc
# delete the PVC bound to the local volume on that node (name truncated here)
kubectl delete pvc -es-xxx --force --grace-period=0
# evict the remaining Pods, discarding local data
kubectl drain k8s-node-abc --delete-local-data --ignore-daemonsets
# make the node schedulable again once it has been upgraded or replaced
kubectl uncordon k8s-node-abc
Relates to https://github.com/elastic/cloud-on-k8s/issues/2448.
We've run into this exact issue twice now. When we try to upgrade the k8s version in our node pool, we lose all our data and the cluster ends up in a completely broken state.
I don't know how it works with other providers, but I can speak for GKE. We have a cluster with 3 nodes and an index with 2 shards and 1 replica per shard.
What I believe happens is the following:
- GKE initiates a node pool version upgrade
- A node is drained and its pod is deleted along with local data
- A new node spins up with a new pod
- The new pod starts receiving data from replica shards stored on the other nodes
- GKE respects the Pod disruption budget of max 1 unavailable Pod, but only for up to 1 hour; after that it continues the upgrade with the next node ("Note: During automatic or manual node upgrades, PDBs are respected for a maximum of 1 hour. If Pods running on a node cannot be scheduled onto new nodes within 1 hour, the upgrade is initiated, regardless.", from here)
- The next node is drained (now 2/3 of the nodes are unhealthy) and everything is a mess
The logs from GKE show that almost exactly one hour passes between each node teardown.
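One way to see whether an hour is actually enough is to watch shard recovery while the upgrade is running; a sketch, assuming the ECK default HTTP service name and the elastic user (both illustrative here):

```sh
# Forward the ES HTTP service locally, then watch cluster health and the
# recoveries that are still in flight while GKE waits on the PDB.
kubectl port-forward service/cluster-es-http 9200 &
curl -k -u "elastic:$PASSWORD" "https://localhost:9200/_cluster/health?pretty"
curl -k -u "elastic:$PASSWORD" "https://localhost:9200/_cat/recovery?active_only=true&v"
```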
Hello,
Here's our approach to upgrading k8s on local-storage node groups:
- create an upgraded k8s node group with a dedicated label (let's say group: beta)
- change the name and the nodeSelector label of the nodeSet to upgrade, and patch the updateStrategy as follows:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: cluster
spec:
  version: 8.3.3
  # Add before removing, ensuring no data is ever lost
  updateStrategy:
    changeBudget:
      maxSurge: 1
      maxUnavailable: 0
  nodeSets:
    # Change name to beta (required)
    - name: alpha
      count: 2
      podTemplate:
        spec:
          # Pin the Pods to one node group
          # Change to beta
          nodeSelector:
            group: alpha
- delete the old node group once all shards have been migrated (a quick way to check this is sketched below), and revert the updateStrategy
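A possible check before deleting the old node group, assuming the default ECK HTTP service name and that the old Pods still carry alpha in their names (both assumptions):

```sh
# List shard counts and disk usage per ES node; no "alpha" nodes should hold
# data anymore before the old node group is deleted.
kubectl port-forward service/cluster-es-http 9200 &
curl -k -u "elastic:$PASSWORD" "https://localhost:9200/_cat/allocation?v" | grep alpha
```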