
Ensure cluster is in a green state before stopping a pod

Open deimosfr opened this issue 7 years ago • 9 comments

The timeout is set to 8h before releasing the hook and forcing the ES node to shut down.
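For reference, the hook presumably boils down to something like this (a sketch only, assuming plain HTTP on localhost:9200; the exact script in the change may differ):

#!/bin/bash
# Block pod termination until the cluster reports green again, waiting at most 8 hours.
# Kubernetes releases the pod once the hook command returns or the grace period expires.
curl -s "http://localhost:9200/_cluster/health?wait_for_status=green&timeout=8h"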

deimosfr avatar Sep 19 '17 18:09 deimosfr

What happens if I'm deleting the deployment?

pires avatar Sep 26 '17 09:09 pires

Good question, I didn't test. Anyway, I think you can force a delete to bypass hooks.
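For example, something like this (pod name is just a placeholder):

# Force-delete the pod with no grace period, so the preStop hook is not waited for
kubectl delete pod es-data-0 --grace-period=0 --force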

deimosfr avatar Sep 27 '17 20:09 deimosfr

👍 .. any plans on merging this?

otrosien avatar Feb 12 '18 15:02 otrosien

Works for me

deimosfr avatar Feb 12 '18 16:02 deimosfr

@deimosfr what about actively deallocating shards off that node as part of the lifecycle hook (e.g. setting exclude._ip and waiting for the node to become empty)?
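Roughly along these lines (a sketch; the IP 10.2.3.4 and the node name es-data-0 are placeholders, TLS and auth omitted):

# Exclude the node's IP so Elasticsearch relocates its shards elsewhere
curl -s -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/_cluster/settings' -d '{
  "transient" : { "cluster.routing.allocation.exclude._ip" : "10.2.3.4" }
}'

# Poll until the node no longer shows up in the shard allocation
curl -s 'http://localhost:9200/_cat/shards?h=index,shard,node' | grep 'es-data-0' || echo "node is empty"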

otrosien avatar Apr 17 '18 17:04 otrosien

With validation webhooks, it may be possible but it's a far-fetched thing to do here. Maybe an operator feature request?

pires avatar Apr 17 '18 18:04 pires

Regarding the deallocation of shards in the preStop hook, does anyone have a working example? It would be a nice feature to have.

Could something like https://github.com/kayrus/elk-kubernetes/blob/master/docker/elasticsearch/pre-stop-hook.sh be used?

psalaberria002 avatar May 18 '18 20:05 psalaberria002

It is not working in my case: the data pods scaled from 3 to 1 without waiting for the status to be "green".

zhujinhe avatar Aug 01 '18 11:08 zhujinhe

@psalaberria002 @zhujinhe There are multiple ways to achieve this.

1. Relocate all shards off a node before proceeding with the next one, using preStop and postStart lifecycle hooks.

Here's my slightly modified working example, originally taken from https://github.com/helm/charts/blob/5cc1fd6c37f834949cf67c89fe23cf654a9bef77/incubator/elasticsearch/templates/configmap.yaml#L118

It's modified because we use the X-Pack security features and therefore need encryption + authentication. You can remove the encryption (the https://localhost part) and the authentication part (-u ${SOME_USER}:${SOME_PASSWORD}).
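Without security enabled, the cluster-settings call in the hooks would look roughly like this (a sketch: plain HTTP, no credentials, NODE_NAME coming from the pod environment as in the manifest below):

curl -s -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/_cluster/settings' -d "{
  \"transient\" : { \"cluster.routing.allocation.exclude._name\" : \"${NODE_NAME}\" }
}"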

Depending on the network performance and the amount of data in the cluster, this approach can take very long and decreases cluster performance a lot, because all shards are relocated before each restart.

configmap.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Values.data.name }}-cm
  labels:
    app: {{ .Values.data.name }}
data:
  pre-stop-hook.sh: |-
    #!/bin/bash
    set -uo pipefail
    echo "Prepare to migrate data of the node ${NODE_NAME}"
    echo "Move all data from node ${NODE_NAME}"
    curl -k -u ${SOME_USER}:${SOME_PASSWORD} -s -XPUT -H 'Content-Type: application/json' 'https://localhost:9200/_cluster/settings' -d "{
      \"transient\" :{
          \"cluster.routing.allocation.exclude._name\" : \"${NODE_NAME}\"
      }
    }"
    echo ""
    while true ; do
      echo -e "Wait for node ${NODE_NAME} to become empty"
      SHARDS_ALLOCATION=$(curl -k -u ${SOME_USER}:${SOME_PASSWORD} -s -XGET 'https://localhost:9200/_cat/shards')
      if ! echo "${SHARDS_ALLOCATION}" | grep -E "${NODE_NAME}" | grep -v "\.security-"; then
        echo -e "${NODE_NAME} has been evacuated"
        break
      fi
      sleep 1
    done
  post-start-hook.sh: |-
    #!/bin/bash
    set -uo pipefail
    while true; do
      curl -k -u ${SOME_USER}:${SOME_PASSWORD} -XGET "https://localhost:9200/_cluster/health"
      if [[ "$?" == "0" ]]; then
        break
      fi
      echo -e "${NODE_NAME} not reachable, retrying ..."
      sleep 1
    done
    echo ""
    CLUSTER_SETTINGS=$(curl -k -u ${SOME_USER}:${SOME_PASSWORD} -s -XGET "https://localhost:9200/_cluster/settings")
    if echo "${CLUSTER_SETTINGS}" | grep -E "${NODE_NAME}"; then
      echo -e "Activate node ${NODE_NAME}"
      curl -k -u ${SOME_USER}:${SOME_PASSWORD} -s -XPUT -H 'Content-Type: application/json' "https://localhost:9200/_cluster/settings" -d "{
        \"transient\" :{
          \"cluster.routing.allocation.exclude._name\" : null
        }
      }"
    fi
    echo -e "Node ${NODE_NAME} is ready to be used"

deployment.yaml:

apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  namespace: {{ .Release.Namespace }}
  name: {{ .Values.data.name }}
  labels:
    app: {{ .Values.data.name }}
spec:
  serviceName: {{ .Values.data.name }}
  replicas: {{ .Values.data.deployment.replicas }}
  revisionHistoryLimit: {{ .Values.data.deployment.revisionHistoryLimit }}
  podManagementPolicy: {{ .Values.data.deployment.podManagementPolicy }}
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: {{ .Values.data.name }}
      annotations:
    spec:
      serviceAccount: {{ .Values.serviceAccount }}
      securityContext:
        runAsUser:  {{ .Values.userId }}
        fsGroup:  {{ .Values.groupId }}
      imagePullSecrets:
        - name: {{ .Values.data.deployment.imagePullSecretName }}
      initContainers:
        - name: {{ .Values.data.deployment.initContainers.increaseMapCount.name }}
          image: "{{ .Values.image.os.repository }}:{{ .Values.image.os.tag }}"
          imagePullPolicy: {{ .Values.image.os.pullPolicy }}
          command:
            - sh
            - -c
            - 'echo 262144 > /proc/sys/vm/max_map_count'
          securityContext:
            privileged: {{ .Values.data.deployment.initContainers.increaseMapCount.securityContext.privileged }}
            runAsUser: {{ .Values.data.deployment.initContainers.increaseMapCount.securityContext.runAsUser }}
      containers:
      - name: {{ .Values.data.shortName }}
        image: "{{ .Values.image.elasticsearch.repository }}:{{ .Values.image.elasticsearch.tag }}"
        imagePullPolicy: {{ .Values.image.elasticsearch.pullPolicy }}
        securityContext:
          capabilities:
            add:
              - IPC_LOCK
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        readinessProbe:
          exec:
            command:
            - /bin/bash
            - -c
            - /usr/bin/curl -k -u ${USERNAME}:${PASSWORD} "https://localhost:9200/_cluster/health?local=true"
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        lifecycle:
          preStop:
            exec:
              command: ["/bin/bash","/pre-stop-hook.sh"]
          postStart:
            exec:
              command: ["/bin/bash","/post-start-hook.sh"]
        volumeMounts:
        - name: lifecycle-hooks
          mountPath: /pre-stop-hook.sh
          subPath: pre-stop-hook.sh
        - name: lifecycle-hooks
          mountPath: /post-start-hook.sh
          subPath: post-start-hook.sh
      terminationGracePeriodSeconds: 86400
      volumes:
      - name: lifecycle-hooks
        configMap:
          name: {{ .Values.data.name }}-cm

The deployment.yaml is not the full file; it contains only the parts required for the lifecycle hooks.

2. Just ensure the containers are only stopped while the cluster is in a green state

Use a readiness probe or a preStop hook.

readinessProbe:

readinessProbe:
  exec:
    command: 
    - /bin/bash 
    - -c 
    - /usr/bin/curl -k -u ${SOME_USER}:${SOME_PASSWORD} "https://localhost:9200/_cluster/health?wait_for_status=green&timeout=30s" | grep -v '"timed_out":true'

preStop:

        lifecycle:
          preStop:
            exec:
              command: 
              - /bin/bash 
              - -c 
              - /usr/bin/curl -k -u ${SOME_USER}:${SOME_PASSWORD} "https://localhost:9200/_cluster/health?wait_for_status=green&timeout=28800s"

It's important to also set terminationGracePeriodSeconds: 28800, otherwise the container will be killed after 30s, since that is the default grace period.
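For placement, roughly like this in the pod spec (a sketch showing only the relevant fields; the container name is a placeholder):

spec:
  terminationGracePeriodSeconds: 28800  # must cover the longest expected preStop wait
  containers:
  - name: es-data
    lifecycle:
      preStop:
        exec:
          command:
          - /bin/bash
          - -c
          - /usr/bin/curl -k -u ${SOME_USER}:${SOME_PASSWORD} "https://localhost:9200/_cluster/health?wait_for_status=green&timeout=28800s"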

mat1010 avatar Aug 01 '18 11:08 mat1010