
Configure Elasticsearch cluster

tcibinan opened this issue 2 years ago · 1 comment

Prerequisites

Indices and Shards

An Elasticsearch index consists of one or more primary shards. Each primary shard holds a unique subset of the index's documents. Each primary shard may have zero, one or more replica shards.
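
For example, the shard layout of the indices can be inspected with the _cat shards API. The request below is only a sketch: ELK_SVC_IP stands for the Elasticsearch service IP (see the walkthrough at the end of this issue) and the index pattern is illustrative.

# Sketch: list primary (p) and replica (r) shards of the matching indices
curl "http://$ELK_SVC_IP:30091/_cat/shards/cp-billing-*?v"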

Nodes

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/modules-node.html

A single node in an Elasticsearch cluster can have one or many of the following roles:

  • master node manages the cluster (a lightweight but important role)
  • data node stores shards and handles indexing and searching operations
  • ingest node executes pre-processing pipelines (used by system logs in Cloud Pipeline)

Health

The following request is very helpful because it shows the overall health of the Elasticsearch cluster.

GET _cluster/health?pretty

The most important fields of the output are the following:

  • status shows the overall cluster health
  • number_of_data_nodes shows the total number of data nodes in the cluster
  • relocating_shards shows the number of relocating shards. If it is positive then shard relocation is in progress. It shall be checked once an additional data node is added to the cluster.
  • initializing_shards shows the number of initializing shards. If it is positive then either new shards are being created or existing shards are being loaded after a restart.
  • active_shards_percent_as_number shows the percentage of ready shards. If it is less than 100% then either the cluster has been restarted or it has insufficient resources and an additional node may help.

An example response:

{
    "cluster_name" : "search-elk-cluster-dev",
    "status" : "green",
    "timed_out" : false,
    "number_of_nodes" : 2,
    "number_of_data_nodes" : 2,
    "active_primary_shards" : 4321,
    "active_shards" : 5432,
    "relocating_shards" : 0,
    "initializing_shards" : 0,
    "unassigned_shards" : 0,
    "delayed_unassigned_shards" : 0,
    "number_of_pending_tasks" : 0,
    "number_of_in_flight_fetch" : 0,
    "task_max_waiting_in_queue_millis" : 0,
    "active_shards_percent_as_number" : 100.0
}

Limitations

A single data node in an Elasticsearch cluster can efficiently process only a limited number of shards, which depends on the amount of available memory. If a node has insufficient memory then the overall Elasticsearch cluster health decreases, the average request processing time grows and some documents become unavailable.
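
As a rough check, the per-node shard count and disk usage can be listed with the _cat allocation API (a sketch; ELK_SVC_IP as in the walkthrough at the end of this issue):

# Sketch: shards, disk usage and host per data node
curl "http://$ELK_SVC_IP:30091/_cat/allocation?v"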

Solution

Additional data nodes in the Elasticsearch cluster are used to evenly distribute shards and workload, which improves the overall cluster health.

Implementation

See basic implementation in https://github.com/epam/cloud-pipeline/commit/601aa5a912a983c8d3606cf2b54dce228873b503

Currently, Cloud Pipeline uses a single-node installation of Elasticsearch. The approach is quite efficient for small Cloud Pipeline deployments but is not as stable for bigger ones.

In order to improve Elasticsearch performance on larger Cloud Pipeline deployments, an approach for Elasticsearch cluster installation shall be introduced.

Initial implementation includes the usage of:

  • a single node with master, data and ingest roles, which already exists in all Cloud Pipeline deployments
  • additional nodes with only data and ingest roles, which can optionally be attached

Even though multiple nodes can share a single data path, it is usually not a good idea because it may cause data loss in certain scenarios. A better approach is to keep a hard association between each node and its data path.

In Kubernetes this can be achieved by using StatefulSets, which keep pod names persistent between restarts and therefore allow an association between a single pod and a single data path.

Each pod in the StatefulSet can be associated with a specific subfolder of a common data path based on its pod name. These static pod names can also be used to assign specific roles, e.g. only the first pod may have the master role.
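
A minimal sketch of such an entrypoint logic is shown below. It assumes pods named cp-search-elk-0, cp-search-elk-1, ... and is not the actual Cloud Pipeline manifest.

# Hypothetical pod entrypoint sketch: derive the data path and the role from the StatefulSet pod name
ORDINAL="${HOSTNAME##*-}"

# each pod keeps its data in its own subfolder of the shared data path
echo "path.data: /usr/share/elasticsearch/data/${HOSTNAME}" >> /usr/share/elasticsearch/config/elasticsearch.yml

# only the first pod is master-eligible
if [ "${ORDINAL}" = "0" ]; then
    echo "node.master: true" >> /usr/share/elasticsearch/config/elasticsearch.yml
else
    echo "node.master: false" >> /usr/share/elasticsearch/config/elasticsearch.yml
fi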

Cluster discovery

All Elasticsearch nodes serve requests regardless of their roles, so the Kubernetes Elasticsearch service shall direct traffic to all nodes. Additionally, the service shall expose two ports instead of only one:

  • 9200 -> 30091 - requests port, required for index and search requests
  • 9300 -> 30092 - (new) transport port, required for Elasticsearch cluster discovery
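
The exposed ports can be verified with a request like the following (a sketch assuming the default namespace):

# Sketch: confirm both ports are exposed by the cp-search-elk service
kubectl -n default get svc cp-search-elk -o jsonpath='{range .spec.ports[*]}{.port}{" -> "}{.nodePort}{"\n"}{end}'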

Master node

The Elasticsearch master node shall have the following configuration.

# /usr/share/elasticsearch/config/elasticsearch.yml

cluster.name: "cloud-pipeline-elk-cluster"
network.host: 0.0.0.0

discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.unicast.hosts: "cp-search-elk.default.svc.cluster.local:30092"

node.master: true
node.data: true
node.ingest: true

Data nodes

Additional Elasticsearch data nodes shall have the following configuration. Once launched, these nodes automatically join the cluster as soon as the master node is ready. Over time, shards are automatically relocated to the new nodes, distributing the overall workload within the cluster.

# /usr/share/elasticsearch/config/elasticsearch.yml

cluster.name: "cloud-pipeline-elk-cluster"
network.host: 0.0.0.0

discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.unicast.hosts: "cp-search-elk.default.svc.cluster.local:30092"

node.master: false
node.data: true
node.ingest: true
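
Once a data node has joined, it appears in the nodes listing. The checks below are a sketch using the ELK_SVC_IP variable from the walkthrough at the end of this issue.

# Sketch: list cluster nodes with their roles (m = master-eligible, d = data, i = ingest)
curl "http://$ELK_SVC_IP:30091/_cat/nodes?v"

# Sketch: the number of data nodes shall have increased
curl "http://$ELK_SVC_IP:30091/_cluster/health" | jq '.number_of_data_nodes'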

Detach nodes

Elasticsearch data nodes can be gracefully detached from the cluster. Before a node is detached, all of its shards shall be relocated to other nodes. This can be achieved by executing the following request with the detaching node's IP address (the pod's IP address) instead of DETACHING_NODE_IP.

PUT _cluster/settings

{
    "transient": {
        "cluster.routing.allocation.exclude._ip": "DETACHING_NODE_IP"
    }
}

The relocation process starts automatically and the number of relocating_shards becomes positive. Once the relocation finishes, relocating_shards becomes 0 again. After that the detaching pod can be removed safely without affecting the cluster health.
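
The relocation progress can be watched with a request like the following (a sketch; ELK_SVC_IP as in the walkthrough below):

# Sketch: relocating_shards shall drop back to 0 once the detaching node is drained
curl "http://$ELK_SVC_IP:30091/_cluster/health" | jq '.relocating_shards'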

Finally, the cluster settings have to be reverted.

PUT _cluster/settings

{
    "transient": {
        "cluster.routing.allocation.exclude._ip": null
    }
}
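
The revert can be verified by reading the transient settings back (a sketch):

# Sketch: the transient allocation exclusion shall be gone
curl "http://$ELK_SVC_IP:30091/_cluster/settings" | jq '.transient'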

Important steps

  1. Create an Elasticsearch snapshot before proceeding.

  2. Remember that a master node initialises all shards from scratch after a restart, which usually takes a long time. During that period most of the data is inaccessible. The initialization progress can be monitored as shown after this list.

  3. Also remember that shard relocation takes time and doesn't provide an immediate cluster health/performance improvement.
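
The shard initialization mentioned in step 2 can be monitored with a request like the following (a sketch; ELK_SVC_IP as in the walkthrough below):

# Sketch: track shard initialization after a restart until active_shards_percent_as_number reaches 100
curl "http://$ELK_SVC_IP:30091/_cluster/health" | jq '{initializing_shards, active_shards_percent_as_number}'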

Remaining issues

  1. Currently, most indices in Cloud Pipeline's Elasticsearch have no replica shards. This leads to data loss in case of a single-node malfunction (see the sketch after this list for a way to add replicas).

  2. Similarly, most indices have only a single primary shard. This may lead to inefficient processing for larger indices (tens of GB).

  3. No recurring backups are configured for Cloud Pipeline's Elasticsearch cluster.
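
As a sketch of how the first issue could be addressed once enough data nodes are available, the replica count of existing indices can be raised via the index settings API. The index pattern and the replica count below are illustrative.

# Sketch: add one replica shard to the matching indices (requires at least two data nodes)
curl -X PUT "http://$ELK_SVC_IP:30091/cp-billing-*/_settings" -H 'Content-Type: application/json' -d '{
    "index": {
        "number_of_replicas": 1
    }
}' | jq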

tcibinan · Feb 03 '23

Below one can find an example of how an existing single-node deployment can be safely replaced with a cluster deployment.

# Redeploy latest cp-search-elk deployment
# It is required for local snapshots creation
kubectl edit deploy cp-search-elk

# ELK_BCP_DATE="$(date "+%Y-%m-%d")"
ELK_BCP_DATE="2024-04-24"
ELK_SVC_IP="$(kubectl describe svc cp-search-elk | grep 'IP: ' | awk '{ print $2 }')"

# Retrieve cluster health
curl "http://$ELK_SVC_IP:30091/_cluster/health" | jq

# Submit manual backup
nohup rsync -av "/opt/search-elk/data" "/opt/search-elk/data-backup-$ELK_BCP_DATE" >"search-elk-data-backup-$ELK_BCP_DATE.out" 2>&1 &
rsync_pid="$!"

# Wait for manual backup to finish
ps aux | grep "$rsync_pid"
less "searck-elk-data-backup-$ELK_BCP_DATE.out"
wait "$rsync_pid"

# Create snapshot repository
curl "http://$ELK_SVC_IP:30091/_snapshot" | jq
curl -X PUT "http://$ELK_SVC_IP:30091/_snapshot/all_backup_repo" -H 'Content-Type: application/json' -d '{
  "type": "fs",
  "settings": {
    "location": "/usr/share/elasticsearch/backup"
  }
}' | jq

# Create snapshot synchronously (the curl call blocks until the snapshot completes)
nohup curl -X PUT "http://$ELK_SVC_IP:30091/_snapshot/all_backup_repo/snapshot-$ELK_BCP_DATE?wait_for_completion=true" >"search-elk-data-snapshot-$ELK_BCP_DATE.out" 2>&1 &
curl_pid="$!"

# Wait for snapshot creation to finish
ps aux | grep "$curl_pid"
less "searck-elk-data-snapshot-$ELK_BCP_DATE.out"
wait "$curl_pid"

# Retrieve snapshot state
curl "http://$ELK_SVC_IP:30091/_snapshot/all_backup_repo/snapshot-$ELK_BCP_DATE" | jq

# Scale down cp-search-elk deployment
kubectl scale deploy --replicas=0 cp-search-elk

# Redeploy latest cp-search-elk service
# It is required for cp-search-elk statefulset deployment
kubectl edit svc cp-search-elk

# Temporarily change the elk port from 30091 to 30090
# It is required to temporarily disable indices creation
kubectl edit svc cp-search-elk

# Deploy cp-search-elk statefulset
vi cp-search-elk-set.yaml
kubectl apply -f cp-search-elk-set.yaml

# Retrieve cluster health
curl "http://$ELK_SVC_IP:30090/_cluster/health" | jq

# Create snapshot repository
curl "http://$ELK_SVC_IP:30090/_snapshot" | jq
curl -X PUT "http://$ELK_SVC_IP:30090/_snapshot/all_backup_repo" -H 'Content-Type: application/json' -d '{
  "type": "fs",
  "settings": {
    "location": "/usr/share/elasticsearch/backup"
  }
}' | jq

# Retrieve snapshot state
curl "http://$ELK_SVC_IP:30090/_snapshot/all_backup_repo/snapshot-$ELK_BCP_DATE" | jq

# Delete any existing indices before restoring the snapshot
curl -X DELETE "http://$ELK_SVC_IP:30090/*" | jq

# Restore specific indices from the snapshot
curl -X POST "http://$ELK_SVC_IP:30090/_snapshot/all_backup_repo/snapshot-$ELK_BCP_DATE/_restore" -H 'Content-Type: application/json' -d '{
    "indices": "cp-billing-pipeline-run-2024-*,cp-billing-storage-2024-*"
}'| jq

# Or restore all indices from the snapshot
curl -X POST "http://$ELK_SVC_IP:30090/_snapshot/all_backup_repo/snapshot-$ELK_BCP_DATE/_restore" -H 'Content-Type: application/json' -d '{
}'| jq

# Retrieve cluster health and indices
curl "http://$ELK_SVC_IP:30090/_cluster/health" | jq
curl "http://$ELK_SVC_IP:30090/_cat/indices" | less

# Attach additional node to kubernetes cluster
# by executing init-multicloud.sh from cp-deployment-autoscaler

# Label additional node
kubectl label no NODE_NAME cloud-pipeline/cp-search-elk=true

# Scale statefulset
kubectl scale statefulset cp-search-elk --replicas=2

# Change elk port from 30090 to 30091
kubectl edit svc cp-search-elk

# Retrieve cluster stats
curl "http://$ELK_SVC_IP:30091/_nodes" | jq | less
curl "http://$ELK_SVC_IP:30091/_nodes/stats" | jq | less
curl "http://$ELK_SVC_IP:30091/_cat/indices" | less
curl "http://$ELK_SVC_IP:30091/_cluster/health" | jq
curl "http://$ELK_SVC_IP:30091/_cluster/settings" | jq
curl "http://$ELK_SVC_IP:30091/_cluster/state" | jq | less
curl "http://$ELK_SVC_IP:30091/_cluster/stats" | jq | less

tcibinan · Apr 29 '24