jaeger-kubernetes
Elasticsearch in production always Back-off restarting failed container
elasticsearch version:
docker.elastic.co/elasticsearch/elasticsearch:5.6.0
k8s cluster version:
1.10
describe
# kubectl describe pods -n jaeger elasticsearch-0
Name: elasticsearch-0
Namespace: jaeger
Node: node-1/192.168.205.128
Start Time: Sat, 28 Apr 2018 16:44:35 +0800
Labels: app=jaeger-elasticsearch
controller-revision-hash=elasticsearch-8684f69799
jaeger-infra=elasticsearch-replica
statefulset.kubernetes.io/pod-name=elasticsearch-0
Annotations: <none>
Status: Running
IP: 192.168.3.197
Controlled By: StatefulSet/elasticsearch
Containers:
elasticsearch:
Container ID: docker://941824d0c9186862372c793d41d578a5e34c0972c877771d00629dc375593530
Image: docker.elastic.co/elasticsearch/elasticsearch:5.6.0
Image ID: docker-pullable://docker.elastic.co/elasticsearch/elasticsearch@sha256:f95e7d4256197a9bb866b166d9ad37963dc7c5764d6ae6400e551f4987a659d7
Port: <none>
Host Port: <none>
Command:
bin/elasticsearch
Args:
-Ehttp.host=0.0.0.0
-Etransport.host=127.0.0.1
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Sat, 28 Apr 2018 16:50:57 +0800
Finished: Sat, 28 Apr 2018 16:50:57 +0800
Ready: False
Restart Count: 6
Readiness: exec [curl --fail --silent --output /dev/null --user elastic:changeme localhost:9200] delay=5s timeout=4s period=5s #success=1 #failure=3
Environment: <none>
Mounts:
/data from data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-8l8qt (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
default-token-8l8qt:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-8l8qt
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m default-scheduler Successfully assigned elasticsearch-0 to node-1
Normal SuccessfulMountVolume 7m kubelet, node-1 MountVolume.SetUp succeeded for volume "data"
Normal SuccessfulMountVolume 7m kubelet, node-1 MountVolume.SetUp succeeded for volume "default-token-8l8qt"
Normal Pulling 6m (x4 over 7m) kubelet, node-1 pulling image "docker.elastic.co/elasticsearch/elasticsearch:5.6.0"
Normal Pulled 6m (x4 over 7m) kubelet, node-1 Successfully pulled image "docker.elastic.co/elasticsearch/elasticsearch:5.6.0"
Normal Created 6m (x4 over 7m) kubelet, node-1 Created container
Normal Started 6m (x4 over 7m) kubelet, node-1 Started container
Warning BackOff 2m (x22 over 7m) kubelet, node-1 Back-off restarting failed container
log
# kubectl logs -n jaeger elasticsearch-0
# nothing shown.
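A suggestion (not from the thread) for digging further when the running container prints nothing: the previous, crashed container instance and the node's kernel log are usually more informative, and exit code 137 already hints at a SIGKILL (often the OOM killer).

# logs of the previous (crashed) container instance
kubectl logs -n jaeger elasticsearch-0 --previous
# on the node that ran the pod, look for OOM-killer activity
dmesg | grep -i "out of memory"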
@chalvern hi, did you manage to solve it? As there are no logs, it's hard to find out what caused the issue.
@pavolloffay I am afraid not, but it is possibly a resource limit, as my k8s cluster is set up on 2 VMs, each with 2 CPUs / 2 GB memory. I will check it in my free time.
As I said, it was out of memory...
May 3 21:27:03 xxx-1 kernel: [74354.386802] Out of memory: Kill process 35184 (java) score 1621 or sacrifice child
May 3 21:27:03 xxx-1 kernel: [74354.387300] Killed process 35184 (java) total-vm:2599788kB, anon-rss:1262648kB, file-rss:0kB
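For what it's worth, the pod above runs with QoS class BestEffort (there is no resources block), so nothing reserves or caps memory for Elasticsearch on these 2 GB nodes and the kernel OOM killer is what eventually stops it. A minimal sketch of a resources block for the elasticsearch container; the values are illustrative assumptions and have to be sized to the nodes and the JVM heap:

resources:
  requests:
    memory: "1Gi"     # illustrative: must fit a 2 GB node alongside the OS
    cpu: "500m"
  limits:
    memory: "1536Mi"  # illustrative: keep headroom above the JVM heap
    cpu: "1"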
Then it's an environment issue; I will close it. If anything pops up, feel free to reopen.
Finally, my solution was to add the following env config to elasticsearch.yml:
env:
  - name: ES_JAVA_OPTS
    value: -Xms256m -Xmx512m
  - name: bootstrap.memory_lock
    value: "true"
I'm reopening this, so that we apply @chalvern's env vars to elasticsearch.yml.
@chalvern would you be interested in contributing a fix to this?
-Xms256m -Xmx512m seems very low for Elasticsearch. For example, OpenShift logging uses 8 GB by default.
I am also adding a pointer to the docs for bootstrap.memory_lock: https://www.elastic.co/guide/en/elasticsearch/reference/master/setup-configuration-memory.html#bootstrap-memory_lock
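One caveat (my understanding, not from this issue): bootstrap.memory_lock only takes effect if the container is allowed to lock memory, which generally means the IPC_LOCK capability plus an unlimited memlock ulimit; the ulimit part is usually configured on the container runtime rather than in the pod spec. A sketch of the capability part for the elasticsearch container:

securityContext:
  capabilities:
    add:
      - IPC_LOCK   # allows mlockall(); the memlock rlimit must still permit it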
@jpkrohling I worry that -Xms256m -Xmx512m is too low for production, just as @pavolloffay mentioned. The production Elasticsearch YAML actually looks more like a test configuration than a production one.
What I suggest is to treat it as a test setup. In production, there should be multiple Elasticsearch replicas, i.e. a proper cluster.
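A rough, hypothetical sketch of that direction: run more than one replica in the StatefulSet and open up the transport interface so the pods can actually form a cluster (the current args bind transport.host to 127.0.0.1, which keeps every pod single-node):

spec:
  replicas: 3                            # illustrative
  template:
    spec:
      containers:
        - name: elasticsearch
          args:
            - -Ehttp.host=0.0.0.0
            - -Etransport.host=0.0.0.0   # assumption: allow inter-node traffic
            # ES 5.x clustering also needs discovery.zen.ping.unicast.hosts
            # and discovery.zen.minimum_master_nodes set appropriately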