etcd crashes in EKS cluster
Following your article
helm search repo bitnami | grep etcd
bitnami/etcd 8.5.11 3.5.6 etcd is a distributed key-value store designed ...
I found that Helm chart 8.5.11 provides etcd version 3.5.6. I upgraded my existing APISIX installation by updating the etcd chart version in charts.yaml:
helm dependency list ./charts/apisix
etcd 8.5.11 ok
apisix-dashboard 0.6.1 ok
apisix-ingress-controller 0.11.1 ok
helm upgrade apisix ./charts/apisix --set gateway.type=LoadBalancer --set allow.ipList="{}" --set ingress-controller.enabled=true --namespace ingress-apisix --set ingress-controller.config.apisix.serviceNamespace=ingress-apisix --set gateway.tls.enabled=true --set ingress-controller.config.apisix.adminKey=x --set admin.credentials.admin=xxxxx --set xxxx admin.credentials.viewer=xxxxx --set ingressController.config.apisix.baseURL=http://apisix-admin:9180/apisix/admin --set dashboard.enabled=true
However, etcd still crashes:
mk logs -f apisix-etcd-0
etcd 04:02:06.39
etcd 04:02:06.39 Welcome to the Bitnami etcd container
etcd 04:02:06.39 Subscribe to project updates by watching
etcd 04:02:06.39 Submit issues and feature requests at
etcd 04:02:06.39
etcd 04:02:06.39 INFO ==> ** Starting etcd setup **
etcd 04:02:06.41 INFO ==> Validating settings in ETCD_* env vars..
etcd 04:02:06.41 WARN ==> You set the environment variable ALLOW_NONE_AUTHENTICATION=yes. For safety reasons, do not use this flag in a production environment.
etcd 04:02:06.41 INFO ==> Initializing etcd
etcd 04:02:06.41 INFO ==> Generating etcd config file using env variables
etcd 04:02:06.43 INFO ==> There is no data from previous deployments
etcd 04:02:06.44 INFO ==> Adding new member to existing cluster
etcd 04:02:16.59 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:02:36.68 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:02:56.76 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:03:16.84 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:03:36.91 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:03:57.00 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:04:17.08 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:04:37.15 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:04:57.27 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:05:17.33 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:05:37.43 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:05:57.53 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
These are the events from the Kubernetes cluster:
21m Warning FailedPreStopHook pod/apisix-etcd-0 Exec lifecycle hook ([/opt/bitnami/scripts/etcd/]) for Container "etcd" in Pod "apisix-etcd-0_ingress-apisix(a9c0fe68-6cec-4934-9934-43678e34977f)" failed - error: command '/opt/bitnami/scripts/etcd/' exited with 137: , message: ""
21m Warning FailedPreStopHook pod/apisix-etcd-1 Exec lifecycle hook ([/opt/bitnami/scripts/etcd/]) for Container "etcd" in Pod "apisix-etcd-1_ingress-apisix(0e9be7b4-c34a-4992-97cf-a8426766534a)" failed - error: command '/opt/bitnami/scripts/etcd/' exited with 137: , message: ""
20m Warning FailedMount pod/apisix-etcd-2 Unable to attach or mount volumes: unmounted volumes=[data etcd-jwt-token kube-api-access-jtccd], unattached volumes=[data etcd-jwt-token kube-api-access-jtccd]: timed out waiting for the condition
20m Normal NoPods poddisruptionbudget/apisix-etcd No matching pods found
20m Normal SuccessfulCreate statefulset/apisix-etcd create Claim data-apisix-etcd-1 Pod apisix-etcd-1 in StatefulSet apisix-etcd success
20m Normal WaitForFirstConsumer persistentvolumeclaim/data-apisix-etcd-0 waiting for first consumer to be created before binding
20m Normal WaitForFirstConsumer persistentvolumeclaim/data-apisix-etcd-1 waiting for first consumer to be created before binding
20m Normal SuccessfulCreate statefulset/apisix-etcd create Pod apisix-etcd-1 in StatefulSet apisix-etcd successful
20m Normal SuccessfulCreate statefulset/apisix-etcd create Pod apisix-etcd-0 in StatefulSet apisix-etcd successful
20m Normal SuccessfulCreate statefulset/apisix-etcd create Claim data-apisix-etcd-0 Pod apisix-etcd-0 in StatefulSet apisix-etcd success
20m Normal SuccessfulCreate statefulset/apisix-etcd create Pod apisix-etcd-2 in StatefulSet apisix-etcd successful
20m Normal WaitForFirstConsumer persistentvolumeclaim/data-apisix-etcd-2 waiting for first consumer to be created before binding
20m Normal SuccessfulCreate statefulset/apisix-etcd create Claim data-apisix-etcd-2 Pod apisix-etcd-2 in StatefulSet apisix-etcd success
20m Normal ProvisioningSucceeded persistentvolumeclaim/data-apisix-etcd-0 Successfully provisioned volume pvc-6b3d1b5c-b1a6-4bc0-9ff2-32de868e4cc7 using
20m Normal ProvisioningSucceeded persistentvolumeclaim/data-apisix-etcd-1 Successfully provisioned volume pvc-fbc18bc2-7e9a-4a2e-9311-bbd446a846de using
20m Normal ProvisioningSucceeded persistentvolumeclaim/data-apisix-etcd-2 Successfully provisioned volume pvc-3537fab5-57c6-4386-9d4a-3f623b2b4db3 using
20m Normal Scheduled pod/apisix-etcd-0 Successfully assigned ingress-apisix/apisix-etcd-0 to ip-172-31-110-110.ap-south-1.compute.internal
20m Normal Scheduled pod/apisix-etcd-1 Successfully assigned ingress-apisix/apisix-etcd-1 to ip-172-31-118-166.ap-south-1.compute.internal
20m Normal Scheduled pod/apisix-etcd-2 Successfully assigned ingress-apisix/apisix-etcd-2 to ip-172-31-102-32.ap-south-1.compute.internal
20m Normal SuccessfulAttachVolume pod/apisix-etcd-2 AttachVolume.Attach succeeded for volume "pvc-3537fab5-57c6-4386-9d4a-3f623b2b4db3"
20m Normal SuccessfulAttachVolume pod/apisix-etcd-0 AttachVolume.Attach succeeded for volume "pvc-6b3d1b5c-b1a6-4bc0-9ff2-32de868e4cc7"
20m Normal SuccessfulAttachVolume pod/apisix-etcd-1 AttachVolume.Attach succeeded for volume "pvc-fbc18bc2-7e9a-4a2e-9311-bbd446a846de"
20m Normal Started pod/apisix-etcd-0 Started container etcd
20m Normal Started pod/apisix-etcd-1 Started container etcd
20m Normal Pulled pod/apisix-etcd-1 Container image "" already present on machine
20m Normal Created pod/apisix-etcd-1 Created container etcd
20m Normal Pulled pod/apisix-etcd-0 Container image "" already present on machine
20m Normal Created pod/apisix-etcd-0 Created container etcd
20m Normal Pulled pod/apisix-etcd-2 Container image "" already present on machine
20m Normal Created pod/apisix-etcd-2 Created container etcd
20m Normal Started pod/apisix-etcd-2 Started container etcd
5m1s Warning Unhealthy pod/apisix-etcd-2 Readiness probe failed:
5m2s Warning Unhealthy pod/apisix-etcd-0 Readiness probe failed:
5m1s Warning Unhealthy pod/apisix-etcd-1 Readiness probe failed:
16m Warning Unhealthy pod/apisix-etcd-2 Liveness probe failed:
16m Warning Unhealthy pod/apisix-etcd-1 Liveness probe failed:
16m Warning Unhealthy pod/apisix-etcd-0 Liveness probe failed:
16m Warning FailedPreStopHook pod/apisix-etcd-0 Exec lifecycle hook ([/opt/bitnami/scripts/etcd/]) for Container "etcd" in Pod "apisix-etcd-0_ingress-apisix(3ed8842f-ec58-47d4-b315-ce8a79328578)" failed - error: command '/opt/bitnami/scripts/etcd/' exited with 128: Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex...
16m Normal Killing pod/apisix-etcd-2 Container etcd failed liveness probe, will be restarted
16m Warning FailedPreStopHook pod/apisix-etcd-2 Exec lifecycle hook ([/opt/bitnami/scripts/etcd/]) for Container "etcd" in Pod "apisix-etcd-2_ingress-apisix(9ee4b468-5674-4dfb-8bf4-dca0b964bbe0)" failed - error: command '/opt/bitnami/scripts/etcd/' exited with 128: Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex...
16m Warning FailedPreStopHook pod/apisix-etcd-1 Exec lifecycle hook ([/opt/bitnami/scripts/etcd/]) for Container "etcd" in Pod "apisix-etcd-1_ingress-apisix(6fb9dc93-aabc-4f3a-99d3-6ba10bf3e040)" failed - error: command '/opt/bitnami/scripts/etcd/' exited with 128: Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex...
16m Normal Killing pod/apisix-etcd-1 Container etcd failed liveness probe, will be restarted
16m Normal Killing pod/apisix-etcd-0 Container etcd failed liveness probe, will be restarted
Pod status:
apisix-etcd-0 0/1 Running 5 (2m53s ago) 23m
apisix-etcd-1 0/1 Running 5 (2m53s ago) 23m
apisix-etcd-2 0/1 Running 5 (2m53s ago) 23m
mk describe pod/apisix-etcd-0
Name: apisix-etcd-0
Namespace: ingress-apisix
Priority: 0
Node: ip-172-31-110-110.ap-south-1.compute.internal/
Start Time: Tue, 10 Jan 2023 09:28:00 +0530
Annotations: checksum/token-secret: f0fcd4104dce3cb310d3f003076edf43dc81011716f2cdcc405202be9ceb3434 eks.privileged
Status: Running
Controlled By: StatefulSet/apisix-etcd
Container ID: docker://937cac1eaa863b75423d0f2ecf21f0e22a9dd9cbe9cd1f6ea708bda9606ade57
Image ID: docker-pullable://bitnami/etcd@sha256:2d7b831769734bb97a5c1cfd2fe46e29f422b70b5ba9f9aedfd91300839ac3ee
Ports: 2379/TCP, 2380/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Tue, 10 Jan 2023 09:52:06 +0530
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Tue, 10 Jan 2023 09:48:06 +0530
Finished: Tue, 10 Jan 2023 09:52:06 +0530
Ready: False
Restart Count: 6
Liveness: exec [/opt/bitnami/scripts/etcd/] delay=60s timeout=5s period=30s #success=1 #failure=5
Readiness: exec [/opt/bitnami/scripts/etcd/] delay=60s timeout=5s period=10s #success=1 #failure=5
MY_POD_IP: (v1:status.podIP)
MY_POD_NAME: apisix-etcd-0 (
MY_STS_NAME: apisix-etcd
ETCD_ON_K8S: yes
ETCD_DATA_DIR: /bitnami/etcd/data
ETCD_AUTH_TOKEN: jwt,priv-key=/opt/bitnami/etcd/certs/token/jwt-token.pem,sign-method=RS256,ttl=10m
ETCD_ADVERTISE_CLIENT_URLS: http://$(MY_POD_NAME).apisix-etcd-headless.ingress-apisix.svc.cluster.local:2379,http://apisix-etcd.ingress-apisix.svc.cluster.local:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS: http://$(MY_POD_NAME).apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380
ETCD_INITIAL_CLUSTER: apisix-etcd-0=http://apisix-etcd-0.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380,apisix-etcd-1=http://apisix-etcd-1.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380,apisix-etcd-2=http://apisix-etcd-2.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380
ETCD_CLUSTER_DOMAIN: apisix-etcd-headless.ingress-apisix.svc.cluster.local
/bitnami/etcd from data (rw)
/opt/bitnami/etcd/certs/token/ from etcd-jwt-token (ro)
/var/run/secrets/ from kube-api-access-h9wdm (ro)
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-apisix-etcd-0
ReadOnly: false
Type: Secret (a volume populated by a Secret)
SecretName: apisix-etcd-jwt-token
Optional: false
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: op=Exists for 300s op=Exists for 300s
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 25m default-scheduler Successfully assigned ingress-apisix/apisix-etcd-0 to ip-172-31-110-110.ap-south-1.compute.internal
Normal SuccessfulAttachVolume 24m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-6b3d1b5c-b1a6-4bc0-9ff2-32de868e4cc7"
Normal Pulled 24m kubelet Container image "" already present on machine
Normal Created 24m kubelet Created container etcd
Normal Started 24m kubelet Started container etcd
Warning Unhealthy 21m (x5 over 23m) kubelet Liveness probe failed:
Normal Killing 21m kubelet Container etcd failed liveness probe, will be restarted
Warning FailedPreStopHook 21m kubelet Exec lifecycle hook ([/opt/bitnami/scripts/etcd/]) for Container "etcd" in Pod "apisix-etcd-0_ingress-apisix(3ed8842f-ec58-47d4-b315-ce8a79328578)" failed - error: command '/opt/bitnami/scripts/etcd/' exited with 128: Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex
, message: "Error: bad member ID arg (strconv.ParseUint: parsing \"\": invalid syntax), expecting ID in Hex\n"
Warning Unhealthy 3m45s (x91 over 23m) kubelet Readiness probe failed:
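For anyone reading the output above: by shell convention, an exit code above 128 usually means the process was ended by signal number (code - 128), so `Exit Code: 137` corresponds to signal 9 (SIGKILL). That fits the "failed liveness probe, will be restarted" events, although it does not by itself say why the probe failed. A minimal sketch of the arithmetic follows; the function name `signal_for_exit_code` is made up for illustration.

```shell
# Hypothetical helper: map a container exit code to the signal that ended it.
# Assumes the usual "128 + signal number" convention for signal-terminated processes.
signal_for_exit_code() {
  code="$1"
  if [ "$code" -gt 128 ] 2>/dev/null; then
    # kill -l <number> prints the signal name for that number
    kill -l "$((code - 128))"
  else
    # 128 or below: treated here as an ordinary exit status, not a signal
    echo "none"
  fi
}

signal_for_exit_code 137   # prints KILL
```

The hook failures reported with exit code 128 fall outside this convention, so the helper reports "none" for them.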
How can we get the etcd cluster to work?
mk exec apisix-etcd-0 -it /bin/bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
I have no name!@apisix-etcd-0:/opt/bitnami/etcd$ etcdctl member list -w table
{"level":"warn","ts":"2023-01-10T04:25:14.832Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000362700/","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp connect: connection refused\""}
Error: context deadline exceeded
I have no name!@apisix-etcd-0:/opt/bitnami/etcd$ etcdctl endpoint status -w table --cluster
{"level":"warn","ts":"2023-01-10T04:25:27.685Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003a08c0/","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp connect: connection refused\""}
Error: failed to fetch endpoints from etcd cluster member list: context deadline exceeded
I have no name!@apisix-etcd-0:/opt/bitnami/etcd$
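Based on the error message, it looks like the member ID passed to etcdctl is empty (parsing "" fails), and the event shows this coming from the preStop lifecycle hook rather than from the liveness probe itself: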
Normal Killing 21m kubelet Container etcd failed liveness probe, will be restarted
Warning FailedPreStopHook 21m kubelet Exec lifecycle hook ([/opt/bitnami/scripts/etcd/]) for Container "etcd" in Pod "apisix-etcd-0_ingress-apisix(3ed8842f-ec58-47d4-b315-ce8a79328578)" failed - error: command '/opt/bitnami/scripts/etcd/' exited with 128: Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex
, message: "Error: bad member ID arg (strconv.ParseUint: parsing \"\": invalid syntax), expecting ID in Hex\n"
Yes, it happens automatically, and this is why etcd crashes.
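To illustrate the failure mode in the quoted event: etcdctl expects a hexadecimal member ID, and an empty string produces the `strconv.ParseUint: parsing ""` error. The sketch below shows one way a script could guard against that before calling `etcdctl member remove`. It is not the Bitnami script; the function name `is_hex_member_id` and the variable `member_id` are assumptions made for this example.

```shell
# Hypothetical guard: succeed only for a non-empty hexadecimal member ID.
is_hex_member_id() {
  case "$1" in
    ""|*[!0-9a-fA-F]*) return 1 ;;   # empty, or contains a non-hex character
    *) return 0 ;;
  esac
}

# Assumption: the hook ended up with an empty ID, judging by the error text.
member_id=""

if is_hex_member_id "$member_id"; then
  echo "would run: etcdctl member remove $member_id"
else
  echo "skipping member removal: member ID is empty or not hex"
fi
```

With an empty `member_id`, the script takes the second branch instead of handing etcdctl an argument it cannot parse.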