etcd
Scheduled compaction does not run on one of the pods in an etcd cluster
What happened?
After an etcd upgrade, the logs of one pod in our 3-node cluster show that scheduled compaction is not running. The scheduled compaction log lines are printed on two of the pods, but not on the third (pod-2). As a result, the revision counts are imbalanced across the pods.
What did you expect to happen?
The revision number should be the same on all pods, and compaction should not be affected on any of them.
How can we reproduce it (as minimally and precisely as possible)?
It reproduces only rarely.
Anything else we need to know?
No response
Etcd version (please run the commands below)
bash-4.4$ etcd --version
etcd Version: 3.3.11
Git SHA: 2cf9e51d2
Go Version: go1.10.7
Go OS/Arch: linux/amd64
bash-4.4$ etcdctl version
etcdctl version: 3.3.11
API version: 3.3
bash-4.4$
Etcd configuration (command line flags or environment variables)
VALID_PARAMETERS=valid
ETCD_INITIAL_CLUSTER_TOKEN=dced
ETCD_MAX_SNAPSHOTS=3
TZ=UTC
HOSTNAME=dced-0
COMPONENT_VERSION=v3.3.11
ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379
ETCD_HEARTBEAT_INTERVAL=100
ETCD_AUTO_COMPACTION_RETENTION=100
DISARM_ALARM_PEER_INTERVAL=6
ETCD_TRUSTED_CA_FILE=/data/combinedca/cacertbundle.pem
MONITOR_ALARM_INTERVAL=5
MS_SEC_KEY_MANAGEMENT_SERVICE_HOST=10.102.45.131
MS_SEC_KEY_MANAGEMENT_PORT_8200_TCP_PROTO=tcp
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
ETCDCTL_CERT=/run/sec/certs/client/clicert.pem
DEFRAGMENT_ENABLE=true
MS_DATA_DISTRIBUTED_COORDINATOR_ED_SERVICE_HOST=10.104.171.77
KUBERNETES_PORT=tcp://10.96.0.1:443
MS_DATA_DISTRIBUTED_COORDINATOR_ED_SERVICE_PORT=2379
PWD=/
ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380
HOME=/home/dced
MS_DATA_DISTRIBUTED_COORDINATOR_ED_SERVICE_PORT_CLIENT_PORT_TLS=2379
ETCD_AUTO_COMPACTION_MODE=revision
KUBERNETES_SERVICE_PORT_HTTPS=443
MS_DATA_DISTRIBUTED_COORDINATOR_ED_PORT_2379_TCP_ADDR=10.104.171.77
KUBERNETES_PORT_443_TCP_PORT=443
ETCD_DEBUG=false
MS_SEC_KEY_MANAGEMENT_SERVICE_PORT_HTTPS_KMS=8200
ETCD_CERT_FILE=/run/sec/certs/server/srvcert.pem
ETCD_FIFO_DIR=/fifo
ETCD_PEER_AUTO_TLS=true
MS_DATA_DISTRIBUTED_COORDINATOR_ED_PORT_2379_TCP_PORT=2379
KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
MS_DATA_DISTRIBUTED_COORDINATOR_ED_PORT_2379_TCP=tcp://10.104.171.77:2379
DEFRAGMENT_PERIODIC_INTERVAL=60
COMPONENT=etcd
ETCD_DATA_DIR=/data
ETCD_LOG_PACKAGE_LEVELS=etcdserver=INFO,security=INFO
ETCD_CLIENT_CERT_AUTH=true
TERM=xterm
MS_SEC_KEY_MANAGEMENT_PORT=tcp://10.102.45.131:8200
ETCDCTL_ENDPOINTS=dced.zmorrah:2379
ETCD_METRICS=basic
ETCDCTL_API=3
MS_DATA_DISTRIBUTED_COORDINATOR_ED_PORT=tcp://10.104.171.77:2379
ETCD_SNAPSHOT_COUNT=5000
ETCD_MAX_WALS=3
SHLVL=1
MS_SEC_KEY_MANAGEMENT_PORT_8200_TCP_ADDR=10.102.45.131
KUBERNETES_SERVICE_PORT=443
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://dced-0.dced-peer.zmorrah.svc.cluster.local:2380
ETCD_KEY_FILE=/run/sec/certs/server/srvprivkey.pem
ETCD_ENABLE_V2=false
ETCD_ELECTION_TIMEOUT=1000
ETCDCTL_CACERT=/data/combinedca/cacertbundle.pem
ETCD_NAME=dced-0
ETCD_QUOTA_BACKEND_BYTES=268435456
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
ETCD_ADVERTISE_CLIENT_URLS=https://dced-0.dced.zmorrah:2379
MS_SEC_KEY_MANAGEMENT_SERVICE_PORT=8200
KUBERNETES_SERVICE_HOST=10.96.0.1
FLAVOUR=etcd-v3.3.11-linux-amd64
MS_SEC_KEY_MANAGEMENT_PORT_8200_TCP=tcp://10.102.45.131:8200
MS_DATA_DISTRIBUTED_COORDINATOR_ED_PORT_2379_TCP_PROTO=tcp
MS_SEC_KEY_MANAGEMENT_PORT_8200_TCP_PORT=8200
ETCDCTL_KEY=/run/sec/certs/client/cliprivkey.pem
_=/usr/bin/env
Etcd debug information (please run the commands below, feel free to obfuscate the IP address or FQDN in the output)
$ etcdctl member list -w table
# paste output here
$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here
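The symptom in this report is that members disagree on their revision. As a minimal sketch of how the `endpoint status` command above could be used to check that, the hypothetical `check_revisions` function below reads the JSON form of its output (`-w json`) from stdin and reports whether every member returns the same revision. It assumes each member's entry carries exactly one numeric `"revision":<n>` field (inside its response header) with no space after the colon; that layout is an assumption, so check it against the output of your own etcdctl version before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of etcd): reads the JSON printed by
# `etcdctl endpoint status -w json` on stdin and reports whether all
# members share one revision.
# ASSUMPTION: each member entry holds exactly one numeric "revision":<n>
# field (in its response header). Verify against your etcdctl version.
check_revisions() {
  local revs count
  # Extract every "revision":<n> value, keep only the number, de-duplicate.
  revs=$(grep -o '"revision":[0-9]*' | cut -d: -f2 | sort -n -u)
  if [ -z "$revs" ]; then
    echo "no revision fields found"
    return 2
  fi
  # Number of distinct revisions reported across all members.
  count=$(printf '%s\n' "$revs" | grep -c .)
  if [ "$count" -eq 1 ]; then
    echo "in sync: revision $revs"
  else
    # Unquoted $revs joins the distinct values with single spaces.
    echo "diverged revisions:" $revs
    return 1
  fi
}
```

For example: `etcdctl --endpoints=<member list> endpoint status -w json | check_revisions`. On a cluster that is taking writes, followers can lag the leader briefly, so a single mismatch is weaker evidence than a gap that persists across repeated runs.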
Relevant log output
You can refer to the following log lines, which show "finished scheduled compaction". These lines appear only in pod-0 and pod-1.
They do not appear in pod-2.
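To compare the pods in the way described above, a minimal sketch like the hypothetical `compaction_report` function below counts the lines containing "finished scheduled compaction" in each pod's log. The pod names `dced-0`, `dced-1`, `dced-2` and the namespace `zmorrah` are assumptions inferred from `HOSTNAME` and `ETCD_ADVERTISE_CLIENT_URLS` in the configuration above; `LOG_CMD` is a hypothetical hook for swapping in a different log source.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<pod> <count>" for each pod given, where
# <count> is the number of log lines containing
# "finished scheduled compaction".
# ASSUMPTION: pods live in namespace "zmorrah" (inferred from the config);
# set LOG_CMD to use a different log source or namespace.
compaction_report() {
  local pod n
  for pod in "$@"; do
    # LOG_CMD is intentionally unquoted so it can hold a command plus flags.
    # grep -c prints 0 (and exits 1) on no match; "|| true" keeps set -e happy.
    # kubectl errors (e.g. a missing pod) still go to stderr.
    n=$(${LOG_CMD:-kubectl logs -n zmorrah} "$pod" \
          | grep -c 'finished scheduled compaction' || true)
    echo "$pod $n"
  done
}

# Example: compaction_report dced-0 dced-1 dced-2
```

A pod that reports 0 while its peers report a steady count would match the symptom described in this issue. Note that `kubectl logs` only covers the current container's log by default, so a count taken after a pod restart starts again from zero.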
Hi @ahrtr, could you please help us with this?
Thanks, Rahul
3.3.* is end of life. Can you try this in 3.5.*?
Similar to this reply https://github.com/etcd-io/etcd/issues/13918#issuecomment-1096845232
Hi @lavacat, could you please confirm whether our issue and the issue you mentioned in #13918 are the same or similar?
Thanks, Rahul
Hi @lavacat, could you please respond to the comment above?
Thanks
I mentioned https://github.com/etcd-io/etcd/issues/13918#issuecomment-1096845232 only as an example of an end-of-life comment. The issue isn't related. Maybe I should find a better reference in the docs.
Hi @lavacat, we have one more question. #11817 describes a fix in etcd for a deadlock bug. We suspect that our issue (compaction not running on one pod) is caused by the deadlock that was fixed in https://github.com/etcd-io/etcd/pull/11817. Could you confirm whether our issue is caused by the bug addressed in https://github.com/etcd-io/etcd/pull/11817?
Note: we have a 3-node etcd cluster running etcd 3.3.11, and we have not seen this compaction issue on etcd 3.4.16.
Thanks, Rahul
Hi @lavacat, any confirmation?
Thanks, Rahul
Hi @lavacat, could you please confirm?
Thanks
Hi @lavacat, any updates?
Thanks
Hi @lavacat @ahrtr, any updates on the above query?
Hi @lavacat @ahrtr, any updates on the above query?
Hi @lavacat @ahrtr, any updates on the above query?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.