The etcd (v3.5.2) process often causes the server's iowait to become high
What happened?
I deployed a Kubernetes cluster with three master nodes and 15 worker nodes. The cluster runs about 367 nginx pods, just for testing. Three etcd members form a cluster, and etcd and kube-apiserver are deployed on the same nodes. Recently I have often found that iowait on a master node becomes high, and it only returns to normal after restarting etcd; if I don't restart etcd, iowait never comes down. I tried to find the root cause with iostat, top, iotop, and lsof, but none of them revealed it, and the problem appears irregularly.
What did you expect to happen?
I hope to find the root cause of the iowait increase and fix it.
How can we reproduce it (as minimally and precisely as possible)?
My deployed environment information is as follows:
# uname -a
Linux xxx-k8s-master02 4.20.13-1.el7.elrepo.x86_64 #1 SMP Wed Feb 27 10:02:05 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
# containerd -v
containerd containerd.io 1.5.11 3df54a852345ae127d1fa3092b95168e4a88e2f8
# etcd --version
etcd Version: 3.5.2
Git SHA: 99018a77b
Go Version: go1.16.3
Go OS/Arch: linux/amd64
# kube-apiserver --version
Kubernetes v1.22.6
# free -m
              total        used        free      shared  buff/cache   available
Mem:          32164        1785       28140         482        2238       29487
# cat /proc/cpuinfo | grep process
processor: 0
processor: 1
processor: 2
processor: 3
processor: 4
processor: 5
processor: 6
processor: 7
I ran a simple dd write test in the etcd data directory:
# dd bs=1M count=2048 if=/dev/zero of=test conv=fdatasync
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 23.5204 s, 91.3 MB/s
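dd only measures large sequential writes, while etcd is mostly sensitive to the latency of small synchronous writes (WAL fsync). A closer approximation, assuming fio is available (the test-data directory name is just an example created inside the etcd data disk), would be something like:
# mkdir test-data
# fio --rw=write --ioengine=sync --fdatasync=1 --directory=test-data --size=22m --bs=2300 --name=etcd-fsync-test
In the fio output, the fsync/fdatasync latency percentiles matter more than MB/s; the etcd documentation suggests keeping the 99th percentile of WAL fsync below roughly 10ms.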
From the monitoring, this disk is not under any pressure at all, but the IOPS of the system "/" partition is saturated.
iowait becomes high from time to time, not constantly; but once it appears, etcd must be restarted for it to come back down.
Anything else we need to know?
Here are some records I checked:
iostat -x 1
The r/s of sda is saturated, even though the etcd data directory is /var/lib/etcd, which is on a separate SSD.
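To double-check which block device actually backs the etcd data directory, a quick sketch using standard util-linux tools (not part of the original report):
# findmnt -T /var/lib/etcd
# lsblk -o NAME,TYPE,SIZE,ROTA,MOUNTPOINT
If /var/lib/etcd resolves to a partition of sda (or to a dm-* device stacked on sda), the assumption of a separate SSD would not hold.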
iotop --only
Although kube-apiserver and kube-controller-manager also show high I/O at this point, they recover after etcd is restarted.
top

lsof (1338 is the PID of the etcd main process)
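Since lsof does not show I/O rates, a per-process/per-thread view can help to catch the actual reader when iowait spikes (a sketch; pidstat is part of the sysstat package, and 1338 is the etcd PID mentioned above):
# pidstat -d 1 5            # read/write KB/s per process, five one-second samples
# pidstat -d -t -p 1338 1   # per-thread breakdown for the etcd process
# cat /proc/1338/io         # cumulative read_bytes/write_bytes counters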

Check iotop again after restarting etcd

Etcd version (please run commands below)
$ etcd --version
# etcd --version
etcd Version: 3.5.2
Git SHA: 99018a77b
Go Version: go1.16.3
Go OS/Arch: linux/amd64
$ etcdctl version
etcdctl is not installed
Etcd configuration (command line flags or environment variables)
cat /usr/lib/systemd/system/etcd.service
[Unit]
Description=Etcd Service
Documentation=https://coreos.com/etcd/docs/latest/
After=network.target

[Service]
Type=notify
ExecStart=/usr/local/bin/etcd --config-file=/etc/etcd/etcd.config.yml
Restart=on-failure
RestartSec=10
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
Alias=etcd3.service
cat /etc/etcd/etcd.config.yml
name: 'xxx-k8s-master02-168-0-22'
data-dir: /var/lib/etcd
wal-dir: /var/lib/etcd/wal
snapshot-count: 5000
heartbeat-interval: 100
election-timeout: 1000
quota-backend-bytes: 0
listen-peer-urls: 'https://192.168.0.22:2380'
listen-client-urls: 'https://192.168.0.22:2379,http://127.0.0.1:2379'
max-snapshots: 3
max-wals: 5
cors:
initial-advertise-peer-urls: 'https://192.168.0.22:2380'
advertise-client-urls: 'https://192.168.0.22:2379'
discovery:
discovery-fallback: 'proxy'
discovery-proxy:
discovery-srv:
initial-cluster: 'xxx-k8s-master01-168-0-21=https://10.97.43.21:2380,xxx-k8s-master02-168-0-22=https://192.168.0.22:2380,servicecloud-k8s-master03-168-0-23=https://10.97.43.23:2380'
initial-cluster-token: 'etcd-k8s-cluster'
initial-cluster-state: 'existing'
strict-reconfig-check: false
enable-v2: true
enable-pprof: true
proxy: 'off'
proxy-failure-wait: 5000
proxy-refresh-interval: 30000
proxy-dial-timeout: 1000
proxy-write-timeout: 5000
proxy-read-timeout: 0
client-transport-security:
  cert-file: '/etc/kubernetes/pki/etcd/etcd.pem'
  key-file: '/etc/kubernetes/pki/etcd/etcd-key.pem'
  client-cert-auth: true
  trusted-ca-file: '/etc/kubernetes/pki/etcd/etcd-ca.pem'
  auto-tls: true
peer-transport-security:
  cert-file: '/etc/kubernetes/pki/etcd/etcd.pem'
  key-file: '/etc/kubernetes/pki/etcd/etcd-key.pem'
  peer-client-cert-auth: true
  trusted-ca-file: '/etc/kubernetes/pki/etcd/etcd-ca.pem'
  auto-tls: true
debug: false
log-package-levels:
log-outputs: [default]
force-new-cluster: false
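Since listen-client-urls also includes http://127.0.0.1:2379, etcd's own disk-latency histograms can be pulled locally to confirm whether etcd itself is seeing slow fsyncs (a sketch, not from the original report):
# curl -s http://127.0.0.1:2379/metrics | grep -E 'etcd_disk_wal_fsync_duration_seconds|etcd_disk_backend_commit_duration_seconds'
Consistently high upper buckets in these histograms would point at the WAL/backend disk rather than at the read pressure seen on the "/" partition.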
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
$ etcdctl member list -w table
+------------------+---------+------------------------------------+--------------------------+--------------------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS |
+------------------+---------+------------------------------------+--------------------------+--------------------------+
| 58b6a7ae8b3ccc08 | started | xxx-k8s-master03-168-0-23 | https://192.168.0.23:2380 | https://192.168.0.23:2379 |
| a72d40042d3dcfc3 | started | xxx-k8s-master01-168-0-21 | https://192.168.0.21:2380 | https://192.168.0.21:2379 |
| fed575f46661e44c | started | xxx-k8s-master02-168-0-22 | https://192.168.0.22:2380 | https://192.168.0.22:2379 |
+------------------+---------+------------------------------------+--------------------------+--------------------------+
$ etcdctl --endpoints=<member list> endpoint status -w table
+------------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+------------------+------------------+---------+---------+-----------+-----------+------------+
| 192.168.0.21:2379 | a72d40042d3dcfc3 | 3.5.2 | 131 MB | true | 91 | 19828175 |
| 192.168.0.22:2379 | fed575f46661e44c | 3.5.2 | 131 MB | false | 91 | 19828175 |
| 192.168.0.23:2379 | 58b6a7ae8b3ccc08 | 3.5.2 | 131 MB | false | 91 | 19828175 |
+------------------+------------------+---------+---------+-----------+-----------+------------+
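For an end-to-end latency check against a single member, etcdctl also ships a built-in benchmark (a sketch; the TLS paths are taken from the configuration above):
# ETCDCTL_API=3 etcdctl --endpoints=https://192.168.0.22:2379 \
    --cacert=/etc/kubernetes/pki/etcd/etcd-ca.pem \
    --cert=/etc/kubernetes/pki/etcd/etcd.pem \
    --key=/etc/kubernetes/pki/etcd/etcd-key.pem \
    check perf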
Relevant log output
An excerpt from the log:
Apr 4 08:38:41 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:41.744+0800","caller":"etcdserver/v3_server.go:815","msg":"waiting for ReadIndex response took too long, retrying","sent-request-id":16450664162270192920,"retry-timeout":"500ms"}
Apr 4 08:38:41 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:41.754+0800","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"747.319986ms","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/calico/resources/v3/projectcalico.org/nodes/xxx-k8s-master03-168-0-23\" ","response":"range_response_count:1 size:996"}
Apr 4 08:38:41 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:41.754+0800","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2022-04-04T08:38:41.053+0800","time spent":"700.63208ms","remote":"192.168.0.23:33062","response type":"/etcdserverpb.KV/Txn","request count":1,"request size":117,"response count":0,"response size":43,"request content":"compare:<target:MOD key:\"/registry/masterleases/10.97.43.23\" mod_revision:16993013 > success:<request_put:<key:\"/registry/masterleases/192.168.0.23\" value_size:67 lease:5747578185647075638 >> failure:<request_range:<key:\"/registry/masterleases/192.168.0.23\" > >"}
Apr 4 08:38:41 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:41.754+0800","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"699.269271ms","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/masterleases/\" range_end:\"/registry/masterleases0\" ","response":"range_response_count:3 size:399"}
Apr 4 08:38:41 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:41.754+0800","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2022-04-04T08:38:41.077+0800","time spent":"676.532394ms","remote":"192.168.0.21:41074","response type":"/etcdserverpb.KV/Txn","request count":1,"request size":520,"response count":0,"response size":43,"request content":"compare:<target:MOD key:\"/registry/leases/kube-system/kube-controller-manager\" mod_revision:16993058 > success:<request_put:<key:\"/registry/leases/kube-system/kube-controller-manager\" value_size:460 >> failure:<request_range:<key:\"/registry/leases/kube-system/kube-controller-manager\" > >"}
Apr 4 08:38:42 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:42.255+0800","caller":"etcdserver/v3_server.go:815","msg":"waiting for ReadIndex response took too long, retrying","sent-request-id":16450664162270192921,"retry-timeout":"500ms"}
Apr 4 08:38:42 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:42.755+0800","caller":"etcdserver/v3_server.go:815","msg":"waiting for ReadIndex response took too long, retrying","sent-request-id":16450664162270192921,"retry-timeout":"500ms"}
Apr 4 08:38:42 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:42.828+0800","caller":"etcdserver/v3_server.go:792","msg":"ignored out-of-date read index response; local node read indexes queueing up and waiting to be in sync with leader","sent-request-id":16450664162270192921,"received-request-id":16450664162270192920}
Apr 4 08:38:43 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:43.255+0800","caller":"etcdserver/v3_server.go:815","msg":"waiting for ReadIndex response took too long, retrying","sent-request-id":16450664162270192921,"retry-timeout":"500ms"}
Apr 4 08:38:43 xxx-k8s-master02-168-0-22 etcd: {"level":"info","ts":"2022-04-04T08:38:43.371+0800","caller":"traceutil/trace.go:171","msg":"trace[51766535] linearizableReadLoop","detail":"{readStateIndex:19779122; appliedIndex:19779122; }","duration":"1.617109134s","start":"2022-04-04T08:38:41.754+0800","end":"2022-04-04T08:38:43.371+0800","steps":["trace[51766535] 'read index received' (duration: 1.61709629s)","trace[51766535] 'applied index is now lower than readState.Index' (duration: 11.126µs)"],"step_count":2}
Apr 4 08:38:43 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:43.371+0800","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"2.036405202s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/management.cattle.io/tokens/\" range_end:\"/registry/management.cattle.io/tokens0\" count_only:true ","response":"range_response_count:0 size:7"}
Apr 4 08:38:43 xxx-k8s-master02-168-0-22 etcd: {"level":"info","ts":"2022-04-04T08:38:43.371+0800","caller":"traceutil/trace.go:171","msg":"trace[250405174] range","detail":"{range_begin:/registry/management.cattle.io/tokens/; range_end:/registry/management.cattle.io/tokens0; response_count:0; response_revision:16993077; }","duration":"2.036476756s","start":"2022-04-04T08:38:41.334+0800","end":"2022-04-04T08:38:43.371+0800","steps":["trace[250405174] 'agreement among raft nodes before linearized reading' (duration: 2.036372691s)"],"step_count":1}
Apr 4 08:38:43 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:43.371+0800","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"2.05153119s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/prioritylevelconfigurations/node-high\" ","response":"range_response_count:1 size:696"}
Apr 4 08:38:43 xxx-k8s-master02-168-0-22 etcd: {"level":"info","ts":"2022-04-04T08:38:43.371+0800","caller":"traceutil/trace.go:171","msg":"trace[1123299582] range","detail":"{range_begin:/registry/prioritylevelconfigurations/node-high; range_end:; response_count:1; response_revision:16993077; }","duration":"2.051557477s","start":"2022-04-04T08:38:41.319+0800","end":"2022-04-04T08:38:43.371+0800","steps":["trace[1123299582] 'agreement among raft nodes before linearized reading' (duration: 2.05149392s)"],"step_count":1}
Apr 4 08:38:43 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:43.858+0800","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"490.353508ms","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/management.cattle.io/groupmembers/\" range_end:\"/registry/management.cattle.io/groupmembers0\" count_only:true ","response":"range_response_count:0 size:7"}
Apr 4 08:38:43 xxx-k8s-master02-168-0-22 etcd: {"level":"info","ts":"2022-04-04T08:38:43.858+0800","caller":"traceutil/trace.go:171","msg":"trace[346636311] range","detail":"{range_begin:/registry/management.cattle.io/groupmembers/; range_end:/registry/management.cattle.io/groupmembers0; response_count:0; response_revision:16993083; }","duration":"490.375769ms","start":"2022-04-04T08:38:43.368+0800","end":"2022-04-04T08:38:43.858+0800","steps":["trace[346636311] 'agreement among raft nodes before linearized reading' (duration: 490.341115ms)"],"step_count":1}
Apr 4 08:38:43 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:43.858+0800","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2022-04-04T08:38:42.991+0800","time spent":"867.248022ms","remote":"192.168.0.21:41106","response type":"/etcdserverpb.KV/Range","request count":0,"request size":42,"response count":0,"response size":30,"request content":"key:\"/registry/ingress/\" range_end:\"/registry/ingress0\" count_only:true "}
Apr 4 08:38:43 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:43.858+0800","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"2.102507536s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/endpointslices/default/kubernetes\" ","response":"range_response_count:1 size:524"}
Apr 4 08:38:43 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:43.858+0800","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2022-04-04T08:38:43.368+0800","time spent":"490.412641ms","remote":"192.168.0.22:49194","response type":"/etcdserverpb.KV/Range","request count":0,"request size":94,"response count":0,"response size":30,"request content":"key:\"/registry/management.cattle.io/groupmembers/\" range_end:\"/registry/management.cattle.io/groupmembers0\" count_only:true "}
Apr 4 08:39:10 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:39:10.989+0800","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2022-04-04T08:39:09.808+0800","time spent":"1.180906139s","remote":"192.168.0.35:42840","response type":"/etcdserverpb.KV/Range","request count":0,"request size":81,"response count":1,"response size":1019,"request content":"key:\"/calico/resources/v3/projectcalico.org/nodes/servicecloud-k8s-master03-168-0-23\" "}
Apr 4 08:39:11 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:39:11.360+0800","caller":"etcdserver/v3_server.go:792","msg":"ignored out-of-date read index response; local node read indexes queueing up and waiting to be in sync with leader","sent-request-id":16450664162270193073,"received-request-id":16450664162270193066}
Apr 4 08:43:05 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:43:05.776+0800","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"18.702242124s","expected-duration":"100ms","prefix":"","request":"header:<ID:14702141606942672209 username:\"etcd\" auth_revision:1 > txn:<compare:<target:MOD key:\"/registry/minions/xxx-k8s-node08-168-0-31\" mod_revision:16993950 > success:<request_put:<key:\"/registry/minions/xxx-k8s-node08-168-0-31\" value_size:4861 >> failure:<>>","response":"size:20"}
I'm having trouble understanding the full setup. Is the etcd data dir on dm-1?
I don't see how etcd could cause kube-scheduler and kube-controller-manager to read X MB/s from disk...
Aren't these components in a restart loop, repeatedly loading their binaries? Could you please check what they are doing using strace?
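For reference, a minimal strace approach along those lines (a sketch; <PID> stands for the kube-scheduler or kube-controller-manager PID, not a value from this issue):
# strace -f -c -p <PID>                                          # attach, then Ctrl-C for a syscall count/time summary
# strace -f -e trace=openat,read,write,fsync,fdatasync -p <PID>  # watch file-related syscalls live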
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
@dotbalo, have you found a workaround for this?
@MohamedShaj It seems that this is not an etcd problem but a disk-performance problem. We replaced the disks with the highest-performance SSDs available and increased the storage space, then ran all the processes on the high-performance disks. Since then, iowait has rarely been high.
So even after changing to a disk with high I/O, you still sometimes see these logs, right?
I was facing this log issue in a k3s cluster. I have a 3-node cluster where the master and worker run on the same nodes.
In my case, I learned that the etcd data dir should be on a separate, well-performing disk, and the node needs a good CPU and enough memory.
After all that tuning, the logs were drastically reduced; over a 4-hour window I now see only two or three "waiting for ReadIndex response took too long" logs.
So I just want to understand: these logs will keep appearing for one reason or another, but they won't make the node unstable, right?