etcd icon indicating copy to clipboard operation
etcd copied to clipboard

etcd leader change leads to "timed out waiting for read index response"

Open lance5890 opened this issue 2 years ago • 6 comments

What happened?

shutdown one member's network(mock network failure), and then up this memer, as this member's term id is greater than the existed cluster's termid, this will leads to leader re-election, but we find when the cluster has elected a new leader, but the request during the time has "timed out waiting for read index response (local node might have slow network)" warning

What did you expect to happen?

we find the follower returns faster than the master reply; and the master can reach sometimes reach timeout

we just wonder why the master is slower than the follower reply

How can we reproduce it (as minimally and precisely as possible)?

shudown one follower's network; and sleep 30s; and ifup this follower's network

Anything else we need to know?

HELP etcd_disk_backend_commit_duration_seconds The latency distributions of commit called by backend.

TYPE etcd_disk_backend_commit_duration_seconds histogram

etcd_disk_backend_commit_duration_seconds_bucket{le="0.001"} 125143 etcd_disk_backend_commit_duration_seconds_bucket{le="0.002"} 133860 etcd_disk_backend_commit_duration_seconds_bucket{le="0.004"} 136214 etcd_disk_backend_commit_duration_seconds_bucket{le="0.008"} 139231 etcd_disk_backend_commit_duration_seconds_bucket{le="0.016"} 144446 etcd_disk_backend_commit_duration_seconds_bucket{le="0.032"} 150972 etcd_disk_backend_commit_duration_seconds_bucket{le="0.064"} 153590 etcd_disk_backend_commit_duration_seconds_bucket{le="0.128"} 153593 etcd_disk_backend_commit_duration_seconds_bucket{le="0.256"} 153593 etcd_disk_backend_commit_duration_seconds_bucket{le="0.512"} 153593 etcd_disk_backend_commit_duration_seconds_bucket{le="1.024"} 153593 etcd_disk_backend_commit_duration_seconds_bucket{le="2.048"} 153593 etcd_disk_backend_commit_duration_seconds_bucket{le="4.096"} 153593 etcd_disk_backend_commit_duration_seconds_bucket{le="8.192"} 153593 etcd_disk_backend_commit_duration_seconds_bucket{le="+Inf"} 153593 etcd_disk_backend_commit_duration_seconds_sum 421.5890662139977 etcd_disk_backend_commit_duration_seconds_count 153593

HELP etcd_disk_wal_fsync_duration_seconds The latency distributions of fsync called by WAL.

TYPE etcd_disk_wal_fsync_duration_seconds histogram

etcd_disk_wal_fsync_duration_seconds_bucket{le="0.001"} 229332 etcd_disk_wal_fsync_duration_seconds_bucket{le="0.002"} 256695 etcd_disk_wal_fsync_duration_seconds_bucket{le="0.004"} 262156 etcd_disk_wal_fsync_duration_seconds_bucket{le="0.008"} 266973 etcd_disk_wal_fsync_duration_seconds_bucket{le="0.016"} 274651 etcd_disk_wal_fsync_duration_seconds_bucket{le="0.032"} 280611 etcd_disk_wal_fsync_duration_seconds_bucket{le="0.064"} 281267 etcd_disk_wal_fsync_duration_seconds_bucket{le="0.128"} 281275 etcd_disk_wal_fsync_duration_seconds_bucket{le="0.256"} 281275 etcd_disk_wal_fsync_duration_seconds_bucket{le="0.512"} 281275 etcd_disk_wal_fsync_duration_seconds_bucket{le="1.024"} 281275 etcd_disk_wal_fsync_duration_seconds_bucket{le="2.048"} 281275 etcd_disk_wal_fsync_duration_seconds_bucket{le="4.096"} 281275 etcd_disk_wal_fsync_duration_seconds_bucket{le="8.192"} 281275 etcd_disk_wal_fsync_duration_seconds_bucket{le="+Inf"} 281275 etcd_disk_wal_fsync_duration_seconds_sum 391.62205426201274

TYPE etcd_network_peer_round_trip_time_seconds histogram

etcd_network_peer_round_trip_time_seconds_bucket{To="37c54505d87a4118",le="0.0001"} 0 etcd_network_peer_round_trip_time_seconds_bucket{To="37c54505d87a4118",le="0.0002"} 0 etcd_network_peer_round_trip_time_seconds_bucket{To="37c54505d87a4118",le="0.0004"} 0 etcd_network_peer_round_trip_time_seconds_bucket{To="37c54505d87a4118",le="0.0008"} 912 etcd_network_peer_round_trip_time_seconds_bucket{To="37c54505d87a4118",le="0.0016"} 1902 etcd_network_peer_round_trip_time_seconds_bucket{To="37c54505d87a4118",le="0.0032"} 2354 etcd_network_peer_round_trip_time_seconds_bucket{To="37c54505d87a4118",le="0.0064"} 2438 etcd_network_peer_round_trip_time_seconds_bucket{To="37c54505d87a4118",le="0.0128"} 2804 etcd_network_peer_round_trip_time_seconds_bucket{To="37c54505d87a4118",le="0.0256"} 5060 etcd_network_peer_round_trip_time_seconds_bucket{To="37c54505d87a4118",le="0.0512"} 5069 etcd_network_peer_round_trip_time_seconds_bucket{To="37c54505d87a4118",le="0.1024"} 5078 etcd_network_peer_round_trip_time_seconds_bucket{To="37c54505d87a4118",le="0.2048"} 5081 etcd_network_peer_round_trip_time_seconds_bucket{To="37c54505d87a4118",le="0.4096"} 5087 etcd_network_peer_round_trip_time_seconds_bucket{To="37c54505d87a4118",le="0.8192"} 5089 etcd_network_peer_round_trip_time_seconds_bucket{To="37c54505d87a4118",le="1.6384"} 5091 etcd_network_peer_round_trip_time_seconds_bucket{To="37c54505d87a4118",le="3.2768"} 5091 etcd_network_peer_round_trip_time_seconds_bucket{To="37c54505d87a4118",le="+Inf"} 5091 etcd_network_peer_round_trip_time_seconds_sum{To="37c54505d87a4118"} 52.09408713200003 etcd_network_peer_round_trip_time_seconds_count{To="37c54505d87a4118"} 5091 etcd_network_peer_round_trip_time_seconds_bucket{To="6fb90e7a505dc03f",le="0.0001"} 0 etcd_network_peer_round_trip_time_seconds_bucket{To="6fb90e7a505dc03f",le="0.0002"} 0 etcd_network_peer_round_trip_time_seconds_bucket{To="6fb90e7a505dc03f",le="0.0004"} 0 etcd_network_peer_round_trip_time_seconds_bucket{To="6fb90e7a505dc03f",le="0.0008"} 68 etcd_network_peer_round_trip_time_seconds_bucket{To="6fb90e7a505dc03f",le="0.0016"} 155 etcd_network_peer_round_trip_time_seconds_bucket{To="6fb90e7a505dc03f",le="0.0032"} 329 etcd_network_peer_round_trip_time_seconds_bucket{To="6fb90e7a505dc03f",le="0.0064"} 811 etcd_network_peer_round_trip_time_seconds_bucket{To="6fb90e7a505dc03f",le="0.0128"} 2231 etcd_network_peer_round_trip_time_seconds_bucket{To="6fb90e7a505dc03f",le="0.0256"} 4022 etcd_network_peer_round_trip_time_seconds_bucket{To="6fb90e7a505dc03f",le="0.0512"} 4227 etcd_network_peer_round_trip_time_seconds_bucket{To="6fb90e7a505dc03f",le="0.1024"} 4230 etcd_network_peer_round_trip_time_seconds_bucket{To="6fb90e7a505dc03f",le="0.2048"} 4235 etcd_network_peer_round_trip_time_seconds_bucket{To="6fb90e7a505dc03f",le="0.4096"} 4241 etcd_network_peer_round_trip_time_seconds_bucket{To="6fb90e7a505dc03f",le="0.8192"} 4242 etcd_network_peer_round_trip_time_seconds_bucket{To="6fb90e7a505dc03f",le="1.6384"} 4243 etcd_network_peer_round_trip_time_seconds_bucket{To="6fb90e7a505dc03f",le="3.2768"} 4245 etcd_network_peer_round_trip_time_seconds_bucket{To="6fb90e7a505dc03f",le="+Inf"} 4245 etcd_network_peer_round_trip_time_seconds_sum{To="6fb90e7a505dc03f"} 65.73302525599975 etcd_network_peer_round_trip_time_seconds_count{To="6fb90e7a505dc03f"} 4245

Etcd version (please run commands below)

$ etcd --version
# paste output here
3.4.13
$ etcdctl version
# paste output here

3.4.13

Etcd configuration (command line flags or environment variables)

paste your configuration here

--election-timeout=1000 --snapshot-count=10000 --heartbeat-interval=100

Etcd debug information (please run commands blow, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output

the master log:
{"level":"info","ts":"2022-05-10T04:41:46.655Z","caller":"rafthttp/stream.go:250","msg":"set message encoder","from":"89cb9ab87293d525","to":"89cb9ab87293d525","stream-type":"stream Message"}
{"level":"warn","ts":"2022-05-10T04:41:46.655Z","caller":"rafthttp/stream.go:266","msg":"closed TCP streaming connection with remote peer","stream-writer-type":"stream Message","local-member-id":"89cb9ab87293d525","remote-peer-id":"6fb90e7a505dc03f"}
{"level":"warn","ts":"2022-05-10T04:41:46.655Z","caller":"rafthttp/stream.go:277","msg":"established TCP streaming connection with remote peer","stream-writer-type":"stream Message","local-member-id":"89cb9ab87293d525","remote-peer-id":"6fb90e7a505dc03f"}
{"level":"info","ts":"2022-05-10T04:41:46.744Z","caller":"raft/raft.go:859","msg":"89cb9ab87293d525 [term: 2793] received a MsgAppResp message with higher term from 6fb90e7a505dc03f [term: 2815]"}
{"level":"info","ts":"2022-05-10T04:41:46.744Z","caller":"raft/raft.go:700","msg":"89cb9ab87293d525 became follower at term 2815"}
{"level":"info","ts":"2022-05-10T04:41:46.744Z","caller":"raft/node.go:331","msg":"raft.node: 89cb9ab87293d525 lost leader 89cb9ab87293d525 at term 2815"}
{"level":"info","ts":"2022-05-10T04:41:46.782Z","caller":"raft/raft.go:1365","msg":"89cb9ab87293d525 no leader at term 2815; dropping index reading msg"}
{"level":"info","ts":"2022-05-10T04:41:46.975Z","caller":"raft/raft.go:859","msg":"89cb9ab87293d525 [term: 2815] received a MsgVote message with higher term from 6fb90e7a505dc03f [term: 2816]"}
{"level":"info","ts":"2022-05-10T04:41:46.975Z","caller":"raft/raft.go:700","msg":"89cb9ab87293d525 became follower at term 2816"}
{"level":"info","ts":"2022-05-10T04:41:46.975Z","caller":"raft/raft.go:978","msg":"89cb9ab87293d525 [logterm: 2793, index: 1704243, vote: 0] rejected MsgVote from 6fb90e7a505dc03f [logterm: 2793, index: 1704132] at term 2816"}
{"level":"warn","ts":"2022-05-10T04:41:47.085Z","caller":"rafthttp/peer_status.go:68","msg":"peer became inactive (message send to peer failed)","peer-id":"6fb90e7a505dc03f","error":"failed to write 6fb90e7a505dc03f on stream MsgApp v2 (write tcp 10.255.148.129:2380->10.255.148.128:59972: i/o timeout)"}
{"level":"warn","ts":"2022-05-10T04:41:47.086Z","caller":"rafthttp/stream.go:224","msg":"lost TCP streaming connection with remote peer","stream-writer-type":"stream MsgApp v2","local-member-id":"89cb9ab87293d525","remote-peer-id":"6fb90e7a505dc03f"}
{"level":"info","ts":"2022-05-10T04:41:47.087Z","caller":"rafthttp/stream.go:250","msg":"set message encoder","from":"89cb9ab87293d525","to":"89cb9ab87293d525","stream-type":"stream MsgApp v2"}
{"level":"info","ts":"2022-05-10T04:41:47.087Z","caller":"rafthttp/peer_status.go:51","msg":"peer became active","peer-id":"6fb90e7a505dc03f"}
{"level":"warn","ts":"2022-05-10T04:41:47.087Z","caller":"rafthttp/stream.go:277","msg":"established TCP streaming connection with remote peer","stream-writer-type":"stream MsgApp v2","local-member-id":"89cb9ab87293d525","remote-peer-id":"6fb90e7a505dc03f"}
{"level":"info","ts":"2022-05-10T04:41:47.772Z","caller":"raft/raft.go:1365","msg":"89cb9ab87293d525 no leader at term 2816; dropping index reading msg"}
{"level":"info","ts":"2022-05-10T04:41:47.836Z","caller":"raft/raft.go:901","msg":"89cb9ab87293d525 [term: 2816] ignored a MsgVote message with lower term from 37c54505d87a4118 [term: 2794]"}
{"level":"info","ts":"2022-05-10T04:41:48.243Z","caller":"raft/raft.go:923","msg":"89cb9ab87293d525 is starting a new election at term 2816"}
{"level":"info","ts":"2022-05-10T04:41:48.244Z","caller":"raft/raft.go:713","msg":"89cb9ab87293d525 became candidate at term 2817"}
{"level":"info","ts":"2022-05-10T04:41:48.244Z","caller":"raft/raft.go:824","msg":"89cb9ab87293d525 received MsgVoteResp from 89cb9ab87293d525 at term 2817"}
{"level":"info","ts":"2022-05-10T04:41:48.244Z","caller":"raft/raft.go:811","msg":"89cb9ab87293d525 [logterm: 2793, index: 1704243] sent MsgVote request to 37c54505d87a4118 at term 2817"}
{"level":"info","ts":"2022-05-10T04:41:48.244Z","caller":"raft/raft.go:811","msg":"89cb9ab87293d525 [logterm: 2793, index: 1704243] sent MsgVote request to 6fb90e7a505dc03f at term 2817"}
{"level":"info","ts":"2022-05-10T04:41:48.245Z","caller":"raft/raft.go:824","msg":"89cb9ab87293d525 received MsgVoteResp from 37c54505d87a4118 at term 2817"}
{"level":"info","ts":"2022-05-10T04:41:48.245Z","caller":"raft/raft.go:1302","msg":"89cb9ab87293d525 has received 2 MsgVoteResp votes and 0 vote rejections"}
{"level":"info","ts":"2022-05-10T04:41:48.245Z","caller":"raft/raft.go:765","msg":"89cb9ab87293d525 became leader at term 2817"}
{"level":"info","ts":"2022-05-10T04:41:48.245Z","caller":"raft/node.go:325","msg":"raft.node: 89cb9ab87293d525 elected leader 89cb9ab87293d525 at term 2817"}
{"level":"warn","ts":"2022-05-10T04:41:48.245Z","caller":"etcdserver/util.go:163","msg":"apply request took too long","took":"473.354607ms","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/auditsinks\" range_end:\"/registry/auditsinkt\" count_only:true ","response":"","error":"etcdserver: leader changed"}
{"level":"info","ts":"2022-05-10T04:41:48.245Z","caller":"traceutil/trace.go:145","msg":"trace[226919836] range","detail":"{range_begin:/registry/auditsinks; range_end:/registry/auditsinkt; }","duration":"473.430293ms","start":"2022-05-10T04:41:47.772Z","end":"2022-05-10T04:41:48.245Z","steps":["trace[226919836] 'agreement among raft nodes before linearized reading'  (duration: 473.330205ms)"]}
{"level":"warn","ts":"2022-05-10T04:41:55.245Z","caller":"etcdserver/v3_server.go:746","msg":"timed out waiting for read index response (local node might have slow network)","timeout":"7s"}
{"level":"warn","ts":"2022-05-10T04:41:55.245Z","caller":"etcdserver/util.go:163","msg":"apply request took too long","took":"7.066584123s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/services/endpoints/kube-system/kube-controller-manager\" ","response":"","error":"etcdserver: request timed out"}
{"level":"info","ts":"2022-05-10T04:41:55.246Z","caller":"traceutil/trace.go:145","msg":"trace[2110166667] range","detail":"{range_begin:/registry/services/endpoints/kube-system/kube-controller-manager; range_end:; }","duration":"7.06662378s","start":"2022-05-10T04:41:48.179Z","end":"2022-05-10T04:41:55.246Z","steps":["trace[2110166667] 'agreement among raft nodes before linearized reading'  (duration: 7.066557345s)"]}
{"level":"warn","ts":"2022-05-10T04:41:55.246Z","caller":"etcdserver/util.go:163","msg":"apply request took too long","took":"7.284376215s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/namespaces/default\" ","response":"","error":"etcdserver: request timed out"}
{"level":"info","ts":"2022-05-10T04:41:55.246Z","caller":"traceutil/trace.go:145","msg":"trace[1819864903] range","detail":"{range_begin:/registry/namespaces/default; range_end:; }","duration":"7.284388674s","start":"2022-05-10T04:41:47.961Z","end":"2022-05-10T04:41:55.246Z","steps":["trace[1819864903] 'agreement among raft nodes before linearized reading'  (duration: 7.284368928s)"]}
{"level":"warn","ts":"2022-05-10T04:41:55.246Z","caller":"etcdserver/util.go:163","msg":"apply request took too long","took":"3.636105478s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/leases\" range_end:\"/registry/leaset\" count_only:true ","response":"range_response_count:0 size:8"}
{"level":"info","ts":"2022-05-10T04:41:55.246Z","caller":"traceutil/trace.go:145","msg":"trace[1620812220] range","detail":"{range_begin:/registry/leases; range_end:/registry/leaset; response_count:0; response_revision:1370553; }","duration":"3.636146515s","start":"2022-05-10T04:41:51.610Z","end":"2022-05-10T04:41:55.246Z","steps":["trace[1620812220] 'agreement among raft nodes before linearized reading'  (duration: 3.636025555s)"]}
{"level":"warn","ts":"2022-05-10T04:41:55.246Z","caller":"etcdserver/util.go:163","msg":"apply request took too long","took":"6.75466347s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/namespaces/kube-system\" ","response":"range_response_count:1 size:264"}
{"level":"info","ts":"2022-05-10T04:41:55.246Z","caller":"traceutil/trace.go:145","msg":"trace[1167100781] range","detail":"{range_begin:/registry/namespaces/kube-system; range_end:; response_count:1; response_revision:1370553; }","duration":"6.754715163s","start":"2022-05-10T04:41:48.491Z","end":"2022-05-10T04:41:55.246Z","steps":["trace[1167100781] 'agreement among raft nodes before linearized reading'  (duration: 6.754551591s)"]}
{"level":"warn","ts":"2022-05-10T04:41:55.246Z","caller":"etcdserver/util.go:163","msg":"apply request took too long","took":"4.66992615s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/configmaps/ingress-nginx/ingress-controller-leader-nginx\" ","response":"range_response_count:1 size:605"}
{"level":"info","ts":"2022-05-10T04:41:55.246Z","caller":"traceutil/trace.go:145","msg":"trace[86114651] range","detail":"{range_begin:/registry/configmaps/ingress-nginx/ingress-controller-leader-nginx; range_end:; response_count:1; response_revision:1370553; }","duration":"4.66994986s","start":"2022-05-10T04:41:50.576Z","end":"2022-05-10T04:41:55.246Z","steps":["trace[86114651] 'agreement among raft nodes before linearized reading'  (duration: 4.669883542s)"]}
{"level":"warn","ts":"2022-05-10T04:41:55.246Z","caller":"etcdserver/util.go:163","msg":"apply request took too long","took":"520.1425ms","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/services/specs\" range_end:\"/registry/services/spect\" count_only:true ","response":"range_response_count:0 size:8"}
{"level":"info","ts":"2022-05-10T04:41:55.246Z","caller":"traceutil/trace.go:145","msg":"trace[669915318] range","detail":"{range_begin:/registry/services/specs; range_end:/registry/services/spect; response_count:0; response_revision:1370553; }","duration":"520.162191ms","start":"2022-05-10T04:41:54.726Z","end":"2022-05-10T04:41:55.246Z","steps":["trace[669915318] 'agreement among raft nodes before linearized reading'  (duration: 520.095541ms)"]}
{"level":"warn","ts":"2022-05-10T04:41:55.246Z","caller":"etcdserver/util.go:163","msg":"apply request took too long","took":"5.477321397s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/crd.projectcalico.org/clusterinformations\" range_end:\"/registry/crd.projectcalico.org/clusterinformationt\" count_only:true ","response":"range_response_count:0 size:8"}
{"level":"warn","ts":"2022-05-10T04:41:55.246Z","caller":"etcdserver/util.go:163","msg":"apply request took too long","took":"1.104159696s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/health\" ","response":"range_response_count:0 size:6"}
{"level":"info","ts":"2022-05-10T04:41:55.247Z","caller":"traceutil/trace.go:145","msg":"trace[782354871] range","detail":"{range_begin:/registry/health; range_end:; response_count:0; response_revision:1370553; }","duration":"1.104186832s","start":"2022-05-10T04:41:54.142Z","end":"2022-05-10T04:41:55.247Z","steps":["trace[782354871] 'agreement among raft nodes before linearized reading'  (duration: 1.104118795s)"]}
{"level":"warn","ts":"2022-05-10T04:41:55.247Z","caller":"etcdserver/util.go:163","msg":"apply request took too long","took":"2.234216129s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/limitranges\" range_end:\"/registry/limitranget\" count_only:true ","response":"range_response_count:0 size:6"}
{"level":"info","ts":"2022-05-10T04:41:55.247Z","caller":"traceutil/trace.go:145","msg":"trace[1751856174] range","detail":"{range_begin:/registry/limitranges; range_end:/registry/limitranget; response_count:0; response_revision:1370553; }","duration":"2.234234385s","start":"2022-05-10T04:41:53.012Z","end":"2022-05-10T04:41:55.247Z","steps":["trace[1751856174] 'agreement among raft nodes before linearized reading'  (duration: 2.234182094s)"]}
{"level":"warn","ts":"2022-05-10T04:41:55.247Z","caller":"etcdserver/util.go:163","msg":"apply request took too long","took":"6.536714905s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/rolebindings\" range_end:\"/registry/rolebindingt\" count_only:true ","response":"range_response_count:0 size:8"}

the follower log:
{"level":"warn","ts":"2022-05-10T04:41:44.013Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"6fb90e7a505dc03f","rtt":"9.664886ms","error":"dial tcp 10.255.148.128:2380: i/o timeout"}
{"level":"info","ts":"2022-05-10T04:41:44.036Z","caller":"rafthttp/stream.go:250","msg":"set message encoder","from":"37c54505d87a4118","to":"37c54505d87a4118","stream-type":"stream Message"}
{"level":"info","ts":"2022-05-10T04:41:44.036Z","caller":"rafthttp/peer_status.go:51","msg":"peer became active","peer-id":"6fb90e7a505dc03f"}
{"level":"warn","ts":"2022-05-10T04:41:44.036Z","caller":"rafthttp/stream.go:266","msg":"closed TCP streaming connection with remote peer","stream-writer-type":"stream Message","local-member-id":"37c54505d87a4118","remote-peer-id":"6fb90e7a505dc03f"}
{"level":"warn","ts":"2022-05-10T04:41:44.036Z","caller":"rafthttp/stream.go:277","msg":"established TCP streaming connection with remote peer","stream-writer-type":"stream Message","local-member-id":"37c54505d87a4118","remote-peer-id":"6fb90e7a505dc03f"}
{"level":"info","ts":"2022-05-10T04:41:44.036Z","caller":"rafthttp/stream.go:425","msg":"established TCP streaming connection with remote peer","stream-reader-type":"stream MsgApp v2","local-member-id":"37c54505d87a4118","remote-peer-id":"6fb90e7a505dc03f"}
{"level":"warn","ts":"2022-05-10T04:41:44.037Z","caller":"rafthttp/peer_status.go:68","msg":"peer became inactive (message send to peer failed)","peer-id":"6fb90e7a505dc03f","error":"failed to dial 6fb90e7a505dc03f on stream Message (dial tcp 10.255.148.128:2380: i/o timeout)"}
{"level":"info","ts":"2022-05-10T04:41:44.037Z","caller":"rafthttp/stream.go:250","msg":"set message encoder","from":"37c54505d87a4118","to":"37c54505d87a4118","stream-type":"stream MsgApp v2"}
{"level":"info","ts":"2022-05-10T04:41:44.037Z","caller":"rafthttp/peer_status.go:51","msg":"peer became active","peer-id":"6fb90e7a505dc03f"}
{"level":"warn","ts":"2022-05-10T04:41:44.037Z","caller":"rafthttp/stream.go:266","msg":"closed TCP streaming connection with remote peer","stream-writer-type":"stream MsgApp v2","local-member-id":"37c54505d87a4118","remote-peer-id":"6fb90e7a505dc03f"}
{"level":"warn","ts":"2022-05-10T04:41:44.037Z","caller":"rafthttp/stream.go:277","msg":"established TCP streaming connection with remote peer","stream-writer-type":"stream MsgApp v2","local-member-id":"37c54505d87a4118","remote-peer-id":"6fb90e7a505dc03f"}
{"level":"info","ts":"2022-05-10T04:41:44.051Z","caller":"rafthttp/stream.go:425","msg":"established TCP streaming connection with remote peer","stream-reader-type":"stream Message","local-member-id":"37c54505d87a4118","remote-peer-id":"6fb90e7a505dc03f"}
{"level":"info","ts":"2022-05-10T04:41:44.150Z","caller":"raft/raft.go:844","msg":"37c54505d87a4118 [logterm: 2793, index: 1704225, vote: 89cb9ab87293d525] ignored MsgVote from 6fb90e7a505dc03f [logterm: 2793, index: 1704132] at term 2793: lease is not expired (remaining ticks: 10)"}
{"level":"info","ts":"2022-05-10T04:41:45.150Z","caller":"raft/raft.go:844","msg":"37c54505d87a4118 [logterm: 2793, index: 1704228, vote: 89cb9ab87293d525] ignored MsgVote from 6fb90e7a505dc03f [logterm: 2793, index: 1704132] at term 2793: lease is not expired (remaining ticks: 10)"}
{"level":"info","ts":"2022-05-10T04:41:46.950Z","caller":"raft/raft.go:844","msg":"37c54505d87a4118 [logterm: 2793, index: 1704243, vote: 89cb9ab87293d525] ignored MsgVote from 6fb90e7a505dc03f [logterm: 2793, index: 1704132] at term 2793: lease is not expired (remaining ticks: 8)"}
{"level":"info","ts":"2022-05-10T04:41:47.810Z","caller":"raft/raft.go:923","msg":"37c54505d87a4118 is starting a new election at term 2793"}
{"level":"info","ts":"2022-05-10T04:41:47.810Z","caller":"raft/raft.go:713","msg":"37c54505d87a4118 became candidate at term 2794"}
{"level":"info","ts":"2022-05-10T04:41:47.810Z","caller":"raft/raft.go:824","msg":"37c54505d87a4118 received MsgVoteResp from 37c54505d87a4118 at term 2794"}
{"level":"info","ts":"2022-05-10T04:41:47.811Z","caller":"raft/raft.go:811","msg":"37c54505d87a4118 [logterm: 2793, index: 1704243] sent MsgVote request to 6fb90e7a505dc03f at term 2794"}
{"level":"info","ts":"2022-05-10T04:41:47.811Z","caller":"raft/raft.go:811","msg":"37c54505d87a4118 [logterm: 2793, index: 1704243] sent MsgVote request to 89cb9ab87293d525 at term 2794"}
{"level":"info","ts":"2022-05-10T04:41:47.811Z","caller":"raft/node.go:331","msg":"raft.node: 37c54505d87a4118 lost leader 89cb9ab87293d525 at term 2794"}
{"level":"info","ts":"2022-05-10T04:41:48.219Z","caller":"raft/raft.go:859","msg":"37c54505d87a4118 [term: 2794] received a MsgVote message with higher term from 89cb9ab87293d525 [term: 2817]"}
{"level":"info","ts":"2022-05-10T04:41:48.219Z","caller":"raft/raft.go:700","msg":"37c54505d87a4118 became follower at term 2817"}
{"level":"info","ts":"2022-05-10T04:41:48.219Z","caller":"raft/raft.go:960","msg":"37c54505d87a4118 [logterm: 2793, index: 1704243, vote: 0] cast MsgVote for 89cb9ab87293d525 [logterm: 2793, index: 1704243] at term 2817"}
{"level":"info","ts":"2022-05-10T04:41:48.221Z","caller":"raft/node.go:325","msg":"raft.node: 37c54505d87a4118 elected leader 89cb9ab87293d525 at term 2817"}
{"level":"warn","ts":"2022-05-10T04:41:48.221Z","caller":"etcdserver/util.go:163","msg":"apply request took too long","took":"1.463889434s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/daemonsets/kube-system/node-tuning-operator\" ","response":"","error":"etcdserver: leader changed"}
{"level":"info","ts":"2022-05-10T04:41:48.221Z","caller":"traceutil/trace.go:145","msg":"trace[1326835001] range","detail":"{range_begin:/registry/daemonsets/kube-system/node-tuning-operator; range_end:; }","duration":"1.463962936s","start":"2022-05-10T04:41:46.757Z","end":"2022-05-10T04:41:48.221Z","steps":["trace[1326835001] 'agreement among raft nodes before linearized reading'  (duration: 1.463782633s)"]}
{"level":"warn","ts":"2022-05-10T04:41:48.256Z","caller":"etcdserver/util.go:163","msg":"apply request took too long","took":"138.497741ms","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/volumeattachments\" range_end:\"/registry/volumeattachmentt\" count_only:true ","response":"range_response_count:0 size:6"}
{"level":"info","ts":"2022-05-10T04:41:48.256Z","caller":"traceutil/trace.go:145","msg":"trace[634035265] range","detail":"{range_begin:/registry/volumeattachments; range_end:/registry/volumeattachmentt; response_count:0; response_revision:1370529; }","duration":"138.575337ms","start":"2022-05-10T04:41:48.117Z","end":"2022-05-10T04:41:48.256Z","steps":["trace[634035265] 'agreement among raft nodes before linearized reading'  (duration: 138.417582ms)"]}
{"level":"warn","ts":"2022-05-10T04:41:48.256Z","caller":"etcdserver/util.go:163","msg":"apply request took too long","took":"342.115626ms","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/services/endpoints/kube-system/kube-scheduler\" ","response":"range_response_count:1 size:585"}
{"level":"info","ts":"2022-05-10T04:41:48.256Z","caller":"traceutil/trace.go:145","msg":"trace[1402876629] range","detail":"{range_begin:/registry/services/endpoints/kube-system/kube-scheduler; range_end:; response_count:1; response_revision:1370529; }","duration":"342.145766ms","start":"2022-05-10T04:41:47.914Z","end":"2022-05-10T04:41:48.256Z","steps":["trace[1402876629] 'agreement among raft nodes before linearized reading'  (duration: 342.061511ms)"]}
{"level":"warn","ts":"2022-05-10T04:41:48.257Z","caller":"etcdserver/util.go:163","msg":"apply request took too long","took":"1.107788044s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/services/endpoints/kube-system/kube-controller-manager\" ","response":"range_response_count:1 size:612"}
{"level":"info","ts":"2022-05-10T04:41:48.257Z","caller":"traceutil/trace.go:145","msg":"trace[1393588523] range","detail":"{range_begin:/registry/services/endpoints/kube-system/kube-controller-manager; range_end:; response_count:1; response_revision:1370529; }","duration":"1.107815388s","start":"2022-05-10T04:41:47.149Z","end":"2022-05-10T04:41:48.257Z","steps":["trace[1393588523] 'agreement among raft nodes before linearized reading'  (duration: 1.107735019s)"]}
{"level":"warn","ts":"2022-05-10T04:41:48.257Z","caller":"etcdserver/util.go:163","msg":"apply request took too long","took":"1.491095999s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/daemonsets/product-storage/engine-peon\" ","response":"range_response_count:1 size:3885"}
{"level":"info","ts":"2022-05-10T04:41:48.257Z","caller":"traceutil/trace.go:145","msg":"trace[361993807] range","detail":"{range_begin:/registry/daemonsets/product-storage/engine-peon; range_end:; response_count:1; response_revision:1370529; }","duration":"1.491126746s","start":"2022-05-10T04:41:46.766Z","end":"2022-05-10T04:41:48.257Z","steps":["trace[361993807] 'agreement among raft nodes before linearized reading'  (duration: 1.490967559s)"]}
{"level":"warn","ts":"2022-05-10T04:41:48.257Z","caller":"etcdserver/util.go:163","msg":"apply request took too long","took":"1.492303196s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/daemonsets/kube-system/node-local-dns\" ","response":"range_response_count:1 size:7021"}
{"level":"info","ts":"2022-05-10T04:41:48.257Z","caller":"traceutil/trace.go:145","msg":"trace[421415778] range","detail":"{range_begin:/registry/daemonsets/kube-system/node-local-dns; range_end:; response_count:1; response_revision:1370529; }","duration":"1.492358086s","start":"2022-05-10T04:41:46.765Z","end":"2022-05-10T04:41:48.257Z","steps":["trace[421415778] 'agreement among raft nodes before linearized reading'  (duration: 1.492196311s)"]}
{"level":"warn","ts":"2022-05-10T04:41:48.257Z","caller":"etcdserver/util.go:163","msg":"apply request took too long","took":"1.492451325s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/daemonsets/ingress-nginx/nginx-ingress-controller\" ","response":"range_response_count:1 size:4331"}
{"level":"info","ts":"2022-05-10T04:41:48.257Z","caller":"traceutil/trace.go:145","msg":"trace[1404998409] range","detail":"{range_begin:/registry/daemonsets/ingress-nginx/nginx-ingress-controller; range_end:; response_count:1; response_revision:1370529; }","duration":"1.492476692s","start":"2022-05-10T04:41:46.765Z","end":"2022-05-10T04:41:48.257Z","steps":["trace[1404998409] 'agreement among raft nodes before linearized reading'  (duration: 1.492405533s)"]}
{"level":"warn","ts":"2022-05-10T04:41:48.257Z","caller":"etcdserver/util.go:163","msg":"apply request took too long","took":"1.493302815s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/deployments/kube-system/adm-rds-deployment\" ","response":"range_response_count:1 size:3984"}
{"level":"info","ts":"2022-05-10T04:41:48.257Z","caller":"traceutil/trace.go:145","msg":"trace[1333018488] range","detail":"{range_begin:/registry/deployments/kube-system/adm-rds-deployment; range_end:; response_count:1; response_revision:1370529; }","duration":"1.493324415s","start":"2022-05-10T04:41:46.764Z","end":"2022-05-10T04:41:48.257Z","steps":["trace[1333018488] 'agreement among raft nodes before linearized reading'  (duration: 1.493245215s)"]}

lance5890 avatar May 10 '22 12:05 lance5890

@lance5890 your disk is very slow. this is not bug for your report. see this case: https://github.com/etcd-io/etcd/issues/10860

xiaods avatar May 17 '22 13:05 xiaods

@xiaods as u can see the metrics , my disk is ssd and etcd metric "etcd_disk_wal_fsync" shows ok,and this problem only occurs when an old member(its termId is greater then thel leader, which will lead to the leader transfer into follower) join into the new cluster

lance5890 avatar May 18 '22 08:05 lance5890

@lance5890 so your said: "we find the follower returns faster than the master reply; and the master can reach sometimes reach timeout" , where is your timeout log? I am not found some error for about timeout msg. the trace.go log is not error for etcd operation.

xiaods avatar May 19 '22 00:05 xiaods

I expect that this is the issue with first request after leader election timing out. This was fixed in v3.5. Please upgrade to confirm.

serathius avatar Jun 21 '22 11:06 serathius

i do not think so, as the etcd log shows ,when leader changes,the master just reply slower than the worker, and the leader change has already finished

lance5890 avatar Jun 21 '22 11:06 lance5890

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Sep 21 '22 02:09 stale[bot]

Saw exactly same issue for 3.4.13 etcd

aft2022/10/25 14:07:36 INFO: 47d45c938c5efe92 is starting a new election at term 59
raft2022/10/25 14:07:36 INFO: 47d45c938c5efe92 became candidate at term 60
raft2022/10/25 14:07:36 INFO: 47d45c938c5efe92 received MsgVoteResp from 47d45c938c5efe92 at term 60
raft2022/10/25 14:07:36 INFO: 47d45c938c5efe92 [logterm: 59, index: 132326729] sent MsgVote request to 2b60c3fa94fbd293 at term 60
raft2022/10/25 14:07:36 INFO: 47d45c938c5efe92 [logterm: 59, index: 132326729] sent MsgVote request to 7681bd7095649d57 at term 60
raft2022/10/25 14:07:36 INFO: raft.node: 47d45c938c5efe92 lost leader 7681bd7095649d57 at term 60
raft2022/10/25 14:07:36 INFO: 47d45c938c5efe92 received MsgVoteResp from 2b60c3fa94fbd293 at term 60
raft2022/10/25 14:07:36 INFO: 47d45c938c5efe92 has received 2 MsgVoteResp votes and 0 vote rejections
raft2022/10/25 14:07:36 INFO: 47d45c938c5efe92 became leader at term 60
raft2022/10/25 14:07:36 INFO: raft.node: 47d45c938c5efe92 elected leader 47d45c938c5efe92 at term 60
Oct 25 14:07:42 node-1 50fecf9b079f[2630]: 2022-10-25 14:07:42.510749 W | etcdserver: read-only range request "key:\"/registry/health\" " with result "error:context deadline exceeded" took too long (2.000058603s) to execute

Oct 25 14:07:43 node-1 50fecf9b079f[2630]: 2022-10-25 14:07:43.486830 W | etcdserver: read-only range request "key:\"/registry/health\" " with result "error:context canceled" took too long (2.00005981s) to execute

Oct 25 14:07:43 node-1 50fecf9b079f[2630]: WARNING: 2022/10/25 14:07:43 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"

Oct 25 14:07:43 node-1 50fecf9b079f[2630]: 2022-10-25 14:07:43.636852 W | etcdserver: timed out waiting for read index response (local node might have slow network)

Oct 25 14:07:43 node-1 50fecf9b079f[2630]: 2022-10-25 14:07:43.637013 W | etcdserver: read-only range request "key:\"/registry/configmaps/istio-system/istio-leader\" " with result "error:etcdserver: request timed out" took too long (7.041669504s) to execute

geotransformer avatar Oct 26 '22 21:10 geotransformer

@geotransformer There is a known issue with leader election timeout on v3.4.13, please upgrade to latest etcd v3.4 release.

serathius avatar Oct 27 '22 15:10 serathius

serathius

@serathius There is a known issue with leader election timeout on v3.4.13, please upgrade to latest etcd v3.4 release.

Thanks a lot for your replies. Would you please point the reference of the etcd issue id?

For the election, i think it was due to some networking glitch. The follower lost the heartbeat

geotransformer avatar Oct 28 '22 00:10 geotransformer

@geotransformer There is a known issue with leader election timeout on v3.4.13, please upgrade to latest etcd v3.4 release. I have same problem Would you please point the reference of the etcd issue id? Thanks

jiayu997 avatar Jan 20 '23 03:01 jiayu997