tiflow
TiCDC restarts and changefeed lag exceeds 10 min when a network partition is injected between the PD leader and the PD followers
What did you do?
1. Run TPC-C with 10 threads and 1000 warehouses.
2. After 10 minutes, simulate a network fault that isolates the PD leader from all PD followers. Fault start time: 2023-06-13 09:01:47
3. After 10 minutes, recover from the fault. Fault recovery time: 2023-06-13 09:11:48
What did you expect to see?
lag is less than 30s
What did you see instead?
TiCDC lag exceeded 10 min after the fault was injected.
The PD leader changed normally.
Versions of the cluster
git hash : 1e2f277f2e3d9b57b15db9a2a9b2c62832c071ca
current status of DM cluster (execute query-status <task-name> in dmctl)
No response
/remove-area dm /area ticdc
/severity major
/assign @asddongmen
@nongfushanquan: GitHub didn't allow me to assign the following users: asddongmen.
Note that only pingcap members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide
In response to this:
/assign @asddongmen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
inject network partition between PD leader and PD followers
two TiCDC instances crash
inject network partition between TiCDC owner and all other pods, TiCDC restarts
chaos start ~ chaos end: 2024/02/28 19:05:36 ~ 2024/02/28 19:08:36
ticdc logs: [2024/02/28 19:08:37.410 +08:00] [ERROR] [tso_dispatcher.go:562] ["[tso] update connection contexts failed"] [dc=global] [error="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.200.49.116:2379: i/o timeout""] [2024/02/28 19:08:37.410 +08:00] [ERROR] [pd.go:228] ["updateTS error"] [txnScope=global] [error="context canceled"] errorVerbose="context canceled[ngithub.com/tikv/pd/client.(*tsoRequest).Wait\n\tgithub.com/tikv/pd/[email protected]/tso_dispatcher.go:118\ngithub.com/tikv/pd/client.(*client).GetTS\n\tgithub.com/tikv/pd/[email protected]/client.go:803\ngithub.com/tikv/client-go/v2/util.InterceptedPDClient.GetTS\n\tgithub.com/tikv/client-go/[email protected]/util/pd_interceptor.go:81\ngithub.com/tikv/client-go/v2/oracle/oracles.(*pdOracle).getTimestamp\n\tgithub.com/tikv/client-go/[email protected]/oracle/oracles/pd.go:147\ngithub.com/tikv/client-go/v2/oracle/oracles.(*pdOracle).updateTS.func1\n\tgithub.com/tikv/client-go/[email protected]/oracle/oracles/pd.go:226\nsync.(*Map).Range\n\tsync/map.go:476\ngithub.com/tikv/client-go/v2/oracle/oracles.(*pdOracle).updateTS\n\tgithub.com/tikv/client-go/[email protected]/oracle/oracles/pd.go:224\nruntime.goexit\n\truntime/asm_amd64.s:1650\ngithub.com/tikv/client-go/v2/oracle/oracles.(*pdOracle).getTimestamp\n\tgithub.com/tikv/client-go/[email protected]/oracle/oracles/pd.go:152\ngithub.com/tikv/client-go/v2/oracle/oracles.(*pdOracle).updateTS.func1\n\tgithub.com/tikv/client-go/[email protected]/oracle/oracles/pd.go:226\nsync.(*Map).Range\n\tsync/map.go:476\ngithub.com/tikv/client-go/v2/oracle/oracles.(*pdOracle).updateTS\n\tgithub.com/tikv/client-go/[email protected]/oracle/oracles/pd.go:224\nruntime.goexit\n\truntime/asm_amd64.s:1650"] [2024/02/28 19:08:37.410 +08:00] [INFO] [tso_dispatcher.go:344] ["[tso] exit tso dispatcher"] [dc-location=global] [2024/02/28 19:08:37.410 +08:00] [INFO] [tso_client.go:139] ["close tso client"] [2024/02/28 
19:08:37.410 +08:00] [INFO] [tso_client.go:150] ["tso client is closed"] [2024/02/28 19:08:37.410 +08:00] [INFO] [pd_service_discovery.go:664] ["[pd] close pd service discovery client"] [2024/02/28 19:08:37.410 +08:00] [INFO] [client.go:319] ["[pd] http client closed"] [source=tikv-driver] [2024/02/28 19:08:37.413 +08:00] [WARN] [upstream.go:299] ["etcd session close failed"] [error="etcdserver: requested lease not found"] [2024/02/28 19:08:37.413 +08:00] [INFO] [upstream.go:305] ["upstream closed"] [upstreamID=7340490029962833542] [2024/02/28 19:08:38.370 +08:00] [ERROR] [pd_service_discovery.go:613] ["[pd] failed to update service mode"] [urls="[http://tc-pd-0.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379,http://tc-pd-1.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379,http://tc-pd-2.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379]"] [error="[PD:client:ErrClientGetClusterInfo]error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:tc-pd-2.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379 status:READY: error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:tc-pd-2.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379 status:READY"] [2024/02/28 19:08:38.379 +08:00] [WARN] [server.go:315] ["etcd health check: cannot collect all members"] [error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"] errorVerbose="rpc error: code = DeadlineExceeded desc = context deadline exceeded[ngithub.com/tikv/pd/client.(*client).respForErr\n\tgithub.com/tikv/pd/[email protected]/client.go:1550\ngithub.com/tikv/pd/client.(*client).GetAllMembers\n\tgithub.com/tikv/pd/[email 
protected]/client.go:735\ngithub.com/pingcap/tiflow/pkg/pdutil.(*pdAPIClient).CollectMemberEndpoints\n\tgithub.com/pingcap/tiflow/pkg/pdutil/api_client.go:346\ngithub.com/pingcap/tiflow/cdc/server.(*server).upstreamPDHealthChecker\n\tgithub.com/pingcap/tiflow/cdc/server/server.go:313\ngithub.com/pingcap/tiflow/cdc/server.(*server).run.func1\n\tgithub.com/pingcap/tiflow/cdc/server/server.go:347\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/[email protected]/errgroup/errgroup.go:78\nruntime.goexit\n\truntime/asm_amd64.s:1650"] [2024/02/28 19:08:47.414 +08:00] [WARN] [check.go:88] ["check TiKV version failed"] [error="[CDC:ErrGetAllStoresFailed]get stores from pd failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded"] errorVerbose="[CDC:ErrGetAllStoresFailed]get stores from pd failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded[ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/[email protected]/errors.go:174\ngithub.com/pingcap/errors.(*Error).GenWithStackByArgs\n\tgithub.com/pingcap/[email 
protected]/normalize.go:164\ngithub.com/pingcap/tiflow/pkg/errors.WrapError\n\tgithub.com/pingcap/tiflow/pkg/errors/helper.go:34\ngithub.com/pingcap/tiflow/pkg/version.CheckStoreVersion\n\tgithub.com/pingcap/tiflow/pkg/version/check.go:209\ngithub.com/pingcap/tiflow/pkg/version.CheckClusterVersion\n\tgithub.com/pingcap/tiflow/pkg/version/check.go:83\ngithub.com/pingcap/tiflow/pkg/upstream.initUpstream\n\tgithub.com/pingcap/tiflow/pkg/upstream/upstream.go:179\ngithub.com/pingcap/tiflow/pkg/upstream.(*Manager).AddDefaultUpstream\n\tgithub.com/pingcap/tiflow/pkg/upstream/manager.go:116\ngithub.com/pingcap/tiflow/cdc/capture.(*captureImpl).reset\n\tgithub.com/pingcap/tiflow/cdc/capture/capture.go:250\ngithub.com/pingcap/tiflow/cdc/capture.(*captureImpl).run\n\tgithub.com/pingcap/tiflow/cdc/capture/capture.go:333\ngithub.com/pingcap/tiflow/cdc/capture.(*captureImpl).Run\n\tgithub.com/pingcap/tiflow/cdc/capture/capture.go:308\ngithub.com/pingcap/tiflow/cdc/server.(*server).run.func6\n\tgithub.com/pingcap/tiflow/cdc/server/server.go:372\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/[email protected]/errgroup/errgroup.go:78\nruntime.goexit\n\truntime/asm_amd64.s:1650"] [2024/02/28 19:08:47.427 +08:00] [INFO] [pd_service_discovery.go:1016] ["[pd] update member urls"] [old-urls="[http://tc-pd-0.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379,http://tc-pd-1.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379,http://tc-pd-2.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379,http://tc-pd:2379]"] [new-urls="[http://tc-pd-0.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379,http://tc-pd-1.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379,http://tc-pd-2.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379]"] [2024/02/28 19:08:47.427 +08:00] [INFO] [pd_service_discovery.go:1043] ["[pd] switch leader"] [new-leader=http://tc-pd-2.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379/] [old-leader=] 
[2024/02/28 19:08:47.427 +08:00] [INFO] [pd_service_discovery.go:525] ["[pd] init cluster id"] [cluster-id=7340490029962833542] [2024/02/28 19:08:47.427 +08:00] [INFO] [client.go:606] ["[pd] changing service mode"] [old-mode=UNKNOWN_SVC_MODE] [new-mode=PD_SVC_MODE] [2024/02/28 19:08:47.427 +08:00] [INFO] [tso_client.go:231] ["[tso] switch dc tso global allocator serving address"] [dc-location=global] [new-address=http://tc-pd-2.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379/] [2024/02/28 19:08:47.428 +08:00] [INFO] [tso_dispatcher.go:323] ["[tso] tso dispatcher created"] [dc-location=global] [2024/02/28 19:08:47.428 +08:00] [INFO] [client.go:654] ["[pd] service mode changed"] [old-mode=UNKNOWN_SVC_MODE] [new-mode=PD_SVC_MODE] [2024/02/28 19:08:47.429 +08:00] [INFO] [tikv_driver.go:200] ["using API V1."] [2024/02/28 19:08:47.429 +08:00] [INFO] [tso_dispatcher.go:441] ["[tso] tso stream is not ready"] [dc=global] [2024/02/28 19:08:48.371 +08:00] [ERROR] [pd_service_discovery.go:613] ["[pd] failed to update service mode"] [urls="[http://tc-pd-0.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379,http://tc-pd-1.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379,http://tc-pd-2.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379]"] [error="[PD:client:ErrClientGetClusterInfo]error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:tc-pd-2.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379 status:READY: error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:tc-pd-2.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379 status:READY"] [2024/02/28 19:08:48.380 +08:00] [WARN] [server.go:315] ["etcd health check: cannot collect all members"] [error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"] errorVerbose="rpc error: code = DeadlineExceeded desc = context deadline exceeded[ngithub.com/tikv/pd/client.(*client).respForErr\n\tgithub.com/tikv/pd/[email 
protected]/client.go:1550\ngithub.com/tikv/pd/client.(*client).GetAllMembers\n\tgithub.com/tikv/pd/[email protected]/client.go:735\ngithub.com/pingcap/tiflow/pkg/pdutil.(*pdAPIClient).CollectMemberEndpoints\n\tgithub.com/pingcap/tiflow/pkg/pdutil/api_client.go:346\ngithub.com/pingcap/tiflow/cdc/server.(*server).upstreamPDHealthChecker\n\tgithub.com/pingcap/tiflow/cdc/server/server.go:313\ngithub.com/pingcap/tiflow/cdc/server.(*server).run.func1\n\tgithub.com/pingcap/tiflow/cdc/server/server.go:347\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/[email protected]/errgroup/errgroup.go:78\nruntime.goexit\n\truntime/asm_amd64.s:1650"] [2024/02/28 19:08:57.429 +08:00] [ERROR] [tso_dispatcher.go:202] ["[tso] tso request is canceled due to timeout"] [dc-location=global] [error="[PD:client:ErrClientGetTSOTimeout]get TSO timeout"] [2024/02/28 19:08:57.429 +08:00] [ERROR] [tso_dispatcher.go:498] ["[tso] getTS error after processing requests"] [dc-location=global] [stream-addr=http://tc-pd-2.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379/] [error="[PD:client:ErrClientGetTSO]get TSO failed, %v: rpc error: code = Canceled desc = context canceled"] [2024/02/28 19:08:57.429 +08:00] [ERROR] [capture.go:335] ["reset capture failed"] [error="rpc error: code = Canceled desc = context canceled"] errorVerbose="rpc error: code = Canceled desc = context canceled[ngithub.com/tikv/pd/client.(*pdTSOStream).processRequests\n\tgithub.com/tikv/pd/[email protected]/tso_stream.go:149\ngithub.com/tikv/pd/client.(*tsoClient).processRequests\n\tgithub.com/tikv/pd/[email protected]/tso_dispatcher.go:763\ngithub.com/tikv/pd/client.(*tsoClient).handleDispatcher\n\tgithub.com/tikv/pd/[email protected]/tso_dispatcher.go:488\nruntime.goexit\n\truntime/asm_amd64.s:1650\ngithub.com/tikv/pd/client.(*tsoRequest).Wait\n\tgithub.com/tikv/pd/[email protected]/tso_dispatcher.go:104\ngithub.com/tikv/pd/client.(*client).GetTS\n\tgithub.com/tikv/pd/[email 
protected]/client.go:803\ngithub.com/pingcap/tiflow/pkg/pdutil.NewClock\n\tgithub.com/pingcap/tiflow/pkg/pdutil/clock.go:62\ngithub.com/pingcap/tiflow/pkg/upstream.initUpstream\n\tgithub.com/pingcap/tiflow/pkg/upstream/upstream.go:197\ngithub.com/pingcap/tiflow/pkg/upstream.(*Manager).AddDefaultUpstream\n\tgithub.com/pingcap/tiflow/pkg/upstream/manager.go:116\ngithub.com/pingcap/tiflow/cdc/capture.(*captureImpl).reset\n\tgithub.com/pingcap/tiflow/cdc/capture/capture.go:250\ngithub.com/pingcap/tiflow/cdc/capture.(*captureImpl).run\n\tgithub.com/pingcap/tiflow/cdc/capture/capture.go:333\ngithub.com/pingcap/tiflow/cdc/capture.(*captureImpl).Run\n\tgithub.com/pingcap/tiflow/cdc/capture/capture.go:308\ngithub.com/pingcap/tiflow/cdc/server.(*server).run.func6\n\tgithub.com/pingcap/tiflow/cdc/server/server.go:372\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/[email protected]/errgroup/errgroup.go:78\nruntime.goexit\n\truntime/asm_amd64.s:1650"] [2024/02/28 19:08:57.430 +08:00] [INFO] [capture.go:328] ["the capture routine has exited"] [2024/02/28 19:08:57.430 +08:00] [WARN] [server.go:315] ["etcd health check: cannot collect all members"] [error="rpc error: code = Canceled desc = context canceled"] errorVerbose="rpc error: code = Canceled desc = context canceled[ngithub.com/tikv/pd/client.(*client).respForErr\n\tgithub.com/tikv/pd/[email protected]/client.go:1550\ngithub.com/tikv/pd/client.(*client).GetAllMembers\n\tgithub.com/tikv/pd/[email protected]/client.go:735\ngithub.com/pingcap/tiflow/pkg/pdutil.(*pdAPIClient).CollectMemberEndpoints\n\tgithub.com/pingcap/tiflow/pkg/pdutil/api_client.go:346\ngithub.com/pingcap/tiflow/cdc/server.(*server).upstreamPDHealthChecker\n\tgithub.com/pingcap/tiflow/cdc/server/server.go:313\ngithub.com/pingcap/tiflow/cdc/server.(*server).run.func1\n\tgithub.com/pingcap/tiflow/cdc/server/server.go:347\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/[email 
protected]/errgroup/errgroup.go:78\nruntime.goexit\n\truntime/asm_amd64.s:1650"] [2024/02/28 19:08:57.430 +08:00] [ERROR] [server.go:298] ["http server error"] [error="[CDC:ErrServeHTTP]serve http error: mux: server closed"] errorVerbose="[CDC:ErrServeHTTP]serve http error: mux: server closed[ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/[email protected]/errors.go:174\ngithub.com/pingcap/errors.(*Error).GenWithStackByArgs\n\tgithub.com/pingcap/[email protected]/normalize.go:164\ngithub.com/pingcap/tiflow/pkg/errors.WrapError\n\tgithub.com/pingcap/tiflow/pkg/errors/helper.go:34\ngithub.com/pingcap/tiflow/cdc/server.(*server).startStatusHTTP.func1\n\tgithub.com/pingcap/tiflow/cdc/server/server.go:298\nruntime.goexit\n\truntime/asm_amd64.s:1650"] [2024/02/28 19:08:57.430 +08:00] [WARN] [server.go:139] ["cdc server exits with error"] [error="rpc error: code = Canceled desc = context canceled"] errorVerbose="rpc error: code = Canceled desc = context canceled[ngithub.com/tikv/pd/client.(*pdTSOStream).processRequests\n\tgithub.com/tikv/pd/[email protected]/tso_stream.go:149\ngithub.com/tikv/pd/client.(*tsoClient).processRequests\n\tgithub.com/tikv/pd/[email protected]/tso_dispatcher.go:763\ngithub.com/tikv/pd/client.(*tsoClient).handleDispatcher\n\tgithub.com/tikv/pd/[email protected]/tso_dispatcher.go:488\nruntime.goexit\n\truntime/asm_amd64.s:1650\ngithub.com/tikv/pd/client.(*tsoRequest).Wait\n\tgithub.com/tikv/pd/[email protected]/tso_dispatcher.go:104\ngithub.com/tikv/pd/client.(*client).GetTS\n\tgithub.com/tikv/pd/[email 
protected]/client.go:803\ngithub.com/pingcap/tiflow/pkg/pdutil.NewClock\n\tgithub.com/pingcap/tiflow/pkg/pdutil/clock.go:62\ngithub.com/pingcap/tiflow/pkg/upstream.initUpstream\n\tgithub.com/pingcap/tiflow/pkg/upstream/upstream.go:197\ngithub.com/pingcap/tiflow/pkg/upstream.(*Manager).AddDefaultUpstream\n\tgithub.com/pingcap/tiflow/pkg/upstream/manager.go:116\ngithub.com/pingcap/tiflow/cdc/capture.(*captureImpl).reset\n\tgithub.com/pingcap/tiflow/cdc/capture/capture.go:250\ngithub.com/pingcap/tiflow/cdc/capture.(*captureImpl).run\n\tgithub.com/pingcap/tiflow/cdc/capture/capture.go:333\ngithub.com/pingcap/tiflow/cdc/capture.(*captureImpl).Run\n\tgithub.com/pingcap/tiflow/cdc/capture/capture.go:308\ngithub.com/pingcap/tiflow/cdc/server.(*server).run.func6\n\tgithub.com/pingcap/tiflow/cdc/server/server.go:372\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/[email protected]/errgroup/errgroup.go:78\nruntime.goexit\n\truntime/asm_amd64.s:1650"] [2024/02/28 19:08:57.430 +08:00] [INFO] [capture.go:707] ["message router closed"] [captureID=a277c9b2-c0b6-4ef0-aa9d-3d51b50cd83f] [2024/02/28 19:08:57.432 +08:00] [INFO] [server.go:424] ["sort engine manager closed"] [duration=2.032547ms] [2024/02/28 19:08:57.432 +08:00] [INFO] [pd_service_discovery.go:577] ["[pd] exit member loop due to context canceled"] [2024/02/28 19:08:57.432 +08:00] [INFO] [resource_manager_client.go:295] ["[resource manager] exit resource token dispatcher"] [2024/02/28 19:08:57.432 +08:00] [INFO] [tso_dispatcher.go:240] ["exit tso dispatcher loop"] [2024/02/28 19:08:57.432 +08:00] [INFO] [tso_dispatcher.go:410] ["[tso] stop fetching the pending tso requests due to context canceled"] [dc-location=global] [2024/02/28 19:08:57.432 +08:00] [INFO] [tso_dispatcher.go:344] ["[tso] exit tso dispatcher"] [dc-location=global] [2024/02/28 19:08:57.432 +08:00] [INFO] [tso_dispatcher.go:186] ["exit tso requests cancel loop"] [2024/02/28 19:08:57.432 +08:00] [ERROR] [pd_service_discovery.go:613] ["[pd] 
failed to update service mode"] [urls="[http://tc-pd-0.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379,http://tc-pd-1.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379,http://tc-pd-2.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379]"] [error="[PD:client:ErrClientGetClusterInfo]error:rpc error: code = Canceled desc = context canceled target:tc-pd-2.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379 status:READY: error:rpc error: code = Canceled desc = context canceled target:tc-pd-2.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379 status:READY"] [2024/02/28 19:08:57.432 +08:00] [ERROR] [pd_service_discovery.go:613] ["[pd] failed to update service mode"] [urls="[http://tc-pd-0.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379,http://tc-pd-1.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379,http://tc-pd-2.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379]"] [error="[PD:client:ErrClientGetClusterInfo]error:rpc error: code = Canceled desc = context canceled target:tc-pd-2.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379 status:READY: error:rpc error: code = Canceled desc = context canceled target:tc-pd-2.tc-pd-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:2379 status:READY"] [2024/02/28 19:08:57.432 +08:00] [INFO] [tso_client.go:134] ["closing tso client"] [2024/02/28 19:08:57.432 +08:00] [INFO] [tso_client.go:139] ["close tso client"] [2024/02/28 19:08:57.432 +08:00] [INFO] [tso_client.go:150] ["tso client is closed"] [2024/02/28 19:08:57.432 +08:00] [INFO] [pd_service_discovery.go:664] ["[pd] close pd service discovery client"] [2024/02/28 19:08:59.752 +08:00] [INFO] [helper.go:54] ["init log"] [file=/var/lib/ticdc/log/ticdc.log] [level=info] [2024/02/28 19:08:59.752 +08:00] [INFO] [tz.go:34] ["Use the timezone of the TiCDC server machine"] [timezoneName=System] [timezone=Asia/Shanghai] [2024/02/28 19:08:59.752 +08:00] [INFO] [version.go:47] ["Welcome to Change Data 
Capture (CDC)"] [release-version=v8.0.0-alpha] [git-hash=25ce29c2a1802bbb4cd26008f322728959a91f7a] [git-branch=heads/refs/tags/v8.0.0-alpha] [utc-build-time="2024-02-27 11:37:29"] [go-version="go version go1.21.6 linux/amd64"] [failpoint-build=false] [2024/02/28 19:08:59.752 +08:00] [INFO] [server.go:125] ["CDC server created"] [pd="[http://tc-pd:2379/]"] [config="{"addr":"0.0.0.0:8301","advertise-addr":"tc-ticdc-1.tc-ticdc-peer.endless-ha-test-ticdc-tps-7080582-1-976.svc:8301","log-file":"/var/lib/ticdc/log/ticdc.log","log-level":"info","log":{"file":{"max-size":301,"max-days":0,"max-backups":0},"error-output":"stderr"},"data-dir":"","gc-ttl":86400,"tz":"System","capture-session-ttl":10,"owner-flush-interval":50000000,"processor-flush-interval":50000000,"sorter":{"sort-dir":"/tmp/sorter","cache-size-in-mb":128},"security":{"ca-path":"","cert-path":"","key-path":"","cert-allowed-cn":null,"mtls":false,"client-user-required":false,"client-allowed-user":null},"kv-client":{"enable-multiplexing":true,\
@asddongmen will see whether it can be addressed by https://github.com/etcd-io/etcd/pull/17465#event-11888619658. If not, then I suggest we address it in the long term.
After the merge of https://github.com/pingcap/tiflow/pull/10881, the checkpointTs lag during pd-leader-io-hang cases was reduced to less than 120s, meeting the requirement.
@Lily2025: Reopened this issue.
In response to this:
/reopen
/remove-type bug /type enhancement
closed