tiflow icon indicating copy to clipboard operation
tiflow copied to clipboard

Exec command timeout on cdc node sometimes

Open fubinzh opened this issue 1 year ago • 3 comments

What did you do?

Run cdc redo log system test cases, for each case basically below steps included:

  1. create changefeed with redo log enabled
  2. run workload
  3. run redo apply
  4. remove changefeed

What did you expect to see?

execute command on cdc node should not timeout

What did you see instead?

Execute command on TiCDC seen timeout error seen sometimes:

example 1: execute echo commend timeout

2023-06-18T21:52:20.768Z        INFO    host/ticdc.go:283       Execute command on ticdc        {"command": "echo -e >/tmp/changefeed.toml \"[cons
istent] \n\"\"level = \\\"eventual\\\"\nstorage = \\\"s3://tmp/test-infra-redolog/redo-apply-cdc-all-node-restart-sync-bee28932-a8f8-4289-bd09-9d1
37926dcad?access-key=minioadmin&secret-access-key=minioadmin&endpoint=http://172.16.6.5:9000&force-path-style=true\\\"\nmax-log-size = 64 \n\"", "
timeout": "10s", "ticdc peer": 0}
2023-06-18T21:52:20.768Z        INFO    k8s/client.go:132       it should be noted that a long-running command will not be interrupted even the us
e case has ended. For more information, please refer to https://github.com/pingcap/test-infra/discussions/129
[2023/06/18 21:52:30.770 +00:00] [PANIC] [main.go:114] ["run case failed"] [error="Exec command timeout"] [errorVerbose="Exec command timeout\ngit
hub.com/pingcap/test-infra/caselib/pkg/host.init\n\tgithub.com/pingcap/test-infra/caselib/pkg/host/host.go:16\nruntime.doInit\n\truntime/proc.go:6
506\nruntime.doInit\n\truntime/proc.go:6483\nruntime.doInit\n\truntime/proc.go:6483\nruntime.main\n\truntime/proc.go:233\nruntime.goexit\n\truntim
e/asm_amd64.s:1598"] [stack="main.main\n\t./main.go:114\nruntime.main\n\truntime/proc.go:250"]
panic: run case failed

example 2: execute changefeed remove command timeout

2023-06-18T22:42:45.719Z        INFO    host/ticdc.go:299       Execute command on ticdc        {"command": "/cdc version", "ticdc peer": 0}
2023-06-18T22:42:45.719Z        INFO    k8s/client.go:132       it should be noted that a long-running command will not be interrupted even the us
e case has ended. For more information, please refer to https://github.com/pingcap/test-infra/discussions/129
[2023/06/18 22:43:04.528 +00:00] [INFO] [ticdc.go:199] ["get TiCDC version"] [cdcVersion=v7.2.0-alpha]
2023-06-18T22:43:04.528Z        INFO    host/ticdc.go:283       Execute command on ticdc        {"command": "/cdc  cli  changefeed  remove \"--ser
ver=127.0.0.1:8301\" \"--changefeed-id=redo-enable-cdc-all-node-restart-sync\"", "timeout": "10s", "ticdc peer": 0}
2023-06-18T22:43:04.529Z        INFO    k8s/client.go:132       it should be noted that a long-running command will not be interrupted even the us
e case has ended. For more information, please refer to https://github.com/pingcap/test-infra/discussions/129
[2023/06/18 22:43:14.532 +00:00] [ERROR] [ticdc.go:72] ["run changefeed operation timeout"] [stack="github.com/pingcap/test-infra/caselib/pkg/host
.(*TiCDCHost).ChangeFeed\n\tgithub.com/pingcap/test-infra/caselib/pkg/host/ticdc.go:72\ngithub.com/pingcap/test-infra/caselib/pkg/steps.(*changeFe
edTask).Execute\n\tgithub.com/pingcap/test-infra/caselib/pkg/steps/cdc.go:80\ngithub.com/pingcap/test-infra/caselib/pkg/steps.(*Serial).Execute\n\
tgithub.com/pingcap/test-infra/caselib/pkg/steps/step.go:31\nmain.main\n\t./main.go:113\nruntime.main\n\truntime/proc.go:250"]
[2023/06/18 22:43:14.532 +00:00] [INFO] [cdc.go:81] ["run changeFeed task finished"] [result=null] [error="Exec command timeout"] [errorVerbose="E
xec command timeout\ngithub.com/pingcap/test-infra/caselib/pkg/host.init\n\tgithub.com/pingcap/test-infra/caselib/pkg/host/host.go:16\nruntime.doI
nit\n\truntime/proc.go:6506\nruntime.doInit\n\truntime/proc.go:6483\nruntime.doInit\n\truntime/proc.go:6483\nruntime.main\n\truntime/proc.go:233\n
runtime.goexit\n\truntime/asm_amd64.s:1598"]

Versions of the cluster

TiCDC version (execute cdc version):

["Welcome to Change Data Capture (CDC)"] [release-version=v7.2.0-alpha] [git-hash=5c368b52c0de831c0b9dd68ff66874d5a37f74ca] [git-branch=heads/refs/tags/v7.2.0-alpha] [utc-build-time="2023-06-16 11:35:10"] [go-version="go version go1.20.5 linux/amd64"] [failpoint-build=false]

fubinzh avatar Jun 19 '23 03:06 fubinzh

/found automation

fubinzh avatar Jun 19 '23 03:06 fubinzh

/severity moderate

fubinzh avatar Jun 19 '23 03:06 fubinzh

also sometime exec timeout seen when creating changefeed:

[2023/07/04 00:41:18.905 +00:00] [INFO] [ticdc.go:201] ["get TiCDC version"] [cdcVersion=v7.3.0-alpha]
2023-07-04T00:41:18.905Z	INFO	1815723	host/ticdc.go:301	Execute command on ticdc	{"command": "/cdc  cli  changefeed  create \"--server=127.0.0.1:8301\" \"--sink-uri=mysql://root:@downstream.cdc-testbed-tps-1815723-1-93:3306\" \"--changefeed-id=cdc-scale-out-in\" \"--no-confirm\"", "timeout": "10s", "ticdc peer": 0}
2023-07-04T00:41:18.905Z	INFO	k8s/client.go:132	it should be noted that a long-running command will not be interrupted even the use case has ended. For more information, please refer to https://github.com/pingcap/test-infra/discussions/129
[2023/07/04 00:41:28.906 +00:00] [ERROR] [ticdc.go:72] ["run changefeed operation timeout"] [stack="github.com/pingcap/test-infra/caselib/pkg/host.(*TiCDCHost).ChangeFeed\n\tgithub.com/pingcap/test-infra/caselib/pkg/host/ticdc.go:72\ngithub.com/pingcap/test-infra/caselib/pkg/steps.(*changeFeedTask).Execute\n\tgithub.com/pingcap/test-infra/caselib/pkg/steps/cdc.go:80\ngithub.com/pingcap/test-infra/caselib/pkg/steps.withRecover\n\tgithub.com/pingcap/test-infra/caselib/pkg/steps/step.go:24\ngithub.com/pingcap/test-infra/caselib/pkg/steps.(*Serial).Execute\n\tgithub.com/pingcap/test-infra/caselib/pkg/steps/step.go:45\nmain.main\n\ttest-infra/caselib/ticdc/main.go:112\nruntime.main\n\truntime/proc.go:250"]
[2023/07/04 00:41:28.906 +00:00] [INFO] [cdc.go:81] ["run changeFeed task finished"] [result=null] [error="Exec command timeout"] [errorVerbose="Exec command timeout\ngithub.com/pingcap/test-infra/caselib/pkg/host.init\n\tgithub.com/pingcap/test-infra/caselib/pkg/host/host.go:16\nruntime.doInit\n\truntime/proc.go:6506\nruntime.doInit\n\truntime/proc.go:6483\nruntime.doInit\n\truntime/proc.go:6483\nruntime.main\n\truntime/proc.go:233\nruntime.goexit\n\truntime/asm_amd64.s:1598"]
[2023/07/04 00:41:28.906 +00:00] [PANIC] [main.go:113] ["run case failed"] [error="Exec command timeout"] [errorVerbose="Exec command timeout\ngithub.com/pingcap/test-infra/caselib/pkg/host.init\n\tgithub.com/pingcap/test-infra/caselib/pkg/host/host.go:16\nruntime.doInit\n\truntime/proc.go:6506\nruntime.doInit\n\truntime/proc.go:6483\nruntime.doInit\n\truntime/proc.go:6483\nruntime.main\n\truntime/proc.go:233\nruntime.goexit\n\truntime/asm_amd64.s:1598"] [case=cdc_scale_sync] [stack="main.main\n\ttest-infra/caselib/ticdc/main.go:113\nruntime.main\n\truntime/proc.go:250"]

fubinzh avatar Jul 06 '23 07:07 fubinzh