tiflow
tiflow copied to clipboard
Exec command timeout on cdc node sometimes
What did you do?
Run cdc redo log system test cases, for each case basically below steps included:
- create changefeed with redo log enabled
- run workload
- run redo apply
- remove changefeed
What did you expect to see?
execute command on cdc node should not timeout
What did you see instead?
Execute command on TiCDC seen timeout error seen sometimes:
example 1: execute echo commend timeout
2023-06-18T21:52:20.768Z INFO host/ticdc.go:283 Execute command on ticdc {"command": "echo -e >/tmp/changefeed.toml \"[cons
istent] \n\"\"level = \\\"eventual\\\"\nstorage = \\\"s3://tmp/test-infra-redolog/redo-apply-cdc-all-node-restart-sync-bee28932-a8f8-4289-bd09-9d1
37926dcad?access-key=minioadmin&secret-access-key=minioadmin&endpoint=http://172.16.6.5:9000&force-path-style=true\\\"\nmax-log-size = 64 \n\"", "
timeout": "10s", "ticdc peer": 0}
2023-06-18T21:52:20.768Z INFO k8s/client.go:132 it should be noted that a long-running command will not be interrupted even the us
e case has ended. For more information, please refer to https://github.com/pingcap/test-infra/discussions/129
[2023/06/18 21:52:30.770 +00:00] [PANIC] [main.go:114] ["run case failed"] [error="Exec command timeout"] [errorVerbose="Exec command timeout\ngit
hub.com/pingcap/test-infra/caselib/pkg/host.init\n\tgithub.com/pingcap/test-infra/caselib/pkg/host/host.go:16\nruntime.doInit\n\truntime/proc.go:6
506\nruntime.doInit\n\truntime/proc.go:6483\nruntime.doInit\n\truntime/proc.go:6483\nruntime.main\n\truntime/proc.go:233\nruntime.goexit\n\truntim
e/asm_amd64.s:1598"] [stack="main.main\n\t./main.go:114\nruntime.main\n\truntime/proc.go:250"]
panic: run case failed
example 2: execute changefeed remove command timeout
2023-06-18T22:42:45.719Z INFO host/ticdc.go:299 Execute command on ticdc {"command": "/cdc version", "ticdc peer": 0}
2023-06-18T22:42:45.719Z INFO k8s/client.go:132 it should be noted that a long-running command will not be interrupted even the us
e case has ended. For more information, please refer to https://github.com/pingcap/test-infra/discussions/129
[2023/06/18 22:43:04.528 +00:00] [INFO] [ticdc.go:199] ["get TiCDC version"] [cdcVersion=v7.2.0-alpha]
2023-06-18T22:43:04.528Z INFO host/ticdc.go:283 Execute command on ticdc {"command": "/cdc cli changefeed remove \"--ser
ver=127.0.0.1:8301\" \"--changefeed-id=redo-enable-cdc-all-node-restart-sync\"", "timeout": "10s", "ticdc peer": 0}
2023-06-18T22:43:04.529Z INFO k8s/client.go:132 it should be noted that a long-running command will not be interrupted even the us
e case has ended. For more information, please refer to https://github.com/pingcap/test-infra/discussions/129
[2023/06/18 22:43:14.532 +00:00] [ERROR] [ticdc.go:72] ["run changefeed operation timeout"] [stack="github.com/pingcap/test-infra/caselib/pkg/host
.(*TiCDCHost).ChangeFeed\n\tgithub.com/pingcap/test-infra/caselib/pkg/host/ticdc.go:72\ngithub.com/pingcap/test-infra/caselib/pkg/steps.(*changeFe
edTask).Execute\n\tgithub.com/pingcap/test-infra/caselib/pkg/steps/cdc.go:80\ngithub.com/pingcap/test-infra/caselib/pkg/steps.(*Serial).Execute\n\
tgithub.com/pingcap/test-infra/caselib/pkg/steps/step.go:31\nmain.main\n\t./main.go:113\nruntime.main\n\truntime/proc.go:250"]
[2023/06/18 22:43:14.532 +00:00] [INFO] [cdc.go:81] ["run changeFeed task finished"] [result=null] [error="Exec command timeout"] [errorVerbose="E
xec command timeout\ngithub.com/pingcap/test-infra/caselib/pkg/host.init\n\tgithub.com/pingcap/test-infra/caselib/pkg/host/host.go:16\nruntime.doI
nit\n\truntime/proc.go:6506\nruntime.doInit\n\truntime/proc.go:6483\nruntime.doInit\n\truntime/proc.go:6483\nruntime.main\n\truntime/proc.go:233\n
runtime.goexit\n\truntime/asm_amd64.s:1598"]
Versions of the cluster
TiCDC version (execute cdc version
):
["Welcome to Change Data Capture (CDC)"] [release-version=v7.2.0-alpha] [git-hash=5c368b52c0de831c0b9dd68ff66874d5a37f74ca] [git-branch=heads/refs/tags/v7.2.0-alpha] [utc-build-time="2023-06-16 11:35:10"] [go-version="go version go1.20.5 linux/amd64"] [failpoint-build=false]
/found automation
/severity moderate
also sometime exec timeout seen when creating changefeed:
[2023/07/04 00:41:18.905 +00:00] [INFO] [ticdc.go:201] ["get TiCDC version"] [cdcVersion=v7.3.0-alpha]
2023-07-04T00:41:18.905Z INFO 1815723 host/ticdc.go:301 Execute command on ticdc {"command": "/cdc cli changefeed create \"--server=127.0.0.1:8301\" \"--sink-uri=mysql://root:@downstream.cdc-testbed-tps-1815723-1-93:3306\" \"--changefeed-id=cdc-scale-out-in\" \"--no-confirm\"", "timeout": "10s", "ticdc peer": 0}
2023-07-04T00:41:18.905Z INFO k8s/client.go:132 it should be noted that a long-running command will not be interrupted even the use case has ended. For more information, please refer to https://github.com/pingcap/test-infra/discussions/129
[2023/07/04 00:41:28.906 +00:00] [ERROR] [ticdc.go:72] ["run changefeed operation timeout"] [stack="github.com/pingcap/test-infra/caselib/pkg/host.(*TiCDCHost).ChangeFeed\n\tgithub.com/pingcap/test-infra/caselib/pkg/host/ticdc.go:72\ngithub.com/pingcap/test-infra/caselib/pkg/steps.(*changeFeedTask).Execute\n\tgithub.com/pingcap/test-infra/caselib/pkg/steps/cdc.go:80\ngithub.com/pingcap/test-infra/caselib/pkg/steps.withRecover\n\tgithub.com/pingcap/test-infra/caselib/pkg/steps/step.go:24\ngithub.com/pingcap/test-infra/caselib/pkg/steps.(*Serial).Execute\n\tgithub.com/pingcap/test-infra/caselib/pkg/steps/step.go:45\nmain.main\n\ttest-infra/caselib/ticdc/main.go:112\nruntime.main\n\truntime/proc.go:250"]
[2023/07/04 00:41:28.906 +00:00] [INFO] [cdc.go:81] ["run changeFeed task finished"] [result=null] [error="Exec command timeout"] [errorVerbose="Exec command timeout\ngithub.com/pingcap/test-infra/caselib/pkg/host.init\n\tgithub.com/pingcap/test-infra/caselib/pkg/host/host.go:16\nruntime.doInit\n\truntime/proc.go:6506\nruntime.doInit\n\truntime/proc.go:6483\nruntime.doInit\n\truntime/proc.go:6483\nruntime.main\n\truntime/proc.go:233\nruntime.goexit\n\truntime/asm_amd64.s:1598"]
[2023/07/04 00:41:28.906 +00:00] [PANIC] [main.go:113] ["run case failed"] [error="Exec command timeout"] [errorVerbose="Exec command timeout\ngithub.com/pingcap/test-infra/caselib/pkg/host.init\n\tgithub.com/pingcap/test-infra/caselib/pkg/host/host.go:16\nruntime.doInit\n\truntime/proc.go:6506\nruntime.doInit\n\truntime/proc.go:6483\nruntime.doInit\n\truntime/proc.go:6483\nruntime.main\n\truntime/proc.go:233\nruntime.goexit\n\truntime/asm_amd64.s:1598"] [case=cdc_scale_sync] [stack="main.main\n\ttest-infra/caselib/ticdc/main.go:113\nruntime.main\n\truntime/proc.go:250"]