tiflow
lots of logs and cdc panic during tikv rolling restart
What did you do?
- pd.enable-forwarding = true is configured
- trigger a tikv rolling restart by editing the tikv spec (before that, tikv-2 had been pending for 10+ hours due to a k8s scheduler issue)
What did you expect to see?
- cdc should not panic
What did you see instead?
A large volume of cdc logs was written in a short time, about 300MB of logs every minute, until the tikv rolling restart finished. A cdc panic was also seen.
[root@upstream-ticdc-0 log]# du -shl *
301M ticdc-2023-07-06T07-11-53.148.log
300M ticdc-2023-07-06T07-12-06.878.log
300M ticdc-2023-07-06T07-12-19.253.log
300M ticdc-2023-07-07T01-25-27.927.log
300M ticdc-2023-07-07T01-26-24.099.log
300M ticdc-2023-07-07T01-27-05.064.log
300M ticdc-2023-07-07T01-27-54.467.log
300M ticdc-2023-07-07T01-28-44.971.log
300M ticdc-2023-07-07T01-29-39.447.log
300M ticdc-2023-07-07T01-30-34.204.log
300M ticdc-2023-07-07T01-31-22.034.log
300M ticdc-2023-07-07T01-31-33.798.log
301M ticdc-2023-07-07T01-32-49.720.log
13M ticdc.log
[root@bogon ticdc]# kubectl --kubeconfig kubeconfig.yml -n cdc-testbed-airbnb-tps-1814881-1-541 logs -p upstream-ticdc-0
[WARN] TiCDC server data-dir is not set. Please use `cdc server --data-dir` to start the cdc server if possible.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x310ae9e]
goroutine 170897871 [running]:
github.com/pingcap/tiflow/cdc/kv.(*eventFeedSession).enqueueError(0xc005ea6160, {0x0?, 0x0?}, {{{0x6cac, 0x5, 0x1850}, {0x0, {0xc094b39f60, 0x1b, 0x20}, ...}, ...}, ...})
github.com/pingcap/tiflow/cdc/kv/client.go:875 +0x7e
github.com/pingcap/tiflow/cdc/kv.(*eventFeedSession).onRegionFail(0xc005ea6160, {0x0, 0x0}, {{{0x6cac, 0x5, 0x1850}, {0x0, {0xc094b39f60, 0x1b, 0x20}, ...}, ...}, ...})
github.com/pingcap/tiflow/cdc/kv/client.go:558 +0x177
github.com/pingcap/tiflow/cdc/kv.(*regionWorker).evictAllRegions(0xc0965e4360)
github.com/pingcap/tiflow/cdc/kv/region_worker.go:825 +0x25b
github.com/pingcap/tiflow/cdc/kv.(*eventFeedSession).receiveFromStream(0xc005ea6160, {0x49d4aa8?, 0xc01d10ecd0?}, 0xc0a67f2410?, {0xc002d40ba0, 0x51}, 0xd, {0x49ecc18, 0xc095d21bc0}, 0xc0d38b6700)
github.com/pingcap/tiflow/cdc/kv/client.go:1072 +0x13fd
github.com/pingcap/tiflow/cdc/kv.(*eventFeedSession).requestRegionToStore.func2()
github.com/pingcap/tiflow/cdc/kv/client.go:660 +0xb2
golang.org/x/sync/errgroup.(*Group).Go.func1()
golang.org/x/[email protected]/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
golang.org/x/[email protected]/errgroup/errgroup.go:72 +0xa5
[root@bogon ticdc]# kubectl --kubeconfig kubeconfig.yml -n cdc-testbed-airbnb-tps-1814881-1-541 logs -p upstream-ticdc-1
[WARN] TiCDC server data-dir is not set. Please use `cdc server --data-dir` to start the cdc server if possible.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x310ae9e]
goroutine 13788899 [running]:
github.com/pingcap/tiflow/cdc/kv.(*eventFeedSession).enqueueError(0xc03e6ba6e0, {0x0?, 0x0?}, {{{0xa4ec, 0x5, 0x17a2}, {0x0, {0xc03e586ca8, 0x12, 0x18}, ...}, ...}, ...})
github.com/pingcap/tiflow/cdc/kv/client.go:875 +0x7e
github.com/pingcap/tiflow/cdc/kv.(*eventFeedSession).onRegionFail(0xc03e6ba6e0, {0x0, 0x0}, {{{0xa4ec, 0x5, 0x17a2}, {0x0, {0xc03e586ca8, 0x12, 0x18}, ...}, ...}, ...})
github.com/pingcap/tiflow/cdc/kv/client.go:558 +0x177
github.com/pingcap/tiflow/cdc/kv.(*regionWorker).evictAllRegions(0xc09600ae10)
github.com/pingcap/tiflow/cdc/kv/region_worker.go:825 +0x25b
github.com/pingcap/tiflow/cdc/kv.(*eventFeedSession).receiveFromStream(0xc03e6ba6e0, {0x49d4aa8?, 0xc03e6c27d0?}, 0xc057872f38?, {0xc00077bda0, 0x51}, 0xd, {0x49ecc18, 0xc0814bca00}, 0xc08153c880)
github.com/pingcap/tiflow/cdc/kv/client.go:1072 +0x13fd
github.com/pingcap/tiflow/cdc/kv.(*eventFeedSession).requestRegionToStore.func2()
github.com/pingcap/tiflow/cdc/kv/client.go:660 +0xb2
golang.org/x/sync/errgroup.(*Group).Go.func1()
golang.org/x/[email protected]/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
golang.org/x/[email protected]/errgroup/errgroup.go:72 +0xa5
Versions of the cluster
[root@upstream-ticdc-0 /]# /cdc version
Release Version: v7.3.0-alpha
Git Commit Hash: 567d0a61b5653a30e620f35d4adbf455ee8426b3
Git Branch: heads/refs/tags/v7.3.0-alpha
UTC Build Time: 2023-07-05 11:03:10
Go Version: go version go1.20.5 linux/amd64
Failpoint Build: false
/severity major
/assign @hicqu
The related code is outdated and was removed in v8.1.0 and v7.5.0, so I have removed the affects-7.5 and affects-8.1 tags.
This issue is quite rare, so I've adjusted its severity to moderate. cc @fubinzh @flowbehappy
Closing since this is not reproducible.