tiflow

lots of logs and cdc panic during tikv rolling restart

Open fubinzh opened this issue 1 year ago • 4 comments

What did you do?

  1. Configure pd.enable-forwarding = true.
  2. Trigger a tikv rolling restart by editing the tikv spec (before that, tikv-2 had been pending for 10+ hours due to a k8s scheduler issue).

What did you expect to see?

  1. cdc should not panic

What did you see instead?

cdc wrote a large volume of logs in a short time, about 300MB every minute, until the tikv rolling restart finished. A cdc panic was also seen.

[root@upstream-ticdc-0 log]# du -shl *
301M    ticdc-2023-07-06T07-11-53.148.log
300M    ticdc-2023-07-06T07-12-06.878.log
300M    ticdc-2023-07-06T07-12-19.253.log
300M    ticdc-2023-07-07T01-25-27.927.log
300M    ticdc-2023-07-07T01-26-24.099.log
300M    ticdc-2023-07-07T01-27-05.064.log
300M    ticdc-2023-07-07T01-27-54.467.log
300M    ticdc-2023-07-07T01-28-44.971.log
300M    ticdc-2023-07-07T01-29-39.447.log
300M    ticdc-2023-07-07T01-30-34.204.log
300M    ticdc-2023-07-07T01-31-22.034.log
300M    ticdc-2023-07-07T01-31-33.798.log
301M    ticdc-2023-07-07T01-32-49.720.log
13M     ticdc.log

[root@bogon ticdc]# kubectl --kubeconfig kubeconfig.yml -n cdc-testbed-airbnb-tps-1814881-1-541 logs -p upstream-ticdc-0
[WARN] TiCDC server data-dir is not set. Please use `cdc server --data-dir` to start the cdc server if possible.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x310ae9e]

goroutine 170897871 [running]:
github.com/pingcap/tiflow/cdc/kv.(*eventFeedSession).enqueueError(0xc005ea6160, {0x0?, 0x0?}, {{{0x6cac, 0x5, 0x1850}, {0x0, {0xc094b39f60, 0x1b, 0x20}, ...}, ...}, ...})
        github.com/pingcap/tiflow/cdc/kv/client.go:875 +0x7e
github.com/pingcap/tiflow/cdc/kv.(*eventFeedSession).onRegionFail(0xc005ea6160, {0x0, 0x0}, {{{0x6cac, 0x5, 0x1850}, {0x0, {0xc094b39f60, 0x1b, 0x20}, ...}, ...}, ...})
        github.com/pingcap/tiflow/cdc/kv/client.go:558 +0x177
github.com/pingcap/tiflow/cdc/kv.(*regionWorker).evictAllRegions(0xc0965e4360)
        github.com/pingcap/tiflow/cdc/kv/region_worker.go:825 +0x25b
github.com/pingcap/tiflow/cdc/kv.(*eventFeedSession).receiveFromStream(0xc005ea6160, {0x49d4aa8?, 0xc01d10ecd0?}, 0xc0a67f2410?, {0xc002d40ba0, 0x51}, 0xd, {0x49ecc18, 0xc095d21bc0}, 0xc0d38b6700)
        github.com/pingcap/tiflow/cdc/kv/client.go:1072 +0x13fd
github.com/pingcap/tiflow/cdc/kv.(*eventFeedSession).requestRegionToStore.func2()
        github.com/pingcap/tiflow/cdc/kv/client.go:660 +0xb2
golang.org/x/sync/errgroup.(*Group).Go.func1()
        golang.org/x/[email protected]/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
        golang.org/x/[email protected]/errgroup/errgroup.go:72 +0xa5
[root@bogon ticdc]# kubectl --kubeconfig kubeconfig.yml -n cdc-testbed-airbnb-tps-1814881-1-541 logs -p upstream-ticdc-1
[WARN] TiCDC server data-dir is not set. Please use `cdc server --data-dir` to start the cdc server if possible.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x310ae9e]

goroutine 13788899 [running]:
github.com/pingcap/tiflow/cdc/kv.(*eventFeedSession).enqueueError(0xc03e6ba6e0, {0x0?, 0x0?}, {{{0xa4ec, 0x5, 0x17a2}, {0x0, {0xc03e586ca8, 0x12, 0x18}, ...}, ...}, ...})
        github.com/pingcap/tiflow/cdc/kv/client.go:875 +0x7e
github.com/pingcap/tiflow/cdc/kv.(*eventFeedSession).onRegionFail(0xc03e6ba6e0, {0x0, 0x0}, {{{0xa4ec, 0x5, 0x17a2}, {0x0, {0xc03e586ca8, 0x12, 0x18}, ...}, ...}, ...})
        github.com/pingcap/tiflow/cdc/kv/client.go:558 +0x177
github.com/pingcap/tiflow/cdc/kv.(*regionWorker).evictAllRegions(0xc09600ae10)
        github.com/pingcap/tiflow/cdc/kv/region_worker.go:825 +0x25b
github.com/pingcap/tiflow/cdc/kv.(*eventFeedSession).receiveFromStream(0xc03e6ba6e0, {0x49d4aa8?, 0xc03e6c27d0?}, 0xc057872f38?, {0xc00077bda0, 0x51}, 0xd, {0x49ecc18, 0xc0814bca00}, 0xc08153c880)
        github.com/pingcap/tiflow/cdc/kv/client.go:1072 +0x13fd
github.com/pingcap/tiflow/cdc/kv.(*eventFeedSession).requestRegionToStore.func2()
        github.com/pingcap/tiflow/cdc/kv/client.go:660 +0xb2
golang.org/x/sync/errgroup.(*Group).Go.func1()
        golang.org/x/[email protected]/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
        golang.org/x/[email protected]/errgroup/errgroup.go:72 +0xa5


Versions of the cluster

[root@upstream-ticdc-0 /]# /cdc version
Release Version: v7.3.0-alpha
Git Commit Hash: 567d0a61b5653a30e620f35d4adbf455ee8426b3
Git Branch: heads/refs/tags/v7.3.0-alpha
UTC Build Time: 2023-07-05 11:03:10
Go Version: go version go1.20.5 linux/amd64
Failpoint Build: false

fubinzh, Jul 07 '23 01:07

/severity major

fubinzh, Jul 07 '23 02:07

/assign @hicqu

nongfushanquan, Jul 14 '23 02:07

The related code is outdated and was removed in v8.1.0 and v7.5.0, so I am removing the affects-7.5 and affects-8.1 tags.

asddongmen, Apr 26 '24 10:04

This issue is quite rare, so I've adjusted its severity to moderate. cc @fubinzh @flowbehappy

asddongmen, Apr 26 '24 10:04

Closing since this is not reproducible.

flowbehappy, May 07 '24 10:05