cdc panic when scale out TiKV

Open fubinzh opened this issue 1 year ago • 11 comments

What did you do?

  1. Run a TiDB cluster with 6 TiKV nodes and one changefeed that writes to Kafka using the simple protocol.
  2. Scale TiKV out from 6 to 7 nodes.

What did you expect to see?

CDC should not be impacted by scaling TiKV.

What did you see instead?

CDC panicked during the TiKV scale-out.

2024-02-27 19:51:48	
{"log":"[helper.go:54] [\"init log\"] [file=/var/lib/ticdc/log/ticdc.log] [level=info]","container":"ticdc","pod":"tc-ticdc-1","level":"INFO","namespace":"cdcsimple-tps-7110011-1-567"}
2024-02-27 19:51:47	
{"pod":"tc-ticdc-1","container":"ticdc","log":"2024-02-27T19:51:47.016815463+08:00 stderr F \tgolang.org/x/[email protected]/errgroup/errgroup.go:75 +0x96","namespace":"cdcsimple-tps-7110011-1-567"}
2024-02-27 19:51:47	
{"pod":"tc-ticdc-1","container":"ticdc","log":"2024-02-27T19:51:47.016813476+08:00 stderr F created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1300","namespace":"cdcsimple-tps-7110011-1-567"}
2024-02-27 19:51:47	
{"pod":"tc-ticdc-1","container":"ticdc","log":"2024-02-27T19:51:47.016811683+08:00 stderr F \tgolang.org/x/[email protected]/errgroup/errgroup.go:78 +0x56","namespace":"cdcsimple-tps-7110011-1-567"}
2024-02-27 19:51:47	
{"pod":"tc-ticdc-1","container":"ticdc","log":"2024-02-27T19:51:47.016809799+08:00 stderr F golang.org/x/sync/errgroup.(*Group).Go.func1()","namespace":"cdcsimple-tps-7110011-1-567"}
2024-02-27 19:51:47	
{"pod":"tc-ticdc-1","container":"ticdc","log":"2024-02-27T19:51:47.016807559+08:00 stderr F \tgithub.com/pingcap/tiflow/cdc/kv/shared_stream.go:93 +0xa7","namespace":"cdcsimple-tps-7110011-1-567"}
2024-02-27 19:51:47	
{"pod":"tc-ticdc-1","container":"ticdc","log":"2024-02-27T19:51:47.016805692+08:00 stderr F github.com/pingcap/tiflow/cdc/kv.newStream.func2()","namespace":"cdcsimple-tps-7110011-1-567"}
2024-02-27 19:51:47	
{"pod":"tc-ticdc-1","container":"ticdc","log":"2024-02-27T19:51:47.016803668+08:00 stderr F \tgithub.com/pingcap/tiflow/cdc/kv/shared_stream.go:138 +0x85","namespace":"cdcsimple-tps-7110011-1-567"}
2024-02-27 19:51:47	
{"pod":"tc-ticdc-1","container":"ticdc","log":"2024-02-27T19:51:47.0167954+08:00 stderr F github.com/pingcap/tiflow/cdc/kv.(*requestedStream).run(0xc0052278b0, {0x5bc4dc0?, 0xc004caa820}, 0xc004c3af20, 0xc005217e00)","namespace":"cdcsimple-tps-7110011-1-567"}
2024-02-27 19:51:47	
{"pod":"tc-ticdc-1","container":"ticdc","log":"2024-02-27T19:51:47.016792711+08:00 stderr F \tgithub.com/pingcap/tiflow/pkg/version/check.go:213 +0x136","namespace":"cdcsimple-tps-7110011-1-567"}
2024-02-27 19:51:47	
{"pod":"tc-ticdc-1","container":"ticdc","log":"2024-02-27T19:51:47.016789936+08:00 stderr F github.com/pingcap/tiflow/pkg/version.CheckStoreVersion({0x5bc4dc0?, 0xc004caa820?}, {0x5c4f788?, 0xc00364c840?}, 0x44?)","namespace":"cdcsimple-tps-7110011-1-567"}
2024-02-27 19:51:47	
{"pod":"tc-ticdc-1","container":"ticdc","log":"2024-02-27T19:51:47.016786133+08:00 stderr F \tgithub.com/pingcap/[email protected]/pkg/util/engine/engine.go:24","namespace":"cdcsimple-tps-7110011-1-567"}
2024-02-27 19:51:47	
{"pod":"tc-ticdc-1","container":"ticdc","log":"2024-02-27T19:51:47.01678319+08:00 stderr F github.com/pingcap/tidb/pkg/util/engine.IsTiFlash(...)","namespace":"cdcsimple-tps-7110011-1-567"}
2024-02-27 19:51:47	
{"pod":"tc-ticdc-1","container":"ticdc","log":"2024-02-27T19:51:47.016780739+08:00 stderr F goroutine 1320 [running]:","namespace":"cdcsimple-tps-7110011-1-567"}
2024-02-27 19:51:47	
{"pod":"tc-ticdc-1","container":"ticdc","log":"2024-02-27T19:51:47.016778282+08:00 stderr F ","namespace":"cdcsimple-tps-7110011-1-567"}
2024-02-27 19:51:47	
{"pod":"tc-ticdc-1","container":"ticdc","log":"2024-02-27T19:51:47.01677508+08:00 stderr F [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x1fd3c56]","namespace":"cdcsimple-tps-7110011-1-567"}
2024-02-27 19:51:47	
{"pod":"tc-ticdc-1","container":"ticdc","log":"2024-02-27T19:51:47.016745028+08:00 stderr F panic: runtime error: invalid memory address or nil pointer dereference","namespace":"cdcsimple-tps-7110011-1-567"}
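
The records above are listed newest-first, so the stack trace reads bottom-up. As a reading aid, here is a minimal Go sketch that puts such records back into the order the process wrote them. It assumes each record is a JSON object with a `log` field and that stderr lines carry a `<timestamp> stderr F ` prefix, as in the dump above. The names `entry` and `stackLines` are hypothetical helpers, not part of tiflow.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// entry holds the only field of a log record that this sketch needs.
type entry struct {
	Log string `json:"log"`
}

// stackLines takes JSON log records in newest-first order and returns the
// stderr text they contain in oldest-first order. Records without the
// " stderr F " marker (for example regular application logs) are skipped.
func stackLines(records []string) ([]string, error) {
	out := make([]string, 0, len(records))
	for i := len(records) - 1; i >= 0; i-- {
		var e entry
		if err := json.Unmarshal([]byte(records[i]), &e); err != nil {
			return nil, fmt.Errorf("record %d: %w", i, err)
		}
		_, text, found := strings.Cut(e.Log, " stderr F ")
		if !found {
			continue
		}
		out = append(out, text)
	}
	return out, nil
}

func main() {
	// Two records taken from the dump above, still newest-first.
	records := []string{
		`{"log":"2024-02-27T19:51:47.016780739+08:00 stderr F goroutine 1320 [running]:"}`,
		`{"log":"2024-02-27T19:51:47.016745028+08:00 stderr F panic: runtime error: invalid memory address or nil pointer dereference"}`,
	}
	lines, err := stackLines(records)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

Read in that order, the trace starts with the panic message and shows `engine.IsTiFlash` as the innermost frame, called from `version.CheckStoreVersion`.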

Versions of the cluster

cdc version: [release-version=v8.0.0] [git-hash=a288b3681f196cb07bb8fdaf041ab79dff93bc87] [git-branch=heads/refs/tags/v8.0.0] [utc-build-time="2024-02-27 05:58:25"] [go-version="go version go1.21.6 linux/amd64"]

fubinzh avatar Feb 28 '24 06:02 fubinzh

/severity major

fubinzh avatar Feb 28 '24 06:02 fubinzh

I cannot reproduce it locally. I submitted a PR, https://github.com/pingcap/tiflow/pull/10681, that adds some checks to avoid this kind of panic.

lidezhu avatar Feb 29 '24 10:02 lidezhu

/remove-label may-affects-5.4

lidezhu avatar Feb 29 '24 10:02 lidezhu

/remove-label may-affects-6.1

lidezhu avatar Feb 29 '24 10:02 lidezhu

/remove-label may-affects-6.5

lidezhu avatar Feb 29 '24 10:02 lidezhu

/remove-label may-affects-7.1

lidezhu avatar Feb 29 '24 10:02 lidezhu

/remove-label may-affects-7.5

lidezhu avatar Feb 29 '24 10:02 lidezhu

/severity moderate

lidezhu avatar Feb 29 '24 10:02 lidezhu

/severity-remove major

lidezhu avatar Feb 29 '24 10:02 lidezhu

/assign @lidezhu

lidezhu avatar Feb 29 '24 10:02 lidezhu

Since the issue cannot be reproduced and there isn't sufficient information to determine the root cause, this case will be closed.

Please note that we are continuously improving the kvClient code.

If the issue arises again, please provide more information and reopen this case. Feel free to contact me for further assistance.

asddongmen avatar May 21 '24 04:05 asddongmen