[Bug]: cannot connect to mo, error "internal error: no available CN server", during chaos test (continuously kill one dn pod at 10-minute intervals)
Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
Branch Name
2.0-dev
Commit ID
410a540
Other Environment Information
- Hardware parameters:
3*CN: 7C 28G
1*DN: 7C 28G
3*PROXY: 2C 5G
3*LOG: 1C 7G
- OS type:
- Others:
Actual Behavior
[test load] run tpcc 10-10 and insert data into a table with 2 threads; during the test, the chaos tool continuously killed one tn pod at 10-minute intervals
[issue] after about 3 hours, mo could no longer be connected to, failing with the error "internal error: no available CN server":
[github@mo-srv-128 stability-test]$ mysql -h 10.222.6.253 -utpcc_test:admin -p111 -P6001
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 20101 (HY000): internal error: no available CN server
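For anyone scripting a check for this failure, the client output above can be matched mechanically. The following is a minimal sketch, assuming the probe captures the mysql client's output as text; the names `ClientError`, `parse_client_error`, and `is_no_cn_error` are hypothetical helpers and not part of MatrixOne or the test tooling.

```python
import re
from typing import NamedTuple, Optional

# Matches the error line format printed by the mysql client, e.g.
# "ERROR 20101 (HY000): internal error: no available CN server"
_ERROR_LINE = re.compile(r"ERROR (\d+) \((\w+)\): (.*)")


class ClientError(NamedTuple):
    code: int
    sqlstate: str
    message: str


def parse_client_error(output: str) -> Optional[ClientError]:
    """Return the first ERROR line found in mysql client output, if any."""
    for line in output.splitlines():
        m = _ERROR_LINE.search(line)
        if m:
            return ClientError(int(m.group(1)), m.group(2), m.group(3).strip())
    return None


def is_no_cn_error(output: str) -> bool:
    """True when the output carries the failure reported in this issue."""
    err = parse_client_error(output)
    return err is not None and "no available CN server" in err.message
```

Matching on the message text rather than only on code 20101 is a deliberate choice in this sketch: the report only shows this one message for that code, so the sketch does not assume the code is unique to it.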
mo-log: https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22NUU%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-chaos-bba26ea-202501081143%5C%22%7D%20%7C%3D%20%60%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221736314556557%22,%22to%22:%221736336101600%22%7D%7D%7D&schemaVersion=1&orgId=1
TN goroutine: goroutine.log
cn goroutine: CN_31306230-3661-6263-3235-613739373131_goroutine_0194455d-dca2-780f-b049-5cc5f4c33365.gz CN_31306230-3661-6263-3235-613739373131_goroutine_0194455d-6774-7dc0-b12b-4c8b8cd46ec5.gz
Expected Behavior
No response
Steps to Reproduce
[test load]
run tpcc 10-10
insert data into a table with 2 threads
and during the test, the chaos tool continuously killed one log pod at 10-minute intervals
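The kill-and-check cycle described in these steps can be sketched as a small loop. This is only an illustration under stated assumptions: the actual chaos tool is not named in this report, and `run_chaos`, `kill`, and `probe` are hypothetical names. The kill action and the connectivity probe are passed in as callables so the loop itself stays independent of how pods are killed.

```python
import time
from typing import Callable, Optional


def run_chaos(
    kill: Callable[[], None],
    probe: Callable[[], bool],
    interval_s: float = 600.0,  # 10 minutes, as in the steps above
    max_rounds: int = 36,       # assumed upper bound, not from the report
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Kill the target once per interval and probe connectivity afterwards.

    Returns the 1-based round at which the probe first fails,
    or None if every probe succeeded.
    """
    for round_no in range(1, max_rounds + 1):
        kill()             # e.g. delete the target pod (hypothetical action)
        sleep(interval_s)  # wait one interval so the pod can be recreated
        if not probe():    # e.g. try to connect through the proxy
            return round_no
    return None
```

With a 10-minute interval, a failure after about 3 hours corresponds to roughly round 18. In a real run, `kill` might wrap a `kubectl delete pod` call and `probe` might run the mysql client command from the report; both are assumptions here.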
Additional information
No response
@badboynt1
@ouyuanning
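Ran it once today for 20 minutes, but killing tn every 10 seconds. Hit a cn panic; a separate issue has been opened to track it.
Also ran it once for 1 hour, killing tn every 80 seconds and restarting tn 5 seconds after each kill. Not reproduced.
Could not reproduce today with the latest 2.0-dev. The problem reproduced with the commit mentioned in the issue has been confirmed to be one that is already fixed.
Not reproduced.
Not a REGRESSION; DELAY to a later release.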
repro on 2025.01.15
mo-log: https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22aEr%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-chaos-2f6f3d7-202501150026%5C%22%7D%20%7C%3D%20%60%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221736887874693%22,%22to%22:%221736909437995%22%7D%7D%7D&schemaVersion=1&orgId=1
goroutine: CN_32343662-3437-3332-6363-316364383262_goroutine_019467da-e89b-742b-8cef-859a22319cdf.gz CN_32343662-3437-3332-6363-316364383262_goroutine_019467dc-48c1-7c0c-a364-11583bbaa27d.gz CN_32343662-3437-3332-6363-316364383262_goroutine_019467db-d19f-7b58-988f-8e9e31f2bd4e.gz CN_32343662-3437-3332-6363-316364383262_goroutine_019467db-5c49-7497-a4d6-d6c3ae2c75a8.gz
After https://github.com/matrixorigin/matrixone/pull/21279 was merged, the problem still exists. It is probably not the same problem as 4771.
On annual leave.
On annual leave.
should be fixed by https://github.com/matrixorigin/matrixone/pull/21692
The problem no longer occurs in fault testing on the latest 2.2 version; closing for now.