matrixone
[Bug]: [date 3.10]tke regression: sysbench 1000w delete/update auto_increment index reported stream closed
Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
Branch Name
main
Commit ID
15af2cf1a
Other Environment Information
- Hardware parameters:
- OS type:
- Others:
Actual Behavior
job:https://github.com/matrixorigin/mo-nightly-regression/actions/runs/8222697705/job/22484639356
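The sysbench 1000w delete/update test schema now includes an auto-increment column and an index (first tested in the 3.10 run). The previous workflow included neither the auto-increment column nor the index.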
log:
http://175.178.192.213:30088/explore?panes=%7B%22IeX%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-nightly-regression-20240310%5C%22%7D%20%7C%3D%20%60stream%20closed%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221710118800000%22,%22to%22:%221710125999000%22%7D%7D%7D&schemaVersion=1&orgId=1
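There are a large number of "use of closed network connection" errors. Please investigate whether they are what causes stream closed, and whether this behavior is expected.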
Expected Behavior
No response
Steps to Reproduce
Run the tke regression sysbench 1000w delete/update test.
Additional information
No response
The 3.11 regression delete/update run hit the same problem.
job:https://github.com/matrixorigin/mo-nightly-regression/actions/runs/8234567942/job/22539827858
It also appeared in other sysbench scenarios.
log:
Again, a large number of "use of closed network connection" errors are reported.
http://175.178.192.213:30088/explore?panes=%7B%227iC%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-nightly-regression-20240311%5C%22%7D%20%7C%3D%20%60%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221710209640000%22,%22to%22:%221710209700000%22%7D%7D%7D&schemaVersion=1&orgId=1
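1. The direct cause is that the heartbeat check timed out.
2. The heartbeat check timed out because messages could not be inserted into the RPC consumption queue and waited too long (more than 40 seconds, while the timeout here is set to 10 seconds).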
After discussing with 莫尘 and 张旭, the temporary fix is to extend the RPC wait timeout to 120.
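The eventual long-term solution may be:
1. Find out why it blocks (slow execution, saturated IO, ...) and apply the corresponding fix.
2. Separate heartbeat detection from the rest.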
3.13: the problem still occurs after PR #14947 was merged.
https://github.com/matrixorigin/mo-nightly-regression/actions/runs/8266410811/job/22634292651
The probability has gone down, but the problem is still there. I have not had time to dig further.
In tke sysbench 1000w delete, TPS is only 5.
Currently working on the prepare refactoring.
张旭's PR https://github.com/matrixorigin/matrixone/pull/15181 may have fixed this problem.
Recent regression runs have not shown stream closed, but the RPC blocking problem is still present. Downgrading to s1 for now.
张旭, could you please help handle this?
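The PRs from 明松 and nitao have fixed the premature-termination problem. We still need to check whether this issue occurs. No time has been put into looking at it yet.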
Closing this issue due to inactivity. Feel free to reopen or create a new issue if needed. Thanks!