matrixone
[Bug]: Load data daily regression on tke report 'stream closed'.
Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
Branch Name
main
Commit ID
b72bea5f640f6d07fbabc5437c7d64f48d020be5
Other Environment Information
- Hardware parameters:
- OS type:
- Others:
Actual Behavior
job url: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/7462446702/job/20317926227
mo-log:http://175.178.192.213:30088/explore?panes=%7B%22LwR%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-nightly-regression-20240109%5C%22%7D%20%7C%3D%20%60stream%20closed%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%22now-24h%22,%22to%22:%22now%22%7D%7D%7D&schemaVersion=1&orgId=1
Commit identified by bisection: 594ad8b365da45fb27f755489f6bd01093ac1a7a
Expected Behavior
No response
Steps to Reproduce
daily regression on tke.
Additional information
No response
I ran it two more times and the problem did not occur; the load workflow completed successfully. The Go version is 1.21.5 and the OS is Ubuntu 22.04.3.
https://github.com/matrixorigin/mo-nightly-regression/actions/runs/7473592676
https://github.com/matrixorigin/mo-nightly-regression/actions/runs/7474357006/job/20340694720
Then I looked at the logs of the MO cluster that was started during this morning's binary search and found that a CN pod hit a segmentation fault at the time of the failure.
Log: http://175.178.192.213:30088/explore?panes=%7B%22LwR%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-nightly-regression-20240110%5C%22,%20pod%3D%5C%22nightly-regression-dis-tp-cn-2lfjb%5C%22%7D%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221704864342196%22,%22to%22:%221704864559196%22%7D%7D%7D&schemaVersion=1&orgId=1
@nnsgmsone could you take a look?
For a segmentation fault, core dumps need to be enabled on every daily run, so that when a segfault occurs we can inspect the core dump directly to investigate. @Ariznawlll @guguducken could you enable core dumps?
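For reference, a minimal sketch of what enabling core dumps from inside a Go process on Linux could look like. This is an assumption for illustration only, not how MatrixOne or the nightly regression is actually configured; the helper name `enableCoreDumps` is hypothetical. It raises the soft core-file size limit to the hard limit and asks the Go runtime to abort on fatal errors so the kernel gets a chance to write a core file.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"syscall"
)

// enableCoreDumps (hypothetical helper) raises the soft RLIMIT_CORE to the
// hard limit and switches the Go traceback mode to "crash", which makes the
// runtime abort the process after printing the traceback on a fatal error.
func enableCoreDumps() (syscall.Rlimit, error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_CORE, &lim); err != nil {
		return lim, err
	}
	lim.Cur = lim.Max // the soft limit may not exceed the hard limit
	if err := syscall.Setrlimit(syscall.RLIMIT_CORE, &lim); err != nil {
		return lim, err
	}
	debug.SetTraceback("crash") // same effect as GOTRACEBACK=crash
	return lim, nil
}

func main() {
	lim, err := enableCoreDumps()
	if err != nil {
		fmt.Println("could not enable core dumps:", err)
		return
	}
	fmt.Printf("core limit: soft=%d hard=%d\n", lim.Cur, lim.Max)
}
```

Whether a core file is actually written still depends on the hard limit granted to the container and on the node's `kernel.core_pattern`, so the pod spec and host settings would need to be checked as well.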
Waiting for a core dump from a reproduction.
No progress.
Currently working on data correctness issues.
No progress.
It hasn't been reproduced for a long time; closed.