matrixone
[Bug]: Load data from COS reports 'stream closed'.
Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
Branch Name
main
Commit ID
4170547615910e61bc5a4ae8e950fe4097703256
Other Environment Information
- Hardware parameters:
- OS type:
- Others:
Actual Behavior
job url (load and insert test: load pk index 100M, load pk index 1B): https://github.com/matrixorigin/mo-nightly-regression/actions/runs/7690791843/job/20955350330
log:http://175.178.192.213:30088/explore?panes=%7B%22AAL%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22branch-big-data-nightly-4170547%5C%22%7D%20%7C%3D%20%60stream%20closed%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%22now-24h%22,%22to%22:%22now%22%7D%7D%7D&schemaVersion=1&orgId=1
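The Grafana Explore links in this issue carry the Loki query as URL-encoded JSON in the `panes` parameter, which is hard to read in raw form. Below is a minimal Python sketch for recovering the query text without opening Grafana. The helper name `loki_queries` is made up for illustration, and it assumes the link layout seen in this thread (a `panes` object whose values each hold a `queries` list with an `expr` field).

```python
import json
from urllib.parse import parse_qs, urlsplit


def loki_queries(explore_url):
    """Return the LogQL expressions embedded in a Grafana Explore URL.

    Assumes the layout used by the links in this issue: a `panes` query
    parameter holding JSON, one entry per pane, each with a `queries` list.
    """
    params = parse_qs(urlsplit(explore_url).query)  # percent-decodes values
    panes = json.loads(params["panes"][0])
    return [q["expr"] for pane in panes.values() for q in pane["queries"]]


# The log link from the "Actual Behavior" section, split only for readability.
url = (
    "http://175.178.192.213:30088/explore?panes=%7B%22AAL%22:%7B%22datasource%22:%22loki%22,"
    "%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22"
    "branch-big-data-nightly-4170547%5C%22%7D%20%7C%3D%20%60stream%20closed%60%22,"
    "%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,"
    "%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%22now-24h%22,%22to%22:%22now%22%7D%7D%7D"
    "&schemaVersion=1&orgId=1"
)

for expr in loki_queries(url):
    print(expr)
# {namespace="branch-big-data-nightly-4170547"} |= `stream closed`
```

Note that this particular link uses a relative range (`now-24h` to `now`), so it only shows the failing run if opened within a day of it.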
Expected Behavior
Load data succeeds.
Steps to Reproduce
table ddl:
create table if not exists big_data_test.table_with_pk_index_for_load_100M(
id bigint primary key,
col1 tinyint,
col2 smallint,
col3 int,
col4 bigint,
col5 tinyint unsigned,
col6 smallint unsigned,
col7 int unsigned,
col8 bigint unsigned,
col9 float,
col10 double,
col11 varchar(255),
col12 Date,
col13 DateTime,
col14 timestamp,
col15 bool,
col16 decimal(16,6),
col17 text,
col18 json,
col19 blob,
col20 binary(255),
col21 varbinary(255),
col22 vecf32(3),
col23 vecf32(3),
col24 vecf64(3),
col25 vecf64(3),
key(col3),
unique key(col4)
);
create table if not exists big_data_test.table_with_pk_index_for_load_1B(
id bigint primary key,
col1 tinyint,
col2 smallint,
col3 int,
col4 bigint,
col5 tinyint unsigned,
col6 smallint unsigned,
col7 int unsigned,
col8 bigint unsigned,
col9 float,
col10 double,
col11 varchar(255),
col12 Date,
col13 DateTime,
col14 timestamp,
col15 bool,
col16 decimal(16,6),
col17 text,
col18 json,
col19 blob,
col20 binary(255),
col21 varbinary(255),
col22 vecf32(3),
col23 vecf32(3),
col24 vecf64(3),
col25 vecf64(3),
key(col3),
unique key(col4)
);
Additional information
No response
Could you please kindly help take a look? Thanks. @daviszhen
Not started yet.
Logs on Loki: http://175.178.192.213:30088/explore?panes=%7B%22AAL%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22branch-big-data-nightly-4170547%5C%22%7D%20%21%3D%20%60delete:%20s3:%2F%2Fmo-nightly-gz%60%20%21%3D%20%60cron%20task%20scheduler%20stopped%20or%20is%20stopping%60%20%21%3D%20%60%21%21%21COM_QUIT%21%21%21%60%20%21%3D%20%60ms%20cpu%60%20%21%3D%20%60blockio%2Fpipeline.go%60%20%21%3D%20%60set%20query%20status%20on%20the%20connection%60%20%21%3D%20%60task%2Ftask_scheduler.go%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22,%22maxLines%22:5000%7D%5D,%22range%22:%7B%22from%22:%221706500800000%22,%22to%22:%221706502600000%22%7D%7D%7D&schemaVersion=1&orgId=1
No progress.
It has not reappeared in the last several runs, so lowering the priority and continuing to track it.
[0411]
job: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/8633046991/job/23688353616
log: https://grafana.ci.matrixorigin.cn/explore?panes=%7B%224Wz%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-nightly-regression-20240410%5C%22%7D%20%7C%3D%20%60stream%20closed%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221712805561000%22,%22to%22:%221712805681000%22%7D%7D%7D&schemaVersion=1&orgId=1
add some logs to help investigate.
Waiting for https://github.com/matrixorigin/matrixone/pull/15448 to be merged.
Need to ask Zhang Xu why the PR was closed.
[0430] The problem appeared again in the big-data test:
log: https://grafana.ci.matrixorigin.cn/explore?panes=%7B%227jw%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-big-data-20240430%5C%22%7D%20%7C%3D%20%60stream%20closed%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%22now-7d%22,%22to%22:%22now%22%7D%7D%7D&schemaVersion=1&orgId=1
[0508]
commit: 6b1a10d62ec53a54394120d8c6327c7886c1ce15
job url: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/8995012876/job/24709777906
log: https://grafana.ci.matrixorigin.cn/explore?panes=%7B%227jw%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-big-data-20240508%5C%22%7D%20%7C%3D%20%60stream%20closed%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%22now-2d%22,%22to%22:%22now%22%7D%7D%7D&schemaVersion=1&orgId=1
The logs contain the "logtail stream closed" message. According to the logs, the cause is that TN restarted at minute 31, second 16: https://grafana.ci.matrixorigin.cn/explore?panes=%7B%227jw%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-big-data-20240508%5C%22,%20matrixorigin_io_component%3D%5C%22DNSet%5C%22%7D%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22,%22maxLines%22:5000%7D%5D,%22range%22:%7B%22from%22:%221715167873000%22,%22to%22:%221715167923000%22%7D%7D%7D&schemaVersion=1&orgId=1
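The `from` and `to` values in the Grafana link above are epoch timestamps in milliseconds. Here is a small Python sketch for converting them to wall-clock time, which shows that the linked window brackets the 31:16 restart mark. The helper name `ms_to_utc` is hypothetical, and UTC is chosen here only as a neutral reference, since the thread does not say which timezone the cluster logs use.

```python
from datetime import datetime, timezone


def ms_to_utc(ms):
    """Convert an epoch timestamp in milliseconds (str or int) to a UTC datetime."""
    return datetime.fromtimestamp(int(ms) / 1000, tz=timezone.utc)


# Range taken verbatim from the DNSet log link in this comment.
start = ms_to_utc("1715167873000")
end = ms_to_utc("1715167923000")

print(start.isoformat())  # 2024-05-08T11:31:13+00:00
print(end.isoformat())    # 2024-05-08T11:32:03+00:00
```

The window is 50 seconds long and starts three seconds before the 31:16 mark.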
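The problem did not occur yesterday. Will keep running for a few more days to see. job url: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/9062078787/job/24928171270
The DN OOM problem has been fixed. We now need to run for a few more days to see whether a similar problem shows up again.
No progress at the moment.
Observing for a while.
Latest result: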
https://github.com/matrixorigin/mo-nightly-regression/actions/runs/9301617718/job/25620006001
It has not reappeared. Closing.