[Bug]: Load data from cos report 'stream closed'.

Open Ariznawlll opened this issue 1 year ago • 10 comments

Is there an existing issue for the same bug?

  • [X] I have checked the existing issues.

Branch Name

main

Commit ID

4170547615910e61bc5a4ae8e950fe4097703256

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

Job URL (load and insert test: load pk index 100M, load pk index 1B): https://github.com/matrixorigin/mo-nightly-regression/actions/runs/7690791843/job/20955350330

Log: http://175.178.192.213:30088/explore?panes=%7B%22AAL%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22branch-big-data-nightly-4170547%5C%22%7D%20%7C%3D%20%60stream%20closed%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%22now-24h%22,%22to%22:%22now%22%7D%7D%7D&schemaVersion=1&orgId=1
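
The log link above is a Grafana Explore URL whose `panes` parameter carries the Loki query as percent-encoded JSON, which is hard to read at a glance. The following is a minimal sketch, not part of the original report, that extracts the LogQL expression from such a link. The function name `logql_from_explore_url` is hypothetical, and the `panes` -> `queries` -> `expr` layout is assumed from the URLs quoted in this issue rather than from any documented Grafana contract.

```python
import json
from urllib.parse import parse_qs, urlsplit

# The log link from this report, split across lines only for readability.
EXPLORE_URL = (
    "http://175.178.192.213:30088/explore?panes="
    "%7B%22AAL%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B"
    "%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22"
    "branch-big-data-nightly-4170547%5C%22%7D%20%7C%3D%20%60stream%20closed%60%22,"
    "%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,"
    "%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,"
    "%22range%22:%7B%22from%22:%22now-24h%22,%22to%22:%22now%22%7D%7D%7D"
    "&schemaVersion=1&orgId=1"
)


def logql_from_explore_url(url: str) -> list:
    """Return every LogQL expression embedded in a Grafana Explore URL.

    Assumes the `panes` query parameter is a JSON object of the form
    {pane_id: {"queries": [{"expr": ...}, ...], ...}}, as seen in this issue.
    """
    params = parse_qs(urlsplit(url).query)  # parse_qs percent-decodes values
    panes = json.loads(params["panes"][0])
    return [q["expr"] for pane in panes.values() for q in pane["queries"]]


if __name__ == "__main__":
    for expr in logql_from_explore_url(EXPLORE_URL):
        print(expr)
```

For the link above this yields a single expression that filters the `branch-big-data-nightly-4170547` namespace for lines containing `stream closed`.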

Expected Behavior

Load data succeeds.

Steps to Reproduce

Table DDL:

create table  if not exists big_data_test.table_with_pk_index_for_load_100M(
id bigint primary key,
col1 tinyint,
col2 smallint,
col3 int,
col4 bigint,
col5 tinyint unsigned,
col6 smallint unsigned,
col7 int unsigned,
col8 bigint unsigned,
col9 float,
col10 double,
col11 varchar(255),
col12 Date,
col13 DateTime,
col14 timestamp,
col15 bool,
col16 decimal(16,6),
col17 text,
col18 json,
col19 blob,
col20 binary(255),
col21 varbinary(255),
col22 vecf32(3),
col23 vecf32(3),
col24 vecf64(3),
col25 vecf64(3),
key(col3),
unique key(col4)
);

create table  if not exists big_data_test.table_with_pk_index_for_load_1B(
id bigint primary key,
col1 tinyint,
col2 smallint,
col3 int,
col4 bigint,
col5 tinyint unsigned,
col6 smallint unsigned,
col7 int unsigned,
col8 bigint unsigned,
col9 float,
col10 double,
col11 varchar(255),
col12 Date,
col13 DateTime,
col14 timestamp,
col15 bool,
col16 decimal(16,6),
col17 text,
col18 json,
col19 blob,
col20 binary(255),
col21 varbinary(255),
col22 vecf32(3),
col23 vecf32(3),
col24 vecf64(3),
col25 vecf64(3),
key(col3),
unique key(col4)
);

Additional information

No response

Ariznawlll avatar Jan 30 '24 02:01 Ariznawlll

Could you please kindly help take a look? Thanks. @daviszhen

aronchanisme avatar Jan 30 '24 11:01 aronchanisme

Not worked on yet.

daviszhen avatar Feb 02 '24 11:02 daviszhen

Not worked on yet.

daviszhen avatar Feb 07 '24 10:02 daviszhen

Not worked on yet.

daviszhen avatar Feb 21 '24 13:02 daviszhen

Not worked on yet.

daviszhen avatar Feb 26 '24 12:02 daviszhen

Not worked on yet.

daviszhen avatar Feb 29 '24 11:02 daviszhen

Not worked on yet.

daviszhen avatar Mar 05 '24 13:03 daviszhen

Logs on Loki: http://175.178.192.213:30088/explore?panes=%7B%22AAL%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22branch-big-data-nightly-4170547%5C%22%7D%20%21%3D%20%60delete:%20s3:%2F%2Fmo-nightly-gz%60%20%21%3D%20%60cron%20task%20scheduler%20stopped%20or%20is%20stopping%60%20%21%3D%20%60%21%21%21COM_QUIT%21%21%21%60%20%21%3D%20%60ms%20cpu%60%20%21%3D%20%60blockio%2Fpipeline.go%60%20%21%3D%20%60set%20query%20status%20on%20the%20connection%60%20%21%3D%20%60task%2Ftask_scheduler.go%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22,%22maxLines%22:5000%7D%5D,%22range%22:%7B%22from%22:%221706500800000%22,%22to%22:%221706502600000%22%7D%7D%7D&schemaVersion=1&orgId=1

volgariver6 avatar Mar 08 '24 09:03 volgariver6

No progress.

volgariver6 avatar Mar 13 '24 10:03 volgariver6

This has not recurred in the last few runs; lowering the priority and continuing to track it.

Ariznawlll avatar Mar 13 '24 10:03 Ariznawlll

[0411]

Job: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/8633046991/job/23688353616

Log: https://grafana.ci.matrixorigin.cn/explore?panes=%7B%224Wz%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-nightly-regression-20240410%5C%22%7D%20%7C%3D%20%60stream%20closed%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221712805561000%22,%22to%22:%221712805681000%22%7D%7D%7D&schemaVersion=1&orgId=1

Ariznawlll avatar Apr 11 '24 06:04 Ariznawlll

Add some logs to help investigate.

volgariver6 avatar Apr 12 '24 13:04 volgariver6

Waiting for https://github.com/matrixorigin/matrixone/pull/15448 to be merged.

volgariver6 avatar Apr 17 '24 10:04 volgariver6

Waiting for https://github.com/matrixorigin/matrixone/pull/15448 to be merged.

volgariver6 avatar Apr 22 '24 10:04 volgariver6

Waiting for https://github.com/matrixorigin/matrixone/pull/15448 to be merged.

volgariver6 avatar Apr 25 '24 12:04 volgariver6

Need to ask Zhang Xu why the PR was closed.

volgariver6 avatar Apr 30 '24 13:04 volgariver6

[0430] The issue appeared again in the big-data test.

Log: https://grafana.ci.matrixorigin.cn/explore?panes=%7B%227jw%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-big-data-20240430%5C%22%7D%20%7C%3D%20%60stream%20closed%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%22now-7d%22,%22to%22:%22now%22%7D%7D%7D&schemaVersion=1&orgId=1

Ariznawlll avatar May 06 '24 02:05 Ariznawlll

[0508]

Commit: 6b1a10d62ec53a54394120d8c6327c7886c1ce15

Job URL: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/8995012876/job/24709777906

Log: https://grafana.ci.matrixorigin.cn/explore?panes=%7B%227jw%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-big-data-20240508%5C%22%7D%20%7C%3D%20%60stream%20closed%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%22now-2d%22,%22to%22:%22now%22%7D%7D%7D&schemaVersion=1&orgId=1

Ariznawlll avatar May 09 '24 02:05 Ariznawlll

The logs show a logtail "stream closed" message. According to the logs, the cause is that TN restarted at 31 min 16 s: https://grafana.ci.matrixorigin.cn/explore?panes=%7B%227jw%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-big-data-20240508%5C%22,%20matrixorigin_io_component%3D%5C%22DNSet%5C%22%7D%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22,%22maxLines%22:5000%7D%5D,%22range%22:%7B%22from%22:%221715167873000%22,%22to%22:%221715167923000%22%7D%7D%7D&schemaVersion=1&orgId=1

volgariver6 avatar May 09 '24 03:05 volgariver6

The issue did not occur yesterday. Will run for a few more days and see.

Job URL: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/9062078787/job/24928171270

Ariznawlll avatar May 14 '24 03:05 Ariznawlll

The DN OOM issue has been fixed. We now need to run for a few more days to see whether similar problems recur.

w-zr avatar May 19 '24 10:05 w-zr

No progress so far.

w-zr avatar May 23 '24 10:05 w-zr

Will keep observing for a while.

Ariznawlll avatar May 28 '24 10:05 Ariznawlll

Latest result: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/9301617718/job/25620006001

It has not recurred; closing.

Ariznawlll avatar May 31 '24 02:05 Ariznawlll