matrixone
matrixone copied to clipboard
[Bug]: 执行报错Error 1105 (HY000): write tcp4 172.20.47.3:6002->172.20.26.4:59238: use of closed network connection
Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
Environment
- Version or commit-id (e.g. v0.1.0 or 8b23a93):d17a04928bd4486a3772eeb7bd55781b8c4ef3ce
- Hardware parameters:
- OS type:
- Others:
Actual Behavior
在SQL执行的过程中,报错Error 1105 (HY000): write tcp4 172.20.47.3:6002->172.20.26.4:59238: use of closed network connection。 SQL为:/* cloud_user */\n\n-- Q8\nselect\n o_year,\n (sum(case\n when nation = 'ARGENTINA' then volume\n else 0\n end) / sum(volume)) as mkt_share\nfrom\n (\n select\n extract(year from o_orderdate) as o_year,\n l_extendedprice * (1 - l_discount) as volume,\n n2.n_name as nation\n from\n part,\n supplier,\n lineitem,\n orders,\n customer,\n nation n1,\n nation n2,\n region\n where\n p_partkey = l_partkey\n and s_suppkey = l_suppkey\n and l_orderkey = o_orderkey\n and o_custkey = c_custkey\n and c_nationkey = n1.n_nationkey\n and n1.n_regionkey = r_regionkey\n and r_name = 'AMERICA'\n and s_nationkey = n2.n_nationkey\n and o_orderdate between date '1995-01-01' and date '1996-12-31'\n and p_type = 'ECONOMY BURNISHED TIN'\n ) as all_nations\ngroup by\n o_year\norder by\n o_year\n;
Expected Behavior
正常执行和返回。
Steps to Reproduce
非稳定复现,偶发。
Additional information
报错日志:2023/11/07 12:38:25.829377 +0000 ERROR [email protected]/finisher_api.go:610 trace {"error": "Error 1105 (HY000): write tcp4 172.20.47.3:6002->172.20.26.4:59238: use of closed network connection", "elapsed": "42.233536378s", "rows": 0, "sql": "/* cloud_user */\n\n-- Q8\nselect\n o_year,\n (sum(case\n when nation = 'ARGENTINA' then volume\n else 0\n end) / sum(volume)) as mkt_share\nfrom\n (\n select\n extract(year from o_orderdate) as o_year,\n l_extendedprice * (1 - l_discount) as volume,\n n2.n_name as nation\n from\n part,\n supplier,\n lineitem,\n orders,\n customer,\n nation n1,\n nation n2,\n region\n where\n p_partkey = l_partkey\n and s_suppkey = l_suppkey\n and l_orderkey = o_orderkey\n and o_custkey = c_custkey\n and c_nationkey = n1.n_nationkey\n and n1.n_regionkey = r_regionkey\n and r_name = 'AMERICA'\n and s_nationkey = n2.n_nationkey\n and o_orderdate between date '1995-01-01' and date '1996-12-31'\n and p_type = 'ECONOMY BURNISHED TIN'\n ) as all_nations\ngroup by\n o_year\norder by\n o_year\n;"}
这个很难复现,找到对应的场景,建议降级到s1
no progress
@volgariver6 这个现在 dev 环境查 SQL history 很容易就出现
SELECT * FROM (select `statement`,system.statement_info.statement_id,IF(`status`='Running', TIMESTAMPDIFF(MICROSECOND,`request_at`,now())*1000, `duration`) AS `duration`,`status`,`query_type`,`request_at`,system.statement_info.response_at,`user`,`database`,`transaction_id`,`session_id`,`rows_read`,`bytes_scan`,`result_count`,IF(status = 'Running', NULL, IF(cu IS NULL, (JSON_UNQUOTE(JSON_EXTRACT(stats, '$[1]')) * 7.43e-14 + JSON_UNQUOTE(JSON_EXTRACT(stats, '$[2]')) * 6.79e-24 * duration + JSON_UNQUOTE(JSON_EXTRACT(stats, '$[3]')) * 1e-06 + JSON_UNQUOTE(JSON_EXTRACT(stats, '$[4]')) * 1e-06 + IF(JSON_UNQUOTE(JSON_EXTRACT(stats, '$[0]')) = 1, 0, IF(JSON_UNQUOTE(JSON_EXTRACT(stats, '$[0]')) = 2, JSON_UNQUOTE(JSON_EXTRACT(stats, '$[5]')) * 8.94e-10,IF( IF(JSON_UNQUOTE(JSON_EXTRACT(stats, '$[6]'))='',0.0,JSON_UNQUOTE(JSON_EXTRACT(stats, '$[6]'))) = 1, JSON_UNQUOTE(JSON_EXTRACT(stats, '$[5]')) * 0, JSON_UNQUOTE(JSON_EXTRACT(stats, '$[5]')) * 8.94e-10 )))) / 1.0026988039e-06, cu)) AS `cu` from system.statement_info left join mo_catalog.statement_cu ON system.statement_info.statement_id = mo_catalog.statement_cu.statement_id where 1=1 AND `request_at` >= '2023-11-20 05:45:28' AND system.statement_info.account = 'c0e80d42_5ac2_42d6_8aeb_11700a310bdf' AND sql_source_type IN ('cloud_user_sql','external_sql') )t ORDER BY request_at DESC LIMIT 20
https://github.com/matrixorigin/matrixone/pull/12884 will fix this issue
https://github.com/matrixorigin/matrixone/pull/12884 has been merged
The same error occurred in commit nightly-842654e4
when query statement_info
这个在 MO Cloud 上会经常看到,我相信可以通过云平台复现
dup with #13288
这个问题也是由于cn的oom导致的,在执行查询的过程中,pipeline发送给另一个cn,这个发生oom,就会返回错误:
cn oom 问题,暂无进展
oom 问题,没有进展
related to https://github.com/matrixorigin/matrixone/issues/12263
@nnsgmsone please help on OOM issue, thank you
wait https://github.com/matrixorigin/matrixone/issues/12532
正在和存储的同事协商https://github.com/matrixorigin/matrixone/issues/12532
内存问题等待#12532
内存问题等待https://github.com/matrixorigin/matrixone/issues/12532
内存问题等待https://github.com/matrixorigin/matrixone/issues/12532
内存问题等待https://github.com/matrixorigin/matrixone/issues/12532
no process
no process
处理数据正确性问题中
no process
等待pr和入
处理事务泄露中
no process
no process
定位中
stream close的问题旭哥已经修复,不过为啥会阻塞这么久还没查明。