matrixone icon indicating copy to clipboard operation
matrixone copied to clipboard

[Bug]: Insert in transaction reports error `20101 (HY000): internal error: Invalid alloc size 1147486208`

Open DanielZhangQD opened this issue 1 year ago • 36 comments

Is there an existing issue for the same bug?

  • [X] I have checked the existing issues.

Environment

- Version or commit-id (e.g. v0.1.0 or 8b23a93): nightly-898ac286
- Hardware parameters:
- OS type:
- Others:

Actual Behavior

  • Open a transaction
  • In the transaction, insert 1000 records in one batch, and when it comes to the second batch, the error 20101 (HY000): internal error: Invalid alloc size 1147486208 is fired. The schema is as below:
MySQL [mo_c]> desc mo_catalog.statement_cx;
+--------------+------------------+------+------+---------+-------+--------------------------------+
| Field        | Type             | Null | Key  | Default | Extra | Comment                        |
+--------------+------------------+------+------+---------+-------+--------------------------------+
| statement_id | VARCHAR(36)      | NO   | PRI  | NULL    |       |                                |
| account      | VARCHAR(300)     | NO   |      | NULL    |       |                                |
| response_at  | DATETIME(3)      | YES  |      | NULL    |       |                                |
| cx           | DECIMAL128(23)   | NO   |      | NULL    |       |                                |
| account_id   | INT UNSIGNED(32) | NO   | PRI  | NULL    |       | the account_id added by the mo |
+--------------+------------------+------+------+---------+-------+--------------------------------+
5 rows in set (0.30 sec)

Here is an example record:

('37cb68b3-61f4-11ee-ac82-eeab67a4d8c2','b6011b4d_75b2_4458_875c_42f6dd16d927','2023-10-03 13:53:23.105','0',9000001)

Expected Behavior

Insert records successfully

Steps to Reproduce

NA

Additional information

Log: http://47.97.80.230/explore?orgId=1&left=%7B%22datasource%22:%22P3C839627617F91E0%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22freetier-02%5C%22,%20app%3D%5C%22default-cn%5C%22%7D%20%7C%3D%20%60%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22P3C839627617F91E0%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D

DanielZhangQD avatar Oct 09 '23 02:10 DanielZhangQD

待明天复现

ouyuanning avatar Oct 09 '23 11:10 ouyuanning

使用以下脚本,未能复现: 12021.sql.zip

ouyuanning avatar Oct 10 '23 08:10 ouyuanning

Just check the existing log here: http://47.97.80.230/explore?orgId=1&left=%7B%22datasource%22:%22P3C839627617F91E0%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22P3C839627617F91E0%22%7D,%22editorMode%22:%22builder%22,%22expr%22:%22%7Bapp%3D%5C%22default-cn%5C%22%7D%20%7C%3D%20%60Invalid%20alloc%20size%60%22,%22queryType%22:%22range%22%7D%5D,%22range%22:%7B%22from%22:%221696644000000%22,%22to%22:%221696672799000%22%7D%7D

DanielZhangQD avatar Oct 10 '23 10:10 DanielZhangQD

也未复现,而且DAILY REGRESSION每天都有很多次Sysbench load data的操作,行为也是batch insert,每个batch的数据量是10000,每个record比这个数据量还大,也都未出现过这个问题

aressu1985 avatar Oct 10 '23 12:10 aressu1985

从日志看不出是什么样的statement引起的。但是日志里报错都是 Invalid alloc size 1147486208

应该是相同的sql,相同的batch大小引起的。

ouyuanning avatar Oct 11 '23 01:10 ouyuanning

今天请假。明天加些log再看看吧

ouyuanning avatar Oct 12 '23 12:10 ouyuanning

升级到 nightly-f5405008,尝试 insert 几十万条,没有出现之前的在 2000 条就报 invalid alloc size 的问题,但是最终 insert 会失败,CN 会挂掉:

    lastState:
      terminated:
        containerID: containerd://a516cee87e3d2c58e0ddb03d63db09f56fb5950659c4f208e715c750abe89713
        exitCode: 2
        finishedAt: "2023-10-13T04:01:02Z"
        reason: Error
        startedAt: "2023-10-13T04:00:31Z"

原因应该是:

  • Transaction 内数据量太大
  • update on duplicate 占用内存太多

所以这个 bug,

  • MO 侧,暂时降低优先级,和 https://github.com/matrixorigin/matrixone/issues/11793 一起改
  • 应用侧,先改成不在一个 Transaction 内 insert 这么多数据,另外使用 on duplicate ignore

DanielZhangQD avatar Oct 13 '23 05:10 DanielZhangQD

目前这个报错一般由2个原因引起 1、数据量大的时候。build hash时暴了。类似 #https://github.com/matrixorigin/MO-Cloud/issues/1414 2、算子对batch没有做切分,类似 https://github.com/matrixorigin/matrixone/issues/11793

1的部分。下周要跟龙冉确认下,是否转到spill的issue上

ouyuanning avatar Oct 15 '23 09:10 ouyuanning

应用改成没有使用 Transaction,每次插入 1000 条后,跑了一段时间,开始没有报错,后来又报这个错误了

DanielZhangQD avatar Oct 16 '23 02:10 DanielZhangQD

修改已经合入。。。

nnsgmsone avatar Oct 16 '23 06:10 nnsgmsone

应用暂时修改不使用 on duplicate key update 语句规避了这个问题

DanielZhangQD avatar Oct 17 '23 03:10 DanielZhangQD

on duplicate key 前的left join引起。 但是修改为right join的话会有正确性问题

ouyuanning avatar Oct 18 '23 07:10 ouyuanning

如果 这个issue https://github.com/matrixorigin/matrixone/issues/12176 好了的话。 也可以使用这个方案先做替代:

drop database if exists db1;
create database db1;
use db1;
create table t1(a int primary key, b int);
insert into t1 values (1,1),(3,3);
insert into t1 select b.* from t1 right join (values row(1,1),row(2,2)) b(a,b) on t1.a=b.a where t1.a is null;   //这个语句。
//其实只插入pk检查不存在的数据。语义上跟 insert ignore是一样的

ouyuanning avatar Oct 18 '23 08:10 ouyuanning

#12176 是误报 可以用那个方案试一试

ouyuanning avatar Oct 18 '23 08:10 ouyuanning

not working on it

ouyuanning avatar Oct 19 '23 11:10 ouyuanning

not working on it

ouyuanning avatar Oct 23 '23 13:10 ouyuanning

not working on it

ouyuanning avatar Oct 26 '23 12:10 ouyuanning

not working on it

ouyuanning avatar Oct 30 '23 13:10 ouyuanning

not working on it

ouyuanning avatar Nov 02 '23 10:11 ouyuanning

not working on it

ouyuanning avatar Nov 06 '23 12:11 ouyuanning

on duplicate key是阻塞类算子。得等spill好了之后,持久化才行。

ouyuanning avatar Nov 09 '23 12:11 ouyuanning

not working on it

ouyuanning avatar Nov 13 '23 12:11 ouyuanning

not working on it

ouyuanning avatar Nov 16 '23 12:11 ouyuanning

等spills好了再处理

ouyuanning avatar Nov 21 '23 11:11 ouyuanning

等spills好了才能处理

ouyuanning avatar Nov 24 '23 11:11 ouyuanning

not working on it

ouyuanning avatar Nov 28 '23 13:11 ouyuanning

not working on it

ouyuanning avatar Dec 08 '23 11:12 ouyuanning

not working on it

ouyuanning avatar Dec 12 '23 12:12 ouyuanning

not working on it

ouyuanning avatar Dec 15 '23 12:12 ouyuanning

not working on it

ouyuanning avatar Dec 19 '23 11:12 ouyuanning