matrixone icon indicating copy to clipboard operation
matrixone copied to clipboard

[Bug]: use load_file() function to load stage data, mo report 'ERROR 20101 (HY000): internal error: Invalid alloc size 7775731712'.

Open Ariznawlll opened this issue 1 year ago • 1 comments

Is there an existing issue for the same bug?

  • [X] I have checked the existing issues.

Branch Name

main

Commit ID

c678c5a6908fd9132443478123acc74ddc1653cb

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

数据量:7.2G image

image

log:https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22GYP%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-717cb94-202409191131%5C%22%7D%20%7C%3D%20%60Invalid%20alloc%20size%207775731712%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221726738859000%22,%22to%22:%221726739579000%22%7D%7D%7D&schemaVersion=1&orgId=1

Expected Behavior

load_file success

Steps to Reproduce

1. 创建基于minio的stage:
create stage stage01 url = 's3://stagetest/tpch_10' credentials = {"aws_key_id"='***',"aws_secret_key"='***',"AWS_REGION"='us-east-2','PROVIDER'='minio', 'ENDPOINT'='http://10.222.6.2'};

2. 启动mo分布式环境(128服务器上)

3. 通过load_file()函数加载数据 
select load_file(cast('stage://stage01/lineitem.tbl' as datalink));  -->出错的sql

Additional information

No response

Ariznawlll avatar Sep 19 '24 09:09 Ariznawlll

创建基于file system的stage也会出现这个问题:

在128服务器上: create stage stage01 url='file:///data/mo-load-data/data/tpch_10'; create database tpch_10g; CREATE TABLE LINEITEM( L_ORDERKEY BIGINT NOT NULL, L_PARTKEY INTEGER NOT NULL, L_SUPPKEY INTEGER NOT NULL, L_LINENUMBER INTEGER NOT NULL, L_QUANTITY DECIMAL(15,2) NOT NULL, L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL, L_DISCOUNT DECIMAL(15,2) NOT NULL, L_TAX DECIMAL(15,2) NOT NULL, L_RETURNFLAG VARCHAR(1) NOT NULL, L_LINESTATUS VARCHAR(1) NOT NULL, L_SHIPDATE DATE NOT NULL, L_COMMITDATE DATE NOT NULL, L_RECEIPTDATE DATE NOT NULL, L_SHIPINSTRUCT CHAR(25) NOT NULL, L_SHIPMODE CHAR(10) NOT NULL, L_COMMENT VARCHAR(44) NOT NULL, PRIMARY KEY (L_ORDERKEY, L_LINENUMBER) );

select load_file(cast('stage://stage01/lineitem.tbl' as datalink)); -->报错ERROR 20101 (HY000): internal error: Invalid alloc size 7775731712

Ariznawlll avatar Sep 19 '24 13:09 Ariznawlll

main commit: 2f86fead2 第一次执行报错OOM image

第二次执行报错'internal error: Invalid alloc size 7775731712' image

还没合入main,main需要等合进去测试

Ariznawlll avatar Nov 13 '24 07:11 Ariznawlll

对于这个问题,建议首先在文档中建议客户不要使用load_file 一次性的导入过大的文件,例如:5G。

另外,对于这个issue 建议在2.1.0 版本修复。

allengaoo avatar Nov 14 '24 03:11 allengaoo

波洋看是否可以分批导入

ouyuanning avatar Mar 03 '25 06:03 ouyuanning

file data is actually loaded into the memory but memory pool inside the vector returns error.

Image

cpegeric avatar May 12 '25 15:05 cpegeric

The highlighted code raised error. Seems nothing we can do if limit of memory pool is 2G.

Image

cpegeric avatar May 12 '25 15:05 cpegeric

I think the right thing to read whole file content is to read by multiple chunks via datalink(offset, size) just like HNSW did.

select 0, cast('stage_url?offset=0&size=64000' as datalink); select 1, cast('stage_url?offset=64000&size=64000' as datalink);

just like multi-part in S3.

cpegeric avatar May 12 '25 16:05 cpegeric