matrixone icon indicating copy to clipboard operation
matrixone copied to clipboard

[Bug]: select return result is incorrect which where filter condition is unique index

Open heni02 opened this issue 1 year ago • 3 comments

Is there an existing issue for the same bug?

  • [X] I have checked the existing issues.

Branch Name

main

Commit ID

07b514cd4dd45ae30ccdaeed026cd4520331a1f9

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

o_custkey为唯一索引,where后过滤值为unique key返回值不对 企业微信截图_8d7d7b35-bfef-44c0-8521-24d4fcece341

orders ddl: CREATE TABLE orders ( O_ORDERKEY bigint NOT NULL, O_CUSTKEY int NOT NULL, O_ORDERSTATUS char(1) NOT NULL, O_TOTALPRICE decimal(15,2) NOT NULL, O_ORDERDATE date NOT NULL, O_ORDERPRIORITY char(15) NOT NULL, O_CLERK char(15) NOT NULL, O_SHIPPRIORITY int NOT NULL, O_COMMENT varchar(79) NOT NULL, PRIMARY KEY (O_ORDERKEY), UNIQUE KEY o_custkey (O_CUSTKEY) )

Expected Behavior

No response

Steps to Reproduce

CREATE TABLE `orders` (
  `O_ORDERKEY` bigint NOT NULL,
  `O_CUSTKEY` int NOT NULL,
  `O_ORDERSTATUS` char(1) NOT NULL,
  `O_TOTALPRICE` decimal(15,2) NOT NULL,
  `O_ORDERDATE` date NOT NULL,
  `O_ORDERPRIORITY` char(15) NOT NULL,
  `O_CLERK` char(15) NOT NULL,
  `O_SHIPPRIORITY` int NOT NULL,
  `O_COMMENT` varchar(79) NOT NULL,
  PRIMARY KEY (`O_ORDERKEY`),
  UNIQUE KEY `o_custkey` (`O_CUSTKEY`)
)
load data url s3option {'endpoint'='http://minio.minio-mo.svc.cluster.local','access_key_id'='xxx','secret_access_key'='xxx','bucket'='mo-load-data', 'filepath'='tpch_100/orders.tbl'} into table orders fields terminated by '|' lines terminated by '\n' parallel 'true';
select O_ORDERKEY,O_CUSTKEY from orders where O_CUSTKEY in (4);
select O_ORDERKEY,O_CUSTKEY from orders where O_ORDERKEY =544949286;

Additional information

No response

heni02 avatar Sep 30 '24 04:09 heni02

因为load时不去重,导致unique约束失效,数据出错。

badboynt1 avatar Sep 30 '24 07:09 badboynt1

load时候检测?@ouyuanning cc 企业微信截图_7d90fe47-0774-4c7f-8015-7a7d77901257

heni02 avatar Oct 18 '24 08:10 heni02

no process

jensenojs avatar Oct 23 '24 10:10 jensenojs

待高老师确认处理方案

ouyuanning avatar Oct 24 '24 07:10 ouyuanning

load的时候不做唯一性检测,版本一直处理方式,2.0不解决,挪到后续版本解决

aressu1985 avatar Oct 24 '24 13:10 aressu1985

等产品确认方案下个版本解决

huby2358 avatar Oct 29 '24 10:10 huby2358

等产品确认方案下个版本解决

huby2358 avatar Nov 01 '24 14:11 huby2358

无进展

huby2358 avatar Nov 06 '24 10:11 huby2358

同上

huby2358 avatar Nov 11 '24 10:11 huby2358

无进展

huby2358 avatar Nov 14 '24 10:11 huby2358

无进展

huby2358 avatar Nov 19 '24 10:11 huby2358

未投入

huby2358 avatar Nov 22 '24 13:11 huby2358

未投入

huby2358 avatar Nov 28 '24 10:11 huby2358

huby2358 avatar Dec 03 '24 10:12 huby2358

提的pr会导致tke tech load超时严重,还需要定位

huby2358 avatar Dec 12 '24 10:12 huby2358

同上,暂时没投入

huby2358 avatar Dec 17 '24 10:12 huby2358

未投入

huby2358 avatar Dec 23 '24 03:12 huby2358

未投入

huby2358 avatar Dec 26 '24 10:12 huby2358

未投入

huby2358 avatar Dec 31 '24 10:12 huby2358

之前提的pr,load 走去重后,会有卡住和oom问题,卡住的问题找到必现的场景,可以避免,卡住的原因还没定位出来,目前看到是dispatch在多cn场景下卡住了,merge和dispatch互相在等,oom的问题抓了下面malloc的profile, 看起来是会同时最shufflebuild, 由于load的这个文件很大,这块流程代码不熟还要再看一下。 image

huby2358 avatar Jan 05 '25 10:01 huby2358

卡住的堆栈 企业微信截图_bb931c10-2978-4f0f-8caa-ef2b4f67f9fe 企业微信截图_2abcb13b-f17c-45e8-b886-c38efa58cdc8

huby2358 avatar Jan 05 '25 10:01 huby2358

oom的定位还需要定位

huby2358 avatar Jan 10 '25 10:01 huby2358

同上

huby2358 avatar Jan 16 '25 10:01 huby2358

同上

huby2358 avatar Jan 21 '25 10:01 huby2358

正在定位,在oom 内存占用比较多的地方,打日志,对于load2G多的文件, 下图这里分配的内存不到4G, top峰值有20多G,

Image Image 这里load的行数也是能对上的。 这里内存感觉有点对不上,这里是在深圳机器跑的。

huby2358 avatar Jan 24 '25 11:01 huby2358

Image 这是之前在129跑抓的,这上面内存扩张更明显,今天尝试在129跑一次,抓一下日志, 但是oom导致129挂了,现在连不上

huby2358 avatar Jan 24 '25 11:01 huby2358

The later discussion has nothing to do with the original problem any more. Close. OOM should file new bug.

fengttt avatar Apr 25 '25 07:04 fengttt