[Bug]: SELECT returns incorrect results when the WHERE filter is on a unique index column
Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
Branch Name
main
Commit ID
07b514cd4dd45ae30ccdaeed026cd4520331a1f9
Other Environment Information
- Hardware parameters:
- OS type:
- Others:
Actual Behavior
o_custkey is a unique index, but a query whose WHERE clause filters on this unique key returns an incorrect result.
orders ddl:
CREATE TABLE orders (
O_ORDERKEY bigint NOT NULL,
O_CUSTKEY int NOT NULL,
O_ORDERSTATUS char(1) NOT NULL,
O_TOTALPRICE decimal(15,2) NOT NULL,
O_ORDERDATE date NOT NULL,
O_ORDERPRIORITY char(15) NOT NULL,
O_CLERK char(15) NOT NULL,
O_SHIPPRIORITY int NOT NULL,
O_COMMENT varchar(79) NOT NULL,
PRIMARY KEY (O_ORDERKEY),
UNIQUE KEY o_custkey (O_CUSTKEY)
)
Expected Behavior
No response
Steps to Reproduce
CREATE TABLE `orders` (
`O_ORDERKEY` bigint NOT NULL,
`O_CUSTKEY` int NOT NULL,
`O_ORDERSTATUS` char(1) NOT NULL,
`O_TOTALPRICE` decimal(15,2) NOT NULL,
`O_ORDERDATE` date NOT NULL,
`O_ORDERPRIORITY` char(15) NOT NULL,
`O_CLERK` char(15) NOT NULL,
`O_SHIPPRIORITY` int NOT NULL,
`O_COMMENT` varchar(79) NOT NULL,
PRIMARY KEY (`O_ORDERKEY`),
UNIQUE KEY `o_custkey` (`O_CUSTKEY`)
)
load data url s3option {'endpoint'='http://minio.minio-mo.svc.cluster.local','access_key_id'='xxx','secret_access_key'='xxx','bucket'='mo-load-data', 'filepath'='tpch_100/orders.tbl'} into table orders fields terminated by '|' lines terminated by '\n' parallel 'true';
select O_ORDERKEY,O_CUSTKEY from orders where O_CUSTKEY in (4);
select O_ORDERKEY,O_CUSTKEY from orders where O_ORDERKEY =544949286;
Additional information
No response
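Because load does not deduplicate, the unique constraint is not enforced and the data ends up incorrect.
Should this be checked during load? cc @ouyuanning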
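Until load enforces uniqueness, one way to confirm that the source file itself carries duplicate keys is to scan it before loading. The following is only a minimal sketch, not part of the database: the helper name `find_duplicate_keys` is hypothetical, the sample rows are made up, and it assumes the file is `|`-delimited with O_CUSTKEY as the second field, as in the DDL and load statement above. It keeps every key in memory, so a very large file would need a streaming or external-sort variant.

```python
from collections import Counter
from typing import Dict, Iterable


def find_duplicate_keys(
    lines: Iterable[str], key_index: int = 1, delimiter: str = "|"
) -> Dict[str, int]:
    """Return {key: occurrences} for every key value that appears more than once."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) > key_index:  # skip blank or truncated lines
            counts[fields[key_index]] += 1
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Made-up rows in the column order of the orders table (illustration only).
    sample = [
        "1|4|O|100.00|1996-01-02|5-LOW|Clerk#000000951|0|comment a|",
        "2|7|O|200.00|1996-12-01|1-URGENT|Clerk#000000880|0|comment b|",
        "3|4|F|300.00|1993-10-14|5-LOW|Clerk#000000955|0|comment c|",
    ]
    print(find_duplicate_keys(sample))  # {'4': 2}
```

For a real file, passing an open file object (for example `open("orders.tbl")`, path assumed) as `lines` works the same way, since the function only iterates over lines.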
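No progress.
Waiting for Gao to confirm how to handle this.
Load does not perform a uniqueness check; this has been the behavior in every version so far. It will not be fixed in 2.0 and is moved to a later release.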
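Waiting for product to confirm the approach; to be resolved in the next release.
Waiting for product to confirm the approach; to be resolved in the next release.
No progress.
Same as above.
No progress.
No progress.
Not worked on.
Not worked on.
None.
The PR I submitted causes severe timeouts in the tke tech load; this still needs to be investigated.
Same as above; no time invested for now.
Not worked on.
Not worked on.
Not worked on.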
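With the PR I submitted earlier, once load goes through deduplication it runs into hang and OOM problems. For the hang, I found a scenario that reproduces it every time, and that scenario can be avoided, but the root cause has not been located yet. So far it looks like dispatch hangs in the multi-CN scenario, with merge and dispatch waiting on each other. For the OOM, I captured the malloc profile below; it looks like shuffle build is being done concurrently. The file being loaded is very large and I am not familiar with this part of the code, so I need to look at it further.
Stack trace of the hang:
The OOM still needs to be investigated.
Same as above.
Same as above.
Investigating. I added logging at the places with the highest memory usage in the OOM case. When loading a file of a bit over 2 GB, the memory allocated at the spot shown in the figure below is under 4 GB, while the peak reported by top is over 20 GB.
This was captured from an earlier run on 129, where the memory growth is more pronounced. I tried another run on 129 today to capture logs, but the OOM brought 129 down and it is currently unreachable.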
The later discussion no longer has anything to do with the original problem. Closing. The OOM should be filed as a new bug.