paimon icon indicating copy to clipboard operation
paimon copied to clipboard

[core] Support convert TopN to limit for primary-key table in deletion-vector mode

Open Tan-JiaLiang opened this issue 4 months ago • 7 comments

Purpose

The primary-keys is NOT NULL, unique and natural sort by ascending, when deletion-vector enabled, the following rules can let TopN convert to limit.

  1. The preceding sort keys must matches with the primary keys in order, and the sort direction must be the same.
  2. If non-primary key including, all the primary-key must matches in order first.

e.g. partition keys: pt. trimmed primary keys: c1, c2. non-primary keys: n1

create table T (
  c1 int not null,
  c2 int not null,
  n1 int, 
  pt int not null
  PRIMARY KEY(pt, c1, c2)
) WITH (
  'deletion-vector-enabled' = 'true'
);

TopN without partition keys:

position c1 c2 n1
0 10 100 2
1 10 200 1
2 20 100 2
3 20 200 1
-- return position [0]
ORDER BY c1 LIMIT 1
ORDER BY c1, c2 LIMIT 1

-- return position [3]
ORDER BY c1 DESC LIMIT 1
ORDER BY c1 DESC, c2 DESC LIMIT 1

-- non-primary key `n1` can be ignored, if all the primary keys matches (because primary keys is unique)
-- return position [0]
ORDER BY c1, c2, n1 ASC LIMIT 1
ORDER BY c1, c2, n1 DESC LIMIT 1

-- return position [3]
ORDER BY c1 DESC, c2 DESC, n1 ASC LIMIT 1
ORDER BY c1 DESC, c2 DESC, n1 DESC LIMIT 1

-- this following example will not convert TopN to limit.
-- not match, sort direction not same as the primary keys's
ORDER BY c1, c2 DESC LIMIT 1
ORDER BY c1 DESC, c2 LIMIT 1

-- not match, if non-primary key including, all the trimmed primary keys must matches in order
ORDER BY c1, n1 LIMIT 1

TopN with partition keys:

-- not match, not supported at this time.
ORDER BY pt, c1, c2 LIMIT 1

-- match, only trimmed primary key.
WHERE pt=1 ORDER BY c1, c2 LIMIT 1

Tests

API and Format

Documentation

Tan-JiaLiang avatar Sep 03 '25 12:09 Tan-JiaLiang

I remember that when DV is turned on, the returns are out of order, not sorted by primary key.

JingsongLi avatar Sep 08 '25 06:09 JingsongLi

I remember that when DV is turned on, the returns are out of order, not sorted by primary key.

Why is it out of order? IMO, it will sort by the primary keys in a single DataFile, and using the deletion-vector to mark its deleted row id.

The mainly idea on this PR is to convert the TopN primary keys predicate into limit predicate when reading a single DataFile, then the compute engine (e.g. Apache Spark) will do the Global TopN.

Tan-JiaLiang avatar Sep 08 '25 07:09 Tan-JiaLiang

I remember that when DV is turned on, the returns are out of order, not sorted by primary key.

Why is it out of order? IMO, it will sort by the primary keys in a single DataFile, and using the deletion-vector to mark its deleted row id.

The mainly idea on this PR is to convert the TopN primary keys predicate into limit predicate when reading a single DataFile, then the compute engine (e.g. Apache Spark) will do the Global TopN.

I see, for single data file, it is true.

JingsongLi avatar Sep 09 '25 04:09 JingsongLi

Also, can you add benchmark for this?

JingsongLi avatar Sep 09 '25 04:09 JingsongLi

Running benchmark: limit
  Running case: pushdown-disable-1
  Stopped after 1 iterations, 327 ms
  Running case: pushdown-enable-1
  Stopped after 1 iterations, 17 ms

OpenJDK 64-Bit Server VM 1.8.0_442-b06 on Mac OS X 15.6.1
Apple M4 Pro
limit:                                                                                               Best/Avg Time(ms)    Row Rate(K/s)      Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
OPERATORTEST_limit_pushdown-disable-1                                                                      328 /  328           4573.5            218.6       1.0X
OPERATORTEST_limit_pushdown-enable-1                                                                        18 /   18          84730.0             11.8      18.5X




Running benchmark: TopN
  Running case: pushdown-disable-1
  Stopped after 1 iterations, 1224 ms
  Running case: pushdown-enable-1
  Stopped after 1 iterations, 71 ms

OpenJDK 64-Bit Server VM 1.8.0_442-b06 on Mac OS X 15.6.1
Apple M4 Pro
TopN:                                                                                                Best/Avg Time(ms)    Row Rate(K/s)      Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
OPERATORTEST_TopN_pushdown-disable-1                                                                      1224 / 1224           1225.2            816.2       1.0X
OPERATORTEST_TopN_pushdown-enable-1                                                                         72 /   72          20919.2             47.8      17.1X

Tan-JiaLiang avatar Sep 09 '25 06:09 Tan-JiaLiang

@JingsongLi Very thanks for your feedback! Sorry that I'm busy these days. I will update next week.

Tan-JiaLiang avatar Sep 10 '25 09:09 Tan-JiaLiang

@JingsongLi Very thanks for your feedback! Sorry that I'm busy these days. I will update next week.

Sure, please feel free to update according to your schedule.

JingsongLi avatar Sep 10 '25 09:09 JingsongLi