Proposed changes

Issue Number: close #xxx

Problem summary

motivation

In the past, only simple predicates (slot=const), and, like, or (only bitmap index) could be pushed down to the storage layer. scan process:

Read part of the column first, and calculate the row ids with a simple push-down predicate.
Use row ids to read the remaining columns and pass them to the scanner, and the scanner filters the remaining predicates.

This pr will also push-down the remaining predicates (functions, nested predicates...) in the scanner to the storage layer for filtering. scan process:

Read part of the column first, and use the push-down simple predicate to calculate the row ids, (same as above)
Use row ids to read the columns needed for the remaining predicates, and use the pushed-down remaining predicates to reduce the number of row ids again.
Use row ids to read the remaining columns and pass them to the scanner.

expected

select a,b,c,... from tbl where func(a) = 1. The performance will be significantly improved, and the effect depends on the number of selected columns and the filtering rate.

example: 10 times better performance

SELECT * FROM hits WHERE UPPER(URL)="HTTP://SAMARA.IRR.RU/CATALOG_GOOGLETBR%26AD%3D278885%26BT%3D430001216";

select a from tbl where func(a) = 1. Performance is expected to remain the same, as it behaves the same as before. (In the test, there is a 2% loss in performance. I found that VExprContext::filter_block performs differently in different locations, which may be related to cache miss, but the tool did not detect it)
clickbench test

test	cost(s)
default	134
set enable_remaining_expr_pushdown = true;	110.25

default:

query1: 0.03,0.03,0.03
query2: 0.11,0.04,0.04
query3: 0.09,0.07,0.07
query4: 0.22,0.08,0.07
query5: 0.66,0.62,0.54
query6: 0.97,0.90,0.93
query7: 0.02,0.01,0.01
query8: 0.05,0.05,0.04
query9: 1.55,1.38,1.41
query10: 1.98,1.93,1.94
query11: 0.36,0.33,0.35
query12: 0.37,0.34,0.38
query13: 0.61,0.63,0.59
query14: 1.46,1.46,1.44
query15: 0.84,0.85,0.88
query16: 0.51,0.48,0.47
query17: 1.78,1.96,1.82
query18: 0.45,0.43,0.42
query19: 4.25,4.25,4.19
query20: 0.01,0.02,0.01
query21: 1.20,0.31,0.31
query22: 0.12,0.11,0.11
query23: 1.13,0.29,0.28
query24: 8.33,9.99,9.83
query25: 0.13,0.10,0.11
query26: 0.10,0.10,0.11
query27: 0.11,0.09,0.11
query28: 0.48,0.37,0.38
query29: 4.74,4.63,4.64
query30: 1.95,1.98,1.91
query31: 0.43,0.38,0.41
query32: 0.61,0.59,0.58
query33: 4.04,4.03,3.94
query34: 4.52,3.86,3.94
query35: 4.07,3.63,3.79
query36: 1.41,1.43,1.37
query37: 0.07,0.06,0.05
query38: 0.03,0.03,0.03
query39: 0.02,0.02,0.03
query40: 0.15,0.13,0.14
query41: 0.04,0.02,0.03
query42: 0.03,0.02,0.02
query43: 0.04,0.04,0.03

set enable_remaining_expr_pushdown = true;

query1: 0.02,0.02,0.01
query2: 0.03,0.04,0.03
query3: 0.06,0.07,0.07
query4: 0.07,0.07,0.07
query5: 0.67,0.60,0.56
query6: 0.87,0.84,0.84
query7: 0.01,0.01,0.01
query8: 0.05,0.03,0.04
query9: 1.48,1.36,1.41
query10: 2.02,1.88,1.90
query11: 0.32,0.44,0.35
query12: 0.35,0.34,0.42
query13: 0.59,0.70,0.64
query14: 1.46,1.48,1.44
query15: 0.81,0.92,0.82
query16: 0.49,0.46,0.46
query17: 1.98,1.98,1.92
query18: 0.43,0.42,0.44
query19: 4.31,4.29,4.38
query20: 0.01,0.01,0.01
query21: 0.30,0.31,0.32
query22: 0.11,0.12,0.11
query23: 0.26,0.27,0.26
query24: 0.44,0.44,0.43
query25: 0.11,0.10,0.09
query26: 0.11,0.11,0.11
query27: 0.10,0.10,0.11
query28: 0.37,0.38,0.39
query29: 4.71,4.59,4.62
query30: 2.04,2.04,1.95
query31: 0.43,0.39,0.38
query32: 0.61,0.63,0.58
query33: 3.98,4.08,3.93
query34: 4.21,3.97,4.08
query35: 3.89,3.98,3.56
query36: 1.36,1.32,1.35
query37: 0.06,0.05,0.06
query38: 0.03,0.02,0.03
query39: 0.02,0.02,0.02
query40: 0.15,0.14,0.14
query41: 0.02,0.02,0.02
query42: 0.02,0.02,0.02
query43: 0.04,0.03,0.02

Checklist(Required)

Does it affect the original behavior:
- [ ] Yes
- [ ] No
- [ ] I don't know
Has unit tests been added:
- [ ] Yes
- [ ] No
- [ ] No Need
Has document been added or modified:
- [ ] Yes
- [ ] No
- [ ] No Need
Does it need to update dependencies:
- [ ] Yes
- [ ] No
Are there any changes that cannot be rolled back:
- [ ] Yes (If Yes, please explain WHY)
- [ ] No

Further comments

If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...

Jan 13 '23 13:01 xinyiZzz

TeamCity pipeline, clickbench performance test result: the sum of best hot time: 34.17 seconds stream load tsv: 480 seconds loaded 74807831229 Bytes, about 148 MB/s stream load json: 41 seconds loaded 2358488459 Bytes, about 54 MB/s stream load orc: 74 seconds loaded 1101869774 Bytes, about 14 MB/s stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230309135950_clickbench_pr_111585.html

Jan 16 '23 02:01 hello-stephen

run p0

Feb 20 '23 08:02 xinyiZzz

run buildall

Feb 21 '23 00:02 xinyiZzz

run arm

Feb 21 '23 00:02 xinyiZzz

run p0

Feb 21 '23 00:02 xinyiZzz

run buildall

Feb 21 '23 01:02 xinyiZzz

run buildall

Feb 21 '23 03:02 xinyiZzz

run clickbench

Feb 21 '23 07:02 xinyiZzz

run clickbench

Feb 21 '23 13:02 xinyiZzz

run buildall

Feb 26 '23 15:02 xinyiZzz

run buildall

Feb 27 '23 00:02 xinyiZzz

run buildall

Feb 27 '23 04:02 xinyiZzz

run buildall

Mar 02 '23 09:03 xinyiZzz

run buildall

Mar 03 '23 09:03 xinyiZzz

run beut macos

Mar 03 '23 12:03 k-i-d-d

run buildall

Mar 05 '23 02:03 xinyiZzz

run buildall

Mar 06 '23 20:03 xinyiZzz

PR approved by at least one committer and no changes requested.

Mar 07 '23 00:03 github-actions[bot]

PR approved by anyone and no changes requested.

Mar 07 '23 00:03 github-actions[bot]

./run buildall

Mar 07 '23 23:03 yiguolei

./run buildall

Mar 08 '23 11:03 xinyiZzz

./run buildall

Mar 08 '23 19:03 xinyiZzz

run buildall

Mar 08 '23 19:03 xinyiZzz

run buildall

Mar 09 '23 01:03 xinyiZzz

run buildall

Mar 09 '23 02:03 xinyiZzz

run buildall

Mar 09 '23 10:03 xinyiZzz

doris
doris copied to clipboard

[improvement](scan) Support pushdown execute expr ctx

Proposed changes

Problem summary

motivation

expected

Checklist(Required)

Further comments

doris doris copied to clipboard

[improvement](scan) Support pushdown execute expr ctx

Proposed changes

Problem summary

motivation

expected

Checklist(Required)

Further comments

doris
doris copied to clipboard