doris icon indicating copy to clipboard operation
doris copied to clipboard

[improvement](scan) Support pushdown execute expr ctx

Open xinyiZzz opened this issue 2 years ago • 15 comments

Proposed changes

Issue Number: close #xxx

Problem summary

motivation

In the past, only simple predicates (slot=const), and, like, or (only bitmap index) could be pushed down to the storage layer. scan process:

  1. Read part of the column first, and calculate the row ids with a simple push-down predicate.
  2. Use row ids to read the remaining columns and pass them to the scanner, and the scanner filters the remaining predicates.

This pr will also push-down the remaining predicates (functions, nested predicates...) in the scanner to the storage layer for filtering. scan process:

  1. Read part of the column first, and use the push-down simple predicate to calculate the row ids, (same as above)
  2. Use row ids to read the columns needed for the remaining predicates, and use the pushed-down remaining predicates to reduce the number of row ids again.
  3. Use row ids to read the remaining columns and pass them to the scanner.

expected

  1. select a,b,c,... from tbl where func(a) = 1. The performance will be significantly improved, and the effect depends on the number of selected columns and the filtering rate.

example: 10 times better performance

SELECT * FROM hits WHERE UPPER(URL)="HTTP://SAMARA.IRR.RU/CATALOG_GOOGLETBR%26AD%3D278885%26BT%3D430001216";
  1. select a from tbl where func(a) = 1. Performance is expected to remain the same, as it behaves the same as before. (In the test, there is a 2% loss in performance. I found that VExprContext::filter_block performs differently in different locations, which may be related to cache miss, but the tool did not detect it)

  2. clickbench test

test cost(s)
default 134
set enable_remaining_expr_pushdown = true; 110.25

default:

query1: 0.03,0.03,0.03
query2: 0.11,0.04,0.04
query3: 0.09,0.07,0.07
query4: 0.22,0.08,0.07
query5: 0.66,0.62,0.54
query6: 0.97,0.90,0.93
query7: 0.02,0.01,0.01
query8: 0.05,0.05,0.04
query9: 1.55,1.38,1.41
query10: 1.98,1.93,1.94
query11: 0.36,0.33,0.35
query12: 0.37,0.34,0.38
query13: 0.61,0.63,0.59
query14: 1.46,1.46,1.44
query15: 0.84,0.85,0.88
query16: 0.51,0.48,0.47
query17: 1.78,1.96,1.82
query18: 0.45,0.43,0.42
query19: 4.25,4.25,4.19
query20: 0.01,0.02,0.01
query21: 1.20,0.31,0.31
query22: 0.12,0.11,0.11
query23: 1.13,0.29,0.28
query24: 8.33,9.99,9.83
query25: 0.13,0.10,0.11
query26: 0.10,0.10,0.11
query27: 0.11,0.09,0.11
query28: 0.48,0.37,0.38
query29: 4.74,4.63,4.64
query30: 1.95,1.98,1.91
query31: 0.43,0.38,0.41
query32: 0.61,0.59,0.58
query33: 4.04,4.03,3.94
query34: 4.52,3.86,3.94
query35: 4.07,3.63,3.79
query36: 1.41,1.43,1.37
query37: 0.07,0.06,0.05
query38: 0.03,0.03,0.03
query39: 0.02,0.02,0.03
query40: 0.15,0.13,0.14
query41: 0.04,0.02,0.03
query42: 0.03,0.02,0.02
query43: 0.04,0.04,0.03

set enable_remaining_expr_pushdown = true;

query1: 0.02,0.02,0.01
query2: 0.03,0.04,0.03
query3: 0.06,0.07,0.07
query4: 0.07,0.07,0.07
query5: 0.67,0.60,0.56
query6: 0.87,0.84,0.84
query7: 0.01,0.01,0.01
query8: 0.05,0.03,0.04
query9: 1.48,1.36,1.41
query10: 2.02,1.88,1.90
query11: 0.32,0.44,0.35
query12: 0.35,0.34,0.42
query13: 0.59,0.70,0.64
query14: 1.46,1.48,1.44
query15: 0.81,0.92,0.82
query16: 0.49,0.46,0.46
query17: 1.98,1.98,1.92
query18: 0.43,0.42,0.44
query19: 4.31,4.29,4.38
query20: 0.01,0.01,0.01
query21: 0.30,0.31,0.32
query22: 0.11,0.12,0.11
query23: 0.26,0.27,0.26
query24: 0.44,0.44,0.43
query25: 0.11,0.10,0.09
query26: 0.11,0.11,0.11
query27: 0.10,0.10,0.11
query28: 0.37,0.38,0.39
query29: 4.71,4.59,4.62
query30: 2.04,2.04,1.95
query31: 0.43,0.39,0.38
query32: 0.61,0.63,0.58
query33: 3.98,4.08,3.93
query34: 4.21,3.97,4.08
query35: 3.89,3.98,3.56
query36: 1.36,1.32,1.35
query37: 0.06,0.05,0.06
query38: 0.03,0.02,0.03
query39: 0.02,0.02,0.02
query40: 0.15,0.14,0.14
query41: 0.02,0.02,0.02
query42: 0.02,0.02,0.02
query43: 0.04,0.03,0.02

Checklist(Required)

  1. Does it affect the original behavior:
    • [ ] Yes
    • [ ] No
    • [ ] I don't know
  2. Has unit tests been added:
    • [ ] Yes
    • [ ] No
    • [ ] No Need
  3. Has document been added or modified:
    • [ ] Yes
    • [ ] No
    • [ ] No Need
  4. Does it need to update dependencies:
    • [ ] Yes
    • [ ] No
  5. Are there any changes that cannot be rolled back:
    • [ ] Yes (If Yes, please explain WHY)
    • [ ] No

Further comments

If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...

xinyiZzz avatar Jan 13 '23 13:01 xinyiZzz

TeamCity pipeline, clickbench performance test result: the sum of best hot time: 34.17 seconds stream load tsv: 480 seconds loaded 74807831229 Bytes, about 148 MB/s stream load json: 41 seconds loaded 2358488459 Bytes, about 54 MB/s stream load orc: 74 seconds loaded 1101869774 Bytes, about 14 MB/s stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230309135950_clickbench_pr_111585.html

hello-stephen avatar Jan 16 '23 02:01 hello-stephen

run p0

xinyiZzz avatar Feb 20 '23 08:02 xinyiZzz

run buildall

xinyiZzz avatar Feb 21 '23 00:02 xinyiZzz

run arm

xinyiZzz avatar Feb 21 '23 00:02 xinyiZzz

run p0

xinyiZzz avatar Feb 21 '23 00:02 xinyiZzz

run buildall

xinyiZzz avatar Feb 21 '23 01:02 xinyiZzz

run buildall

xinyiZzz avatar Feb 21 '23 03:02 xinyiZzz

run clickbench

xinyiZzz avatar Feb 21 '23 07:02 xinyiZzz

run clickbench

xinyiZzz avatar Feb 21 '23 13:02 xinyiZzz

run buildall

xinyiZzz avatar Feb 26 '23 15:02 xinyiZzz

run buildall

xinyiZzz avatar Feb 27 '23 00:02 xinyiZzz

run buildall

xinyiZzz avatar Feb 27 '23 04:02 xinyiZzz

run buildall

xinyiZzz avatar Mar 02 '23 09:03 xinyiZzz

run buildall

xinyiZzz avatar Mar 03 '23 09:03 xinyiZzz

run beut macos

k-i-d-d avatar Mar 03 '23 12:03 k-i-d-d

run buildall

xinyiZzz avatar Mar 05 '23 02:03 xinyiZzz

run buildall

xinyiZzz avatar Mar 06 '23 20:03 xinyiZzz

PR approved by at least one committer and no changes requested.

github-actions[bot] avatar Mar 07 '23 00:03 github-actions[bot]

PR approved by anyone and no changes requested.

github-actions[bot] avatar Mar 07 '23 00:03 github-actions[bot]

./run buildall

yiguolei avatar Mar 07 '23 23:03 yiguolei

./run buildall

xinyiZzz avatar Mar 08 '23 11:03 xinyiZzz

./run buildall

xinyiZzz avatar Mar 08 '23 19:03 xinyiZzz

run buildall

xinyiZzz avatar Mar 08 '23 19:03 xinyiZzz

run buildall

xinyiZzz avatar Mar 09 '23 01:03 xinyiZzz

run buildall

xinyiZzz avatar Mar 09 '23 02:03 xinyiZzz

run buildall

xinyiZzz avatar Mar 09 '23 10:03 xinyiZzz