doris
doris copied to clipboard
[nereids] string literal coercion of in predicate
Description:
The sql execute much slow when the literal value with string format in in predicate
; and the real data is integral type。
mysql> set enable_nereids_planner = false;
Query OK, 0 rows affected (0.03 sec)
mysql> select id,sum(clicks) from a_table where id in ('787934713', '306960695') group by id limit 10;
+------------+---------------+
| id | sum(`clicks`) |
+------------+---------------+
| 787934713 | 2838 |
| 306960695 | 339 |
+------------+---------------+
2 rows in set (1.81 sec)
mysql> set enable_nereids_planner = true;
Query OK, 0 rows affected (0.02 sec)
mysql> select id,sum(clicks) from a_table where id in ('787934713', '306960695') group by id limit 10;
+------------+-------------+
| id | sum(clicks) |
+------------+-------------+
| 787934713 | 2838 |
| 306960695 | 339 |
+------------+-------------+
2 rows in set (28.14 sec)
Reason: In legacy planner, the string literal with convert to integral value, but in the nereids planner do not do this convert and with do string matching in BE。
Solved:
do process string literal with numeric in in predicate
like in comparison predicate
;
test table:
create table a_table(
k1 BIGINT NOT NULL,
k2 VARCHAR(100) NOT NULL,
v1 INT SUM NULL DEFAULT "0"
) ENGINE=OLAP
AGGREGATE KEY(k1,k2)
distributed BY hash(k1) buckets 2
properties("replication_num" = "1");
insert into a_table values (10, 'name1', 10),(20, 'name2', 10);
explain plan select * from a_table where k1 in ('10', '20001');
before optimize:
+--------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+--------------------------------------------------------------------------------------------------------------------------------------+
| ========== PARSED PLAN (time: 1ms) ========== |
| UnboundResultSink[4] ( ) |
| +--LogicalProject[3] ( distinct=false, projects=[*], excepts=[] ) |
| +--LogicalFilter[2] ( predicates='k1 IN ('10001', '20001') ) |
| +--LogicalCheckPolicy ( ) |
| +--UnboundRelation ( id=RelationId#0, nameParts=a_table ) |
| |
| ========== ANALYZED PLAN (time: 2ms) ========== |
| LogicalResultSink[15] ( outputExprs=[k1#0, k2#1, v1#2] ) |
| +--LogicalProject[13] ( distinct=false, projects=[k1#0, k2#1, v1#2], excepts=[] ) |
| +--LogicalFilter[11] ( predicates=cast(k1#0 as TEXT) IN ('10001', '20001') ) |
| +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=<index_not_selected>, selectedIndexId=12003, preAgg=UNSET ) |
| |
| ========== REWRITTEN PLAN (time: 6ms) ========== |
| LogicalResultSink[45] ( outputExprs=[k1#0, k2#1, v1#2] ) |
| +--LogicalFilter[43] ( predicates=cast(k1#0 as TEXT) IN ('10001', '20001') ) |
| +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=a_table, selectedIndexId=12003, preAgg=OFF, No aggregate on scan. ) |
| |
| ========== OPTIMIZED PLAN (time: 6ms) ========== |
| PhysicalResultSink[90] ( outputExprs=[k1#0, k2#1, v1#2] ) |
| +--PhysicalDistribute[87]@1 ( stats=0.33, distributionSpec=DistributionSpecGather ) |
| +--PhysicalFilter[84]@1 ( stats=0.33, predicates=cast(k1#0 as TEXT) IN ('10001', '20001') ) |
| +--PhysicalOlapScan[a_table]@0 ( stats=1 ) |
+--------------------------------------------------------------------------------------------------------------------------------------+
after optimize:
+--------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+--------------------------------------------------------------------------------------------------------------------------------------+
| ========== PARSED PLAN (time: 15ms) ========== |
| UnboundResultSink[4] ( ) |
| +--LogicalProject[3] ( distinct=false, projects=[*], excepts=[] ) |
| +--LogicalFilter[2] ( predicates='k1 IN ('10001', '20001') ) |
| +--LogicalCheckPolicy ( ) |
| +--UnboundRelation ( id=RelationId#0, nameParts=a_table ) |
| |
| ========== ANALYZED PLAN (time: 11ms) ========== |
| LogicalResultSink[15] ( outputExprs=[k1#0, k2#1, v1#2] ) |
| +--LogicalProject[13] ( distinct=false, projects=[k1#0, k2#1, v1#2], excepts=[] ) |
| +--LogicalFilter[11] ( predicates=k1#0 IN (10001, 20001) ) |
| +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=<index_not_selected>, selectedIndexId=12003, preAgg=UNSET ) |
| |
| ========== REWRITTEN PLAN (time: 12ms) ========== |
| LogicalResultSink[45] ( outputExprs=[k1#0, k2#1, v1#2] ) |
| +--LogicalFilter[43] ( predicates=k1#0 IN (10001, 20001) ) |
| +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=a_table, selectedIndexId=12003, preAgg=OFF, No aggregate on scan. ) |
| |
| ========== OPTIMIZED PLAN (time: 4ms) ========== |
| PhysicalResultSink[90] ( outputExprs=[k1#0, k2#1, v1#2] ) |
| +--PhysicalDistribute[87]@1 ( stats=0, distributionSpec=DistributionSpecGather ) |
| +--PhysicalFilter[84]@1 ( stats=0, predicates=k1#0 IN (10001, 20001) ) |
| +--PhysicalOlapScan[a_table]@0 ( stats=2 ) |
+--------------------------------------------------------------------------------------------------------------------------------------+
Thank you for your contribution to Apache Doris. Don't know what should be done next? See How to process your PR
Since 2024-03-18, the Document has been moved to doris-website. See Doris Document.
run buildall
TPC-H: Total hot run time: 41943 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 570607367f6ddd1b81c6cf9f5d267f8e299608f0, data reload: false
------ Round 1 ----------------------------------
q1 17604 4351 4234 4234
q2 2014 186 192 186
q3 10543 1200 1256 1200
q4 10201 791 758 758
q5 7493 2755 2669 2669
q6 219 139 139 139
q7 973 622 637 622
q8 9219 2162 2110 2110
q9 9449 6714 6747 6714
q10 9079 3878 3872 3872
q11 448 242 236 236
q12 527 237 227 227
q13 17442 3208 3207 3207
q14 261 207 216 207
q15 524 460 464 460
q16 508 395 392 392
q17 1001 666 785 666
q18 8336 7927 7959 7927
q19 7234 1591 1575 1575
q20 637 319 313 313
q21 5237 4004 3953 3953
q22 364 294 276 276
Total cold run time: 119313 ms
Total hot run time: 41943 ms
----- Round 2, with runtime_filter_mode=off -----
q1 4537 4381 4467 4381
q2 384 274 259 259
q3 3170 2962 2728 2728
q4 1859 1642 1656 1642
q5 5478 5479 5496 5479
q6 219 127 127 127
q7 2145 1825 1834 1825
q8 3231 3386 3388 3386
q9 8685 8688 8636 8636
q10 3942 3873 3857 3857
q11 573 492 491 491
q12 793 638 663 638
q13 16222 3145 3188 3145
q14 285 272 250 250
q15 523 464 474 464
q16 475 412 417 412
q17 1759 1509 1514 1509
q18 7645 7750 7560 7560
q19 1674 1528 1518 1518
q20 1995 1800 1795 1795
q21 5032 4698 4668 4668
q22 551 490 502 490
Total cold run time: 71177 ms
Total hot run time: 55260 ms
TPC-DS: Total hot run time: 168073 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 570607367f6ddd1b81c6cf9f5d267f8e299608f0, data reload: false
query1 899 378 371 371
query2 6432 2502 2409 2409
query3 6647 207 211 207
query4 19687 17258 17324 17258
query5 4114 410 428 410
query6 239 159 147 147
query7 4579 296 295 295
query8 240 194 180 180
query9 8609 2364 2349 2349
query10 455 281 264 264
query11 10758 10069 10127 10069
query12 129 94 86 86
query13 1640 352 362 352
query14 10110 5941 7523 5941
query15 207 167 171 167
query16 7768 263 268 263
query17 1713 516 516 516
query18 1947 269 265 265
query19 201 152 154 152
query20 88 86 99 86
query21 197 137 127 127
query22 4113 4169 3931 3931
query23 33668 33099 33144 33099
query24 9476 2895 2754 2754
query25 565 354 361 354
query26 725 163 159 159
query27 2205 329 332 329
query28 5609 2070 2054 2054
query29 849 599 600 599
query30 251 171 176 171
query31 967 758 751 751
query32 84 55 52 52
query33 633 254 264 254
query34 849 474 479 474
query35 711 624 618 618
query36 1053 938 891 891
query37 108 72 72 72
query38 2953 2741 2771 2741
query39 847 787 797 787
query40 206 127 128 127
query41 50 46 47 46
query42 102 96 105 96
query43 594 553 561 553
query44 1058 727 730 727
query45 183 170 167 167
query46 1072 746 729 729
query47 1840 1784 1771 1771
query48 364 291 298 291
query49 830 369 390 369
query50 765 382 394 382
query51 6829 6821 6705 6705
query52 101 91 91 91
query53 349 291 276 276
query54 726 414 419 414
query55 74 71 76 71
query56 257 237 235 235
query57 1106 1064 1055 1055
query58 231 207 206 206
query59 3545 3121 3183 3121
query60 284 253 252 252
query61 87 86 85 85
query62 591 439 438 438
query63 306 289 285 285
query64 8447 2227 1769 1769
query65 3165 3077 3167 3077
query66 784 321 325 321
query67 15200 14679 14775 14679
query68 4525 539 525 525
query69 439 262 269 262
query70 1141 1149 1167 1149
query71 359 268 270 268
query72 7259 2678 2531 2531
query73 705 321 321 321
query74 6039 5584 5626 5584
query75 3256 2634 2633 2633
query76 2270 994 956 956
query77 372 263 261 261
query78 10452 9803 9641 9641
query79 2403 523 516 516
query80 1133 456 455 455
query81 542 243 247 243
query82 952 97 100 97
query83 257 186 179 179
query84 247 90 87 87
query85 1326 275 259 259
query86 461 330 331 330
query87 3449 3113 3150 3113
query88 4006 2339 2326 2326
query89 470 389 388 388
query90 1960 190 189 189
query91 123 100 95 95
query92 60 47 46 46
query93 1511 513 503 503
query94 1274 186 186 186
query95 405 314 302 302
query96 576 267 266 266
query97 3118 2983 3010 2983
query98 239 225 214 214
query99 1159 862 823 823
Total cold run time: 263959 ms
Total hot run time: 168073 ms
ClickBench: Total hot run time: 30.51 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 570607367f6ddd1b81c6cf9f5d267f8e299608f0, data reload: false
query1 0.04 0.03 0.04
query2 0.08 0.03 0.04
query3 0.22 0.05 0.05
query4 1.68 0.08 0.09
query5 0.55 0.48 0.54
query6 1.13 0.72 0.72
query7 0.02 0.01 0.01
query8 0.05 0.03 0.04
query9 0.54 0.49 0.48
query10 0.53 0.55 0.55
query11 0.15 0.11 0.12
query12 0.15 0.12 0.13
query13 0.60 0.59 0.59
query14 0.74 0.81 0.78
query15 0.82 0.82 0.80
query16 0.37 0.37 0.37
query17 0.95 1.02 1.03
query18 0.21 0.22 0.26
query19 1.76 1.68 1.64
query20 0.01 0.01 0.01
query21 15.43 0.72 0.70
query22 4.64 6.90 1.95
query23 18.28 1.34 1.33
query24 1.44 0.34 0.20
query25 0.16 0.08 0.08
query26 0.25 0.16 0.16
query27 0.07 0.08 0.08
query28 13.45 1.04 0.99
query29 12.73 3.31 3.21
query30 0.24 0.06 0.06
query31 2.88 0.40 0.38
query32 3.29 0.47 0.45
query33 2.91 2.92 2.86
query34 17.14 4.41 4.44
query35 4.50 4.51 4.48
query36 0.67 0.49 0.46
query37 0.18 0.16 0.15
query38 0.15 0.15 0.14
query39 0.04 0.04 0.04
query40 0.16 0.14 0.15
query41 0.09 0.05 0.04
query42 0.05 0.05 0.05
query43 0.04 0.03 0.03
Total cold run time: 109.39 s
Total hot run time: 30.51 s
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
run p0