doris icon indicating copy to clipboard operation
doris copied to clipboard

[nereids] string literal coercion of in predicate

Open mongo360 opened this issue 9 months ago • 1 comments

Description: The sql execute much slow when the literal value with string format in in predicate; and the real data is integral type。

mysql> set enable_nereids_planner = false;
Query OK, 0 rows affected (0.03 sec)

mysql> select id,sum(clicks) from a_table where id in ('787934713', '306960695') group by id limit 10;
+------------+---------------+
| id | sum(`clicks`) |
+------------+---------------+
|  787934713 |          2838 |
|  306960695 |           339 |
+------------+---------------+
2 rows in set (1.81 sec)

mysql> set enable_nereids_planner = true;
Query OK, 0 rows affected (0.02 sec)

mysql> select id,sum(clicks) from a_table where id in ('787934713', '306960695') group by id limit 10;
+------------+-------------+
| id | sum(clicks) |
+------------+-------------+
|  787934713 |        2838 |
|  306960695 |         339 |
+------------+-------------+
2 rows in set (28.14 sec)

Reason: In legacy planner, the string literal with convert to integral value, but in the nereids planner do not do this convert and with do string matching in BE。

Solved: do process string literal with numeric in in predicate like in comparison predicate; test table:

create table a_table(
    k1 BIGINT NOT NULL,
    k2 VARCHAR(100) NOT NULL,
    v1 INT SUM NULL DEFAULT "0"
) ENGINE=OLAP
AGGREGATE KEY(k1,k2)
distributed BY hash(k1) buckets 2
properties("replication_num" = "1");
insert into a_table values (10, 'name1', 10),(20, 'name2', 10);
explain plan select * from a_table where k1 in ('10', '20001');

before optimize:

+--------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                                                                      |
+--------------------------------------------------------------------------------------------------------------------------------------+
| ========== PARSED PLAN (time: 1ms) ==========                                                                                        |
| UnboundResultSink[4] (  )                                                                                                            |
| +--LogicalProject[3] ( distinct=false, projects=[*], excepts=[] )                                                                    |
|    +--LogicalFilter[2] ( predicates='k1 IN ('10001', '20001') )                                                                      |
|       +--LogicalCheckPolicy (  )                                                                                                     |
|          +--UnboundRelation ( id=RelationId#0, nameParts=a_table )                                                                   |
|                                                                                                                                      |
| ========== ANALYZED PLAN (time: 2ms) ==========                                                                                      |
| LogicalResultSink[15] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                             |
| +--LogicalProject[13] ( distinct=false, projects=[k1#0, k2#1, v1#2], excepts=[] )                                                    |
|    +--LogicalFilter[11] ( predicates=cast(k1#0 as TEXT) IN ('10001', '20001') )                                                      |
|       +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=<index_not_selected>, selectedIndexId=12003, preAgg=UNSET )      |
|                                                                                                                                      |
| ========== REWRITTEN PLAN (time: 6ms) ==========                                                                                     |
| LogicalResultSink[45] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                             |
| +--LogicalFilter[43] ( predicates=cast(k1#0 as TEXT) IN ('10001', '20001') )                                                         |
|    +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=a_table, selectedIndexId=12003, preAgg=OFF, No aggregate on scan. ) |
|                                                                                                                                      |
| ========== OPTIMIZED PLAN (time: 6ms) ==========                                                                                     |
| PhysicalResultSink[90] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                            |
| +--PhysicalDistribute[87]@1 ( stats=0.33, distributionSpec=DistributionSpecGather )                                                  |
|    +--PhysicalFilter[84]@1 ( stats=0.33, predicates=cast(k1#0 as TEXT) IN ('10001', '20001') )                                       |
|       +--PhysicalOlapScan[a_table]@0 ( stats=1 )                                                                                     |
+--------------------------------------------------------------------------------------------------------------------------------------+

after optimize:

+--------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                                                                      |
+--------------------------------------------------------------------------------------------------------------------------------------+
| ========== PARSED PLAN (time: 15ms) ==========                                                                                       |
| UnboundResultSink[4] (  )                                                                                                            |
| +--LogicalProject[3] ( distinct=false, projects=[*], excepts=[] )                                                                    |
|    +--LogicalFilter[2] ( predicates='k1 IN ('10001', '20001') )                                                                      |
|       +--LogicalCheckPolicy (  )                                                                                                     |
|          +--UnboundRelation ( id=RelationId#0, nameParts=a_table )                                                                   |
|                                                                                                                                      |
| ========== ANALYZED PLAN (time: 11ms) ==========                                                                                     |
| LogicalResultSink[15] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                             |
| +--LogicalProject[13] ( distinct=false, projects=[k1#0, k2#1, v1#2], excepts=[] )                                                    |
|    +--LogicalFilter[11] ( predicates=k1#0 IN (10001, 20001) )                                                                        |
|       +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=<index_not_selected>, selectedIndexId=12003, preAgg=UNSET )      |
|                                                                                                                                      |
| ========== REWRITTEN PLAN (time: 12ms) ==========                                                                                    |
| LogicalResultSink[45] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                             |
| +--LogicalFilter[43] ( predicates=k1#0 IN (10001, 20001) )                                                                           |
|    +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=a_table, selectedIndexId=12003, preAgg=OFF, No aggregate on scan. ) |
|                                                                                                                                      |
| ========== OPTIMIZED PLAN (time: 4ms) ==========                                                                                     |
| PhysicalResultSink[90] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                            |
| +--PhysicalDistribute[87]@1 ( stats=0, distributionSpec=DistributionSpecGather )                                                     |
|    +--PhysicalFilter[84]@1 ( stats=0, predicates=k1#0 IN (10001, 20001) )                                                            |
|       +--PhysicalOlapScan[a_table]@0 ( stats=2 )                                                                                     |
+--------------------------------------------------------------------------------------------------------------------------------------+

mongo360 avatar May 22 '24 07:05 mongo360

Thank you for your contribution to Apache Doris. Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website. See Doris Document.

doris-robot avatar May 22 '24 07:05 doris-robot

run buildall

morrySnow avatar May 23 '24 04:05 morrySnow

TPC-H: Total hot run time: 41943 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 570607367f6ddd1b81c6cf9f5d267f8e299608f0, data reload: false

------ Round 1 ----------------------------------
q1	17604	4351	4234	4234
q2	2014	186	192	186
q3	10543	1200	1256	1200
q4	10201	791	758	758
q5	7493	2755	2669	2669
q6	219	139	139	139
q7	973	622	637	622
q8	9219	2162	2110	2110
q9	9449	6714	6747	6714
q10	9079	3878	3872	3872
q11	448	242	236	236
q12	527	237	227	227
q13	17442	3208	3207	3207
q14	261	207	216	207
q15	524	460	464	460
q16	508	395	392	392
q17	1001	666	785	666
q18	8336	7927	7959	7927
q19	7234	1591	1575	1575
q20	637	319	313	313
q21	5237	4004	3953	3953
q22	364	294	276	276
Total cold run time: 119313 ms
Total hot run time: 41943 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4537	4381	4467	4381
q2	384	274	259	259
q3	3170	2962	2728	2728
q4	1859	1642	1656	1642
q5	5478	5479	5496	5479
q6	219	127	127	127
q7	2145	1825	1834	1825
q8	3231	3386	3388	3386
q9	8685	8688	8636	8636
q10	3942	3873	3857	3857
q11	573	492	491	491
q12	793	638	663	638
q13	16222	3145	3188	3145
q14	285	272	250	250
q15	523	464	474	464
q16	475	412	417	412
q17	1759	1509	1514	1509
q18	7645	7750	7560	7560
q19	1674	1528	1518	1518
q20	1995	1800	1795	1795
q21	5032	4698	4668	4668
q22	551	490	502	490
Total cold run time: 71177 ms
Total hot run time: 55260 ms

doris-robot avatar May 23 '24 05:05 doris-robot

TPC-DS: Total hot run time: 168073 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 570607367f6ddd1b81c6cf9f5d267f8e299608f0, data reload: false

query1	899	378	371	371
query2	6432	2502	2409	2409
query3	6647	207	211	207
query4	19687	17258	17324	17258
query5	4114	410	428	410
query6	239	159	147	147
query7	4579	296	295	295
query8	240	194	180	180
query9	8609	2364	2349	2349
query10	455	281	264	264
query11	10758	10069	10127	10069
query12	129	94	86	86
query13	1640	352	362	352
query14	10110	5941	7523	5941
query15	207	167	171	167
query16	7768	263	268	263
query17	1713	516	516	516
query18	1947	269	265	265
query19	201	152	154	152
query20	88	86	99	86
query21	197	137	127	127
query22	4113	4169	3931	3931
query23	33668	33099	33144	33099
query24	9476	2895	2754	2754
query25	565	354	361	354
query26	725	163	159	159
query27	2205	329	332	329
query28	5609	2070	2054	2054
query29	849	599	600	599
query30	251	171	176	171
query31	967	758	751	751
query32	84	55	52	52
query33	633	254	264	254
query34	849	474	479	474
query35	711	624	618	618
query36	1053	938	891	891
query37	108	72	72	72
query38	2953	2741	2771	2741
query39	847	787	797	787
query40	206	127	128	127
query41	50	46	47	46
query42	102	96	105	96
query43	594	553	561	553
query44	1058	727	730	727
query45	183	170	167	167
query46	1072	746	729	729
query47	1840	1784	1771	1771
query48	364	291	298	291
query49	830	369	390	369
query50	765	382	394	382
query51	6829	6821	6705	6705
query52	101	91	91	91
query53	349	291	276	276
query54	726	414	419	414
query55	74	71	76	71
query56	257	237	235	235
query57	1106	1064	1055	1055
query58	231	207	206	206
query59	3545	3121	3183	3121
query60	284	253	252	252
query61	87	86	85	85
query62	591	439	438	438
query63	306	289	285	285
query64	8447	2227	1769	1769
query65	3165	3077	3167	3077
query66	784	321	325	321
query67	15200	14679	14775	14679
query68	4525	539	525	525
query69	439	262	269	262
query70	1141	1149	1167	1149
query71	359	268	270	268
query72	7259	2678	2531	2531
query73	705	321	321	321
query74	6039	5584	5626	5584
query75	3256	2634	2633	2633
query76	2270	994	956	956
query77	372	263	261	261
query78	10452	9803	9641	9641
query79	2403	523	516	516
query80	1133	456	455	455
query81	542	243	247	243
query82	952	97	100	97
query83	257	186	179	179
query84	247	90	87	87
query85	1326	275	259	259
query86	461	330	331	330
query87	3449	3113	3150	3113
query88	4006	2339	2326	2326
query89	470	389	388	388
query90	1960	190	189	189
query91	123	100	95	95
query92	60	47	46	46
query93	1511	513	503	503
query94	1274	186	186	186
query95	405	314	302	302
query96	576	267	266	266
query97	3118	2983	3010	2983
query98	239	225	214	214
query99	1159	862	823	823
Total cold run time: 263959 ms
Total hot run time: 168073 ms

doris-robot avatar May 23 '24 05:05 doris-robot

ClickBench: Total hot run time: 30.51 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 570607367f6ddd1b81c6cf9f5d267f8e299608f0, data reload: false

query1	0.04	0.03	0.04
query2	0.08	0.03	0.04
query3	0.22	0.05	0.05
query4	1.68	0.08	0.09
query5	0.55	0.48	0.54
query6	1.13	0.72	0.72
query7	0.02	0.01	0.01
query8	0.05	0.03	0.04
query9	0.54	0.49	0.48
query10	0.53	0.55	0.55
query11	0.15	0.11	0.12
query12	0.15	0.12	0.13
query13	0.60	0.59	0.59
query14	0.74	0.81	0.78
query15	0.82	0.82	0.80
query16	0.37	0.37	0.37
query17	0.95	1.02	1.03
query18	0.21	0.22	0.26
query19	1.76	1.68	1.64
query20	0.01	0.01	0.01
query21	15.43	0.72	0.70
query22	4.64	6.90	1.95
query23	18.28	1.34	1.33
query24	1.44	0.34	0.20
query25	0.16	0.08	0.08
query26	0.25	0.16	0.16
query27	0.07	0.08	0.08
query28	13.45	1.04	0.99
query29	12.73	3.31	3.21
query30	0.24	0.06	0.06
query31	2.88	0.40	0.38
query32	3.29	0.47	0.45
query33	2.91	2.92	2.86
query34	17.14	4.41	4.44
query35	4.50	4.51	4.48
query36	0.67	0.49	0.46
query37	0.18	0.16	0.15
query38	0.15	0.15	0.14
query39	0.04	0.04	0.04
query40	0.16	0.14	0.15
query41	0.09	0.05	0.04
query42	0.05	0.05	0.05
query43	0.04	0.03	0.03
Total cold run time: 109.39 s
Total hot run time: 30.51 s

doris-robot avatar May 23 '24 05:05 doris-robot

PR approved by at least one committer and no changes requested.

github-actions[bot] avatar May 23 '24 05:05 github-actions[bot]

PR approved by anyone and no changes requested.

github-actions[bot] avatar May 23 '24 05:05 github-actions[bot]

run p0

morrySnow avatar May 23 '24 06:05 morrySnow