[fix](function) fix unexpected be core in string search function
Proposed changes
Issue Number: close #31311
Keep same behavior with ck: ck docs
Code: 43, Message: Illegal type String of argument of function multiMatchAny
Before:
CREATE TABLE `tbl2` (
`ka` int(11) NULL,
`kb` text NULL,
`kc` array<text> NULL
) ENGINE=OLAP
UNIQUE KEY(`ka`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`ka`) BUCKETS 5
PROPERTIES (
"replication_allocation" = "tag.location.default: 3",
"in_memory" = "false",
"storage_format" = "V2",
"disable_auto_compaction" = "false"
);
MySQL [test]> select multi_match_any(kb, 'hello') from tbl2 order by col1;
ERROR 1105 (HY000): RpcException, msg: org.apache.doris.rpc.RpcException: io.grpc.StatusRuntimeException: UNAVAILABLE: Network closed for unknown reason
After:
MySQL [test]> select multi_match_any(kb, 'hello') from tbl2 order by col1;
ERROR 1105 (HY000): errCode = 2, detailMessage = No matching function with signature: multi_match_any(text, varchar(-1)).
Further comments
If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...
Thank you for your contribution to Apache Doris. Don't know what should be done next? See How to process your PR
run buildall
TeamCity be ut coverage result: Function Coverage: 35.71% (8547/23936) Line Coverage: 27.53% (69360/251944) Region Coverage: 26.68% (35976/134843) Branch Coverage: 23.48% (18388/78304) Coverage Report: http://coverage.selectdb-in.cc/coverage/a357282735bd5e4a6cff5818bf4c3e1c1156bb54_a357282735bd5e4a6cff5818bf4c3e1c1156bb54/report/index.html
run buildall
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
TeamCity be ut coverage result: Function Coverage: 35.71% (8548/23937) Line Coverage: 27.52% (69361/251995) Region Coverage: 26.68% (35971/134848) Branch Coverage: 23.49% (18390/78304) Coverage Report: http://coverage.selectdb-in.cc/coverage/acbf68f97b059210c9db3f42651a7fddb6b3a367_acbf68f97b059210c9db3f42651a7fddb6b3a367/report/index.html
TPC-H: Total hot run time: 40756 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit acbf68f97b059210c9db3f42651a7fddb6b3a367, data reload: false
------ Round 1 ----------------------------------
q1 17695 4942 4796 4796
q2 2052 138 138 138
q3 10576 982 980 980
q4 4646 949 952 949
q5 7627 3128 3142 3128
q6 193 130 134 130
q7 1220 761 756 756
q8 9236 2007 2043 2007
q9 7420 6533 6560 6533
q10 8289 2619 2628 2619
q11 415 203 223 203
q12 784 329 328 328
q13 17927 3643 3626 3626
q14 285 255 258 255
q15 611 532 496 496
q16 468 395 407 395
q17 906 856 858 856
q18 7417 6644 6546 6546
q19 1551 1476 1485 1476
q20 544 268 268 268
q21 7062 3954 3936 3936
q22 865 343 335 335
Total cold run time: 107789 ms
Total hot run time: 40756 ms
----- Round 2, with runtime_filter_mode=off -----
q1 4774 4804 4786 4786
q2 292 179 178 178
q3 3572 3562 3573 3562
q4 2472 2473 2494 2473
q5 5721 5719 5687 5687
q6 212 128 129 128
q7 2221 1620 1654 1620
q8 2989 3061 3049 3049
q9 8642 8619 8626 8619
q10 6754 4229 4228 4228
q11 507 379 351 351
q12 769 529 531 529
q13 4481 3406 3390 3390
q14 274 228 227 227
q15 584 510 500 500
q16 482 410 437 410
q17 1605 1573 1565 1565
q18 8296 7619 7624 7619
q19 1631 1629 1619 1619
q20 2102 1804 1816 1804
q21 6459 6057 6076 6057
q22 570 526 519 519
Total cold run time: 65409 ms
Total hot run time: 58920 ms
TPC-DS: Total hot run time: 175568 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit acbf68f97b059210c9db3f42651a7fddb6b3a367, data reload: false
query1 918 352 342 342
query2 6520 1784 1756 1756
query3 6694 208 205 205
query4 23100 21175 21066 21066
query5 4312 374 376 374
query6 261 164 165 164
query7 4625 304 302 302
query8 244 194 193 193
query9 8429 2829 2818 2818
query10 421 227 230 227
query11 15131 14481 14425 14425
query12 140 88 80 80
query13 1699 427 426 426
query14 9244 7583 7631 7583
query15 214 183 195 183
query16 7643 261 250 250
query17 1401 550 523 523
query18 1943 267 257 257
query19 197 143 149 143
query20 84 81 82 81
query21 186 113 113 113
query22 4823 4766 4675 4675
query23 32355 31511 31426 31426
query24 12674 3332 3411 3332
query25 634 367 359 359
query26 1867 155 161 155
query27 3056 316 310 310
query28 6597 1835 1807 1807
query29 1096 612 612 612
query30 276 137 147 137
query31 940 741 754 741
query32 94 57 58 57
query33 723 238 238 238
query34 1082 484 492 484
query35 936 821 799 799
query36 1003 895 887 887
query37 183 60 65 60
query38 3284 3113 3193 3113
query39 1357 1328 1325 1325
query40 280 104 99 99
query41 37 35 34 34
query42 100 102 98 98
query43 472 451 461 451
query44 1087 705 700 700
query45 194 187 182 182
query46 1022 785 756 756
query47 1599 1577 1568 1568
query48 428 339 347 339
query49 1218 296 306 296
query50 758 372 367 367
query51 4535 4289 4268 4268
query52 108 95 97 95
query53 393 296 296 296
query54 292 232 233 232
query55 86 79 82 79
query56 225 210 195 195
query57 1093 936 954 936
query58 212 197 192 192
query59 2364 2318 2108 2108
query60 243 221 227 221
query61 81 82 81 81
query62 594 392 366 366
query63 319 284 298 284
query64 6204 3042 3102 3042
query65 3295 3266 3244 3244
query66 1334 335 323 323
query67 14311 14140 13792 13792
query68 5067 550 567 550
query69 519 364 346 346
query70 1288 1233 1238 1233
query71 452 253 260 253
query72 6313 2792 2568 2568
query73 690 316 316 316
query74 6865 6369 6477 6369
query75 3208 2563 2542 2542
query76 3266 1100 1186 1100
query77 344 240 239 239
query78 9425 8742 8657 8657
query79 968 499 493 493
query80 534 369 351 351
query81 441 204 202 202
query82 163 83 86 83
query83 135 122 120 120
query84 232 77 78 77
query85 1005 355 337 337
query86 292 320 308 308
query87 3485 3299 3285 3285
query88 2743 2288 2292 2288
query89 441 369 374 369
query90 1902 166 163 163
query91 153 126 125 125
query92 53 46 47 46
query93 982 520 498 498
query94 1093 174 174 174
query95 425 336 344 336
query96 574 262 264 262
query97 4396 4271 4262 4262
query98 217 203 199 199
query99 1065 745 759 745
Total cold run time: 268006 ms
Total hot run time: 175568 ms
ClickBench: Total hot run time: 31.35 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit acbf68f97b059210c9db3f42651a7fddb6b3a367, data reload: false
query1 0.03 0.02 0.02
query2 0.06 0.02 0.02
query3 0.24 0.08 0.07
query4 1.65 0.08 0.08
query5 0.48 0.48 0.48
query6 1.37 0.61 0.61
query7 0.02 0.01 0.01
query8 0.04 0.02 0.03
query9 0.53 0.44 0.44
query10 0.47 0.48 0.48
query11 0.12 0.10 0.09
query12 0.11 0.10 0.10
query13 0.58 0.59 0.58
query14 0.76 0.78 0.80
query15 0.82 0.78 0.79
query16 0.32 0.33 0.33
query17 0.85 0.90 0.90
query18 0.17 0.17 0.18
query19 1.79 1.70 1.60
query20 0.02 0.01 0.01
query21 15.41 0.68 0.60
query22 3.13 5.04 2.88
query23 17.73 1.01 0.96
query24 1.96 0.41 0.37
query25 0.63 0.06 0.06
query26 0.16 0.14 0.13
query27 0.05 0.06 0.05
query28 12.21 0.81 0.81
query29 12.50 3.42 3.35
query30 0.55 0.57 0.47
query31 2.78 0.35 0.37
query32 3.36 0.47 0.47
query33 3.14 3.13 3.11
query34 15.35 4.51 4.49
query35 4.50 4.48 4.47
query36 1.07 0.93 0.95
query37 0.07 0.05 0.05
query38 0.03 0.03 0.03
query39 0.02 0.01 0.02
query40 0.16 0.14 0.18
query41 0.07 0.02 0.01
query42 0.02 0.02 0.02
query43 0.03 0.02 0.02
Total cold run time: 105.36 s
Total hot run time: 31.35 s
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Load test result on commit acbf68f97b059210c9db3f42651a7fddb6b3a367 with default session variables
Stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc: 59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select: 16.4 seconds inserted 10000000 Rows, about 609K ops/s
Do we need to throw exception when we meeting NULL as argument ?
For this type of case, CK will return nil instead of throwing an error, and Doris are currently returning 0. Do we need to maintain consistency with CK, as this function was migrated from CK?
Signature of this function in doris suggests that the result is always not nullable, if we changed that, there will be many changes on BE, for example maybe we should set use_default_impl_for_nulls to true.
I think its ok to keep current behavior.
@xy720
need more discuss. hold on now. we may need a complete fix on planner to prohibit
stringtoarray<string>. @morrySnow PTAL
@zclllyybb @morrySnow Do we still need to discuss it? Perhaps you can provide me with an example so that I can fix it in the next PR.
need more discuss. hold on now. we may need a complete fix on planner to prohibit
stringtoarray<string>. @morrySnow PTAL@zclllyybb @morrySnow Do we still need to discuss it? Perhaps you can provide me with an example so that I can fix it in the next PR.
I think the better way is totally forbid implicit casting from string to array<string> when do type coercion for function signature matches, rather than use a special set STRING_SEARCH_FUNCTION_SET to avoid this. could you please handle this? or you can just fix the origin problem and leave it as a individual task
need more discuss. hold on now. we may need a complete fix on planner to prohibit
stringtoarray<string>. @morrySnow PTAL@zclllyybb @morrySnow Do we still need to discuss it? Perhaps you can provide me with an example so that I can fix it in the next PR.
I think the better way is totally forbid implicit casting from
stringtoarray<string>when do type coercion for function signature matches, rather than use a special setSTRING_SEARCH_FUNCTION_SETto avoid this. could you please handle this? or you can just fix the origin problem and leave it as a individual task
Completely forbid implicit casting may affect many behaviors. For example, should we ban the following case?
case: implicit casting in insert stmt
MySQL [test]> desc tbl2;
+-------+-------------+------+-------+---------+---------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-------+---------+---------+
| ka | INT | Yes | true | NULL | |
| kb | TEXT | Yes | false | NULL | REPLACE |
| kc | ARRAY<TEXT> | Yes | false | NULL | REPLACE |
+-------+-------------+------+-------+---------+---------+
3 rows in set (0.01 sec)
MySQL [test]> insert into tbl2 values(1, '123', '[123,123]');
Query OK, 1 row affected (0.07 sec)
{'label':'insert_6e5f2205983f49a7_8767898ea1995eb2', 'status':'VISIBLE', 'txnId':'12019'}
And I think implicitly casting string to other type is a tradition in doris (ex. string->(int, date,jsonb)), but unfortunately not in Clickhouse, if we want to change it, we really need more discuss about it, may be I should leave it as an individual task.
need more discuss. hold on now. we may need a complete fix on planner to prohibit
stringtoarray<string>. @morrySnow PTAL@zclllyybb @morrySnow Do we still need to discuss it? Perhaps you can provide me with an example so that I can fix it in the next PR.
I think the better way is totally forbid implicit casting from
stringtoarray<string>when do type coercion for function signature matches, rather than use a special setSTRING_SEARCH_FUNCTION_SETto avoid this. could you please handle this? or you can just fix the origin problem and leave it as a individual taskCompletely forbid implicit casting may affect many behaviors. For example, should we ban the following case?
case: implicit casting in insert stmt
MySQL [test]> desc tbl2; +-------+-------------+------+-------+---------+---------+ | Field | Type | Null | Key | Default | Extra | +-------+-------------+------+-------+---------+---------+ | ka | INT | Yes | true | NULL | | | kb | TEXT | Yes | false | NULL | REPLACE | | kc | ARRAY<TEXT> | Yes | false | NULL | REPLACE | +-------+-------------+------+-------+---------+---------+ 3 rows in set (0.01 sec) MySQL [test]> insert into tbl2 values(1, '123', '[123,123]'); Query OK, 1 row affected (0.07 sec) {'label':'insert_6e5f2205983f49a7_8767898ea1995eb2', 'status':'VISIBLE', 'txnId':'12019'}And I think implicitly casting string to other type is a tradition in doris (ex. string->(int, date,jsonb)), but unfortunately not in Clickhouse, if we want to change it, we really need more discuss about it, may be I should leave it as an individual task.
- cast
stringto another type is acceptable. but toarray<string>(generally, to all type which have a subtype of string) is a special case. the users may not sure what they are doing, like '[1,2,3]' means ['1','2','3'] or ['[1,2,3]']? for this particular scenario, maybe forbid to implicit cast is a good decision. If they need, could use a more clear way or explicit cast. - for functions' match, we can individually deal some of the type coercion rules. so there still need more discussion about if we should change the scenario you mentioned together.
looking forward for more opinion!
need more discuss. hold on now. we may need a complete fix on planner to prohibit
stringtoarray<string>. @morrySnow PTAL@zclllyybb @morrySnow Do we still need to discuss it? Perhaps you can provide me with an example so that I can fix it in the next PR.
I think the better way is totally forbid implicit casting from
stringtoarray<string>when do type coercion for function signature matches, rather than use a special setSTRING_SEARCH_FUNCTION_SETto avoid this. could you please handle this? or you can just fix the origin problem and leave it as a individual taskCompletely forbid implicit casting may affect many behaviors. For example, should we ban the following case? case: implicit casting in insert stmt
MySQL [test]> desc tbl2; +-------+-------------+------+-------+---------+---------+ | Field | Type | Null | Key | Default | Extra | +-------+-------------+------+-------+---------+---------+ | ka | INT | Yes | true | NULL | | | kb | TEXT | Yes | false | NULL | REPLACE | | kc | ARRAY<TEXT> | Yes | false | NULL | REPLACE | +-------+-------------+------+-------+---------+---------+ 3 rows in set (0.01 sec) MySQL [test]> insert into tbl2 values(1, '123', '[123,123]'); Query OK, 1 row affected (0.07 sec) {'label':'insert_6e5f2205983f49a7_8767898ea1995eb2', 'status':'VISIBLE', 'txnId':'12019'}And I think implicitly casting string to other type is a tradition in doris (ex. string->(int, date,jsonb)), but unfortunately not in Clickhouse, if we want to change it, we really need more discuss about it, may be I should leave it as an individual task.
- cast
stringto another type is acceptable. but toarray<string>(generally, to all type which have a subtype of string) is a special case. the users may not sure what they are doing, like '[1,2,3]' means ['1','2','3'] or ['[1,2,3]']? for this particular scenario, maybe forbid to implicit cast is a good decision. If they need, could use a more clear way or explicit cast.- for functions' match, we can individually deal some of the type coercion rules. so there still need more discussion about if we should change the scenario you mentioned together.
looking forward for more opinion!
Got it. I will try to forbid the case 1 later, that doesn't require a lot of work. For case 2, we can leave it as an individual task to be discussed.
@zclllyybb @xy720 . Cast string to array is ok. We parse it with json format. string '[1,2,3]' cast to array<string> will produce text array with 3 elements: '1', '2' and '3'. It is not ambiguous. All strings that do not meet the required format will be converted to null.
@amorynan , could u provide more details?
@zclllyybb @xy720 . Cast
stringtoarrayis ok. We parse it with json format.string'[1,2,3]'cast toarray<string>will produce text array with 3 elements:'1','2'and'3'. It is not ambiguous. All strings that do not meet the required format will be converted tonull. @amorynan , could u provide more details?
@amorynan, @morrySnow Any more details? Do we already support casting string to array with json format now ?