[Feature] Support regexp_split function

Open 839224346 opened this issue 1 year ago • 5 comments

Why I'm doing:

What I'm doing:

Support the split_by_regexp function, compatible with ClickHouse's (CK) split_by_regexp and Spark's split function.
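For readers unfamiliar with regex-based splitting, here is a minimal sketch of the general idea: the input is cut wherever the pattern matches, and the pieces between matches are returned as an array. It uses Python's `re` module purely as a stand-in. The helper name `split_by_regex_sketch` is hypothetical, and this is not the implementation added in this PR. Edge-case behavior (empty matches, trailing empty strings, an optional limit argument) may differ between StarRocks, ClickHouse, Spark, and Trino.

```python
import re
from typing import List


def split_by_regex_sketch(text: str, pattern: str) -> List[str]:
    """Hypothetical helper: split `text` wherever `pattern` matches."""
    # re.split returns the pieces between matches. The matched delimiters
    # are dropped as long as the pattern contains no capturing groups.
    return re.split(pattern, text)


if __name__ == "__main__":
    # Runs of digits act as the delimiter.
    print(split_by_regex_sketch("a1b22c333d", r"[0-9]+"))
    # A character class lets several single-character delimiters be used at once.
    print(split_by_regex_sketch("2024-06-19 06:06", r"[- :]"))
```

The first call yields `['a', 'b', 'c', 'd']` and the second yields `['2024', '06', '19', '06', '06']`. A plain string split cannot express either case with a single delimiter.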

Fix https://github.com/StarRocks/starrocks/issues/37089

Which issues does this PR fix:

Partially completes regexp_split function in: https://github.com/StarRocks/starrocks/issues/37089

After this PR is merged, another PR will be submitted to convert regexp_split to the split_by_regexp function in Trino mode.

What type of PR is this:

  • [ ] BugFix
  • [x] Feature
  • [ ] Enhancement
  • [ ] Refactor
  • [ ] UT
  • [ ] Doc
  • [ ] Tool

Does this PR entail a change in behavior?

  • [ ] Yes, this PR will result in a change in behavior.
  • [x] No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • [ ] Parameter changes: default values, similar parameters but with different default values
  • [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
  • [ ] Feature removed
  • [ ] Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • [x] I have added test cases for my bug fix or my new feature
  • [x] This pr needs user documentation (for new or modified features or behaviors)
  • [ ] I have added documentation for my new feature or new function
  • [ ] This is a backport pr

Bugfix cherry-pick branch check:

  • [x] I have checked the version labels which the pr will be auto-backported to the target branch
    • [x] 3.3
    • [x] 3.2
    • [x] 3.1
    • [x] 3.0
    • [ ] 2.5

839224346, Jun 19 '24 06:06

  • Could you please add documentation? Thanks!
  • As for the function's name, StarRocks already has regexp, regexp_extract, regexp_extract_all, and regexp_replace, so I think regexp_split would be a better name.
  • Also, does this function do the same job as Trino's regexp_split? Is any further transformation needed for compatibility in the Trino dialect?

wangsimo0, Jun 21 '24 08:06

[FE Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot], Jun 28 '24 10:06

[BE Incremental Coverage Report]

:white_check_mark: pass : 159 / 179 (88.83%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| :large_blue_circle: be/src/exprs/regexp_split.cpp | 38 | 50 | 76.00% | [41, 42, 70, 74, 76, 77, 78, 79, 80, 84, 85, 86] |
| :large_blue_circle: be/src/exprs/string_functions.cpp | 120 | 128 | 93.75% | [3782, 3793, 3808, 3842, 3884, 3885, 3886, 3887] |
| :large_blue_circle: be/src/exprs/regexp_split.h | 1 | 1 | 100.00% | [] |

github-actions[bot], Jun 28 '24 10:06

@wangsimo0 done. The function should be compatible with Trino's regexp_split and no further transformation is required.

839224346, Jun 28 '24 11:06