starrocks

[Enhancement] Support tokenize function

Open • dujijun007 opened this pull request • 5 comments

Why I'm doing:

Users cannot easily tell how the various tokenizers split text, because each tokenizer produces different results. A tokenize function lets users see exactly which tokens a given tokenizer produces.

What I'm doing:

Add a tokenize function of the form tokenize(<tokenizer_name>, <content>) that shows how the named tokenizer splits the given content.
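To make the motivation concrete, here is a minimal Python sketch of the call shape tokenize(<tokenizer_name>, <content>). It is not the StarRocks implementation: the tokenizer names `whitespace` and `english_like` and their splitting rules are hypothetical stand-ins. The sketch only illustrates that the same input yields different tokens depending on the tokenizer chosen, which is what the function lets users inspect.

```python
import re
from typing import Callable, Dict, List


# Hypothetical tokenizers for illustration only; they are NOT the
# tokenizers StarRocks ships, and their rules are assumptions.
def whitespace_tokenizer(content: str) -> List[str]:
    # Split on runs of whitespace; case and punctuation are kept.
    return content.split()


def english_like_tokenizer(content: str) -> List[str]:
    # Lowercase, then keep only runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", content.lower())


# Registry mapping a tokenizer name to its implementation (hypothetical).
TOKENIZERS: Dict[str, Callable[[str], List[str]]] = {
    "whitespace": whitespace_tokenizer,
    "english_like": english_like_tokenizer,
}


def tokenize(tokenizer_name: str, content: str) -> List[str]:
    """Mimic the shape tokenize(<tokenizer_name>, <content>)."""
    try:
        tokenizer = TOKENIZERS[tokenizer_name]
    except KeyError:
        raise ValueError(f"unknown tokenizer: {tokenizer_name}") from None
    return tokenizer(content)


if __name__ == "__main__":
    text = "Hello, StarRocks World!"
    for name in TOKENIZERS:
        print(name, tokenize(name, text))
```

Running it prints the token list from each hypothetical tokenizer for the same sentence. One keeps case and punctuation and the other strips them, which is the kind of difference a tokenize function makes visible.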

Fixes #45145

What type of PR is this:

  • [ ] BugFix
  • [ ] Feature
  • [x] Enhancement
  • [ ] Refactor
  • [ ] UT
  • [ ] Doc
  • [ ] Tool

Does this PR entail a change in behavior?

  • [ ] Yes, this PR will result in a change in behavior.
  • [x] No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • [ ] Parameter changes: default values, similar parameters but with different default values
  • [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
  • [ ] Feature removed
  • [ ] Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • [x] I have added test cases for my bug fix or my new feature
  • [ ] This PR needs user documentation (for new or modified features or behaviors)
    • [ ] I have added documentation for my new feature or new function
  • [ ] This is a backport PR

Bugfix cherry-pick branch check:

  • [x] I have checked the version labels to which the PR will be auto-backported
    • [ ] 3.3
    • [ ] 3.2
    • [ ] 3.1
    • [ ] 3.0
    • [ ] 2.5

dujijun007, May 06 '24 09:05

@dujijun007 Thank you for the contribution. Could you create an issue describing this new function: its interface, input, output, and limits?

imay, May 06 '24 12:05

@imay OK, I've created it and linked it here (#45145).

dujijun007, May 06 '24 15:05

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

sonarqubecloud[bot], May 10 '24 07:05

[FE Incremental Coverage Report]

✅ pass: 0 / 0 (0%)

github-actions[bot], May 10 '24 09:05

[BE Incremental Coverage Report]

✅ pass: 55 / 57 (96.49%)

File detail:

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| be/src/exprs/gin_functions.cpp | 55 | 57 | 96.49% | [71, 91] |

github-actions[bot], May 10 '24 09:05