starrocks icon indicating copy to clipboard operation
starrocks copied to clipboard

[Enhancement] Support array_sort with lambda comparator

Open satanson opened this issue 2 weeks ago โ€ข 6 comments

Why I'm doing:

Support array_sort with lambda defines binary comparator, for examples:

SELECT array_sort([3, 2, 5, 1, 2], (x, y) -> IF(x < y, 1, IF(x = y, 0, -1)));
-- result:
[5,3,2,2,1]
-- !result

SELECT array_sort([[2, 3, 1], [4, 2, 1, 4], [1, 2]], (x, y) -> IF(cardinality(x) < cardinality(y), -1, IF(cardinality(x) = cardinality(y), 0, 1)));
-- result:
[[1,2],[2,3,1],[4,2,1,4]]
-- !result

The lambda function's return type should be boolean or numeric type;

  1. for boolean: f(a,b)=true means a < b; f(a,b)=false means a < b is not hold;
  2. for numeric type: f(a,b) < 0 means a < b; f(a,b)>=0 means a < b is not hold;

The lambda function should define strict weak ordering relation on array element, the internal sort algorithm depends on strict weak ordering, if the lambda function does not satisfiy strict weaking ordering, sort algorithm(std::sort, std::stable_sort, pdqsort) would panic.

A strict weak ordering relation means strict partial ordering relaton < on set S satisfies transitivity of incomparability, following axioms are hold on (S, <). image

For examples:

  1. (x,y)->-1;
  2. (x,y)->0;
  3. (x,y)->rand()-0.5;

FE is not able to judge if all cases of lambda comparators are legal, however, it would report error to user if the lambda contains non-deterministic functions, or the lambda expr does not only depend on its argument.

What I'm doing:

Fixes #issue

What type of PR is this:

  • [ ] BugFix
  • [ ] Feature
  • [x] Enhancement
  • [ ] Refactor
  • [ ] UT
  • [ ] Doc
  • [ ] Tool

Does this PR entail a change in behavior?

  • [ ] Yes, this PR will result in a change in behavior.
  • [x] No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • [ ] Parameter changes: default values, similar parameters but with different default values
  • [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
  • [ ] Feature removed
  • [ ] Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • [x] I have added test cases for my bug fix or my new feature
  • [ ] This pr needs user documentation (for new or modified features or behaviors)
    • [ ] I have added documentation for my new feature or new function
  • [ ] This is a backport pr

Bugfix cherry-pick branch check:

  • [x] I have checked the version labels which the pr will be auto-backported to the target branch
    • [x] 4.0
    • [ ] 3.5
    • [ ] 3.4
    • [ ] 3.3

[!NOTE] Introduce lambda-comparator array_sort (array_sort_lambda) with full FE/BE support, validation, typing, docs, and tests.

  • SQL/Functionality:
    • Add array_sort with lambda comparator via new array_sort_lambda function signature and rewrite from array_sort(arr, (x,y)->...).
  • Backend (BE):
    • Implement ArraySortLambdaExpr to sort arrays using a lambda comparator; integrate into expr factory and execution.
    • Enhance LambdaFunction with constant-evaluation helpers; propagate is_nondeterministic flag through Expr and Thrift.
  • Frontend (FE)/Analyzer:
    • Recognize/rewrite lambda form to array_sort_lambda; enforce comparator legality (depends only on x,y; boolean or numeric; no nondeterminism).
    • Update decimal/type inference and polymorphic handling for ARRAY_SORT_LAMBDA.
  • Optimizer/Planner:
    • Add ScalarOperator.asStream; validate/transform numeric comparators (< 0) in ImplicitCastRule.
    • Avoid CSE extraction in array_sort_lambda/array_map when non-deterministic.
  • Docs:
    • Update array_sort docs (EN/JA/ZH) with lambda comparator syntax, rules, and examples.
  • Tests:
    • Add analyzer/plan UTs and SQL regressions covering lambda sorting, typing, and error cases.
  • Misc:
    • Register array_sort_lambda in builtins; expose new thrift field; wire CMake.

Written by Cursor Bugbot for commit f9290997742699a56bccb41fb8434d76868a49de. This will update automatically on new commits. Configure here.

satanson avatar Dec 11 '25 05:12 satanson

๐Ÿงช CI Insights

Here's what we observed from your CI run for f9290997.

๐ŸŸข All jobs passed!

But CI Insights is watching ๐Ÿ‘€

mergify[bot] avatar Dec 11 '25 05:12 mergify[bot]

@cursor review

alvin-celerdata avatar Dec 11 '25 15:12 alvin-celerdata

[Java-Extensions Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot] avatar Dec 12 '25 09:12 github-actions[bot]

[FE Incremental Coverage Report]

:white_check_mark: pass : 78 / 80 (97.50%)

file detail

path covered_line new_line coverage not_covered_line_detail
:large_blue_circle: com/starrocks/sql/optimizer/rewrite/scalar/ImplicitCastRule.java 22 23 95.65% [122]
:large_blue_circle: com/starrocks/sql/analyzer/ExpressionAnalyzer.java 42 43 97.67% [315]
:large_blue_circle: com/starrocks/sql/optimizer/operator/scalar/ScalarOperator.java 1 1 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/rewrite/scalar/FoldConstantsRule.java 1 1 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/rule/tree/exprreuse/ScalarOperatorsReuse.java 8 8 100.00% []
:large_blue_circle: com/starrocks/planner/expression/ExprToThrift.java 2 2 100.00% []
:large_blue_circle: com/starrocks/sql/analyzer/PolymorphicFunctionAnalyzer.java 2 2 100.00% []

github-actions[bot] avatar Dec 12 '25 09:12 github-actions[bot]

[BE Incremental Coverage Report]

:x: fail : 130 / 203 (64.04%)

file detail

path covered_line new_line coverage not_covered_line_detail
:large_blue_circle: be/src/exprs/array_sort_lambda_expr.h 0 2 00.00% [26, 39]
:large_blue_circle: be/src/exprs/array_functions.cpp 0 2 00.00% [1098, 1100]
:large_blue_circle: be/src/exprs/lambda_function.cpp 3 15 20.00% [116, 117, 118, 119, 120, 123, 124, 125, 126, 127, 129, 130]
:large_blue_circle: be/src/exprs/array_sort_lambda_expr.cpp 123 180 68.33% [44, 63, 82, 89, 111, 112, 137, 143, 144, 184, 188, 189, 210, 211, 212, 213, 214, 215, 223, 224, 225, 228, 229, 230, 232, 233, 243, 244, 245, 246, 247, 268, 269, 270, 271, 280, 281, 314, 327, 328, 330, 334, 341, 342, 343, 344, 345, 347, 348, 349, 351, 352, 353, 354, 355, 356, 358]
:large_blue_circle: be/src/exprs/expr.cpp 4 4 100.00% []

github-actions[bot] avatar Dec 12 '25 10:12 github-actions[bot]

@Mergifyio backport branch-4.0

github-actions[bot] avatar Dec 15 '25 02:12 github-actions[bot]

backport branch-4.0

โœ… Backports have been created

mergify[bot] avatar Dec 15 '25 02:12 mergify[bot]