[Query Engine] ANY VALUE Support
Overview
The ANY_VALUE aggregation function returns any arbitrary value from the specified column for each group in a GROUP BY query. This function is particularly useful for optimizing queries where you need to include a column in the SELECT clause that has a 1:1 mapping with the GROUP BY columns, without forcing that column into the GROUP BY clause.
Syntax
ANY_VALUE(column_name)
Use Cases
Primary Use Case: Avoiding Unnecessary GROUP BY Columns
Standard SQL requires that all non-aggregation expressions in the SELECT clause must also be part of the GROUP BY clause. However, there are scenarios where:
- Functional Dependency: A column has a 1:1 mapping with the grouped columns
-
Performance Optimization: Adding the column to
GROUP BYwould increase computation without changing the logical result - Query Simplification: You want to display related data without affecting grouping behavior
Example Scenario
Consider an ORDERS table with CustomerID, CustomerName, and OrderValue:
Without ANY_VALUE (Standard SQL):
-- Option 1: Add CustomerName to GROUP BY (more computation)
SELECT CustomerID, CustomerName, SUM(OrderValue)
FROM Orders
GROUP BY CustomerID, CustomerName;
-- Option 2: Remove CustomerName from SELECT (lose information)
SELECT CustomerID, SUM(OrderValue)
FROM Orders
GROUP BY CustomerID;
With ANY_VALUE (Optimized):
-- Get CustomerName without adding to GROUP BY
SELECT CustomerID, ANY_VALUE(CustomerName), SUM(OrderValue)
FROM Orders
GROUP BY CustomerID;
Behavior
- Returns: Any arbitrary value from the column for each group
- Null Handling: Supports both null-handling enabled and disabled modes
- Data Types: Works with all Pinot data types (INT, LONG, FLOAT, DOUBLE, STRING, BYTES, BIG_DECIMAL)
-
Performance: More efficient than adding the column to
GROUP BYwhen there's a 1:1 mapping
Perf testing
Without Any-value
Example : Query latency without any-value for the above query 3000ms
With Any-value
Example : Query latency with any-value ~ 700ms
Codecov Report
:x: Patch coverage is 30.40541% with 103 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 63.26%. Comparing base (95d43c0) to head (a6b4666).
:warning: Report is 9 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #16678 +/- ##
============================================
- Coverage 63.28% 63.26% -0.03%
Complexity 1474 1474
============================================
Files 3154 3155 +1
Lines 188007 188088 +81
Branches 28782 28784 +2
============================================
+ Hits 118977 118990 +13
- Misses 59807 59891 +84
+ Partials 9223 9207 -16
| Flag | Coverage Δ | |
|---|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
|
| integration | 100.00% <ø> (ø) |
|
| integration1 | 100.00% <ø> (ø) |
|
| integration2 | 0.00% <ø> (ø) |
|
| java-11 | 63.23% <30.40%> (-0.04%) |
:arrow_down: |
| java-21 | 63.21% <30.40%> (-0.01%) |
:arrow_down: |
| temurin | 63.26% <30.40%> (-0.03%) |
:arrow_down: |
| unittests | 63.25% <30.40%> (-0.03%) |
:arrow_down: |
| unittests1 | 55.69% <30.40%> (+0.01%) |
:arrow_up: |
| unittests2 | 33.88% <0.00%> (-0.05%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.
@praveenc7 : could we target end of month for getting this merged?
@praveenc7 : could we target end of month for getting this merged?
@ankitsultana Was side-tracked by other issue, planning to open for review next week and we can target this by end of month
@ankitsultana appreciate, If you could review this?
@xiangfu0 / @somandal Can you help merge as well.