pinot icon indicating copy to clipboard operation
pinot copied to clipboard

[Query Engine] ANY VALUE Support

Open praveenc7 opened this issue 8 months ago • 4 comments

Overview

The ANY_VALUE aggregation function returns any arbitrary value from the specified column for each group in a GROUP BY query. This function is particularly useful for optimizing queries where you need to include a column in the SELECT clause that has a 1:1 mapping with the GROUP BY columns, without forcing that column into the GROUP BY clause.

Syntax

ANY_VALUE(column_name)

Use Cases

Primary Use Case: Avoiding Unnecessary GROUP BY Columns

Standard SQL requires that all non-aggregation expressions in the SELECT clause must also be part of the GROUP BY clause. However, there are scenarios where:

  1. Functional Dependency: A column has a 1:1 mapping with the grouped columns
  2. Performance Optimization: Adding the column to GROUP BY would increase computation without changing the logical result
  3. Query Simplification: You want to display related data without affecting grouping behavior

Example Scenario

Consider an ORDERS table with CustomerID, CustomerName, and OrderValue:

Without ANY_VALUE (Standard SQL):

-- Option 1: Add CustomerName to GROUP BY (more computation)
SELECT CustomerID, CustomerName, SUM(OrderValue)
FROM Orders 
GROUP BY CustomerID, CustomerName;

-- Option 2: Remove CustomerName from SELECT (lose information)
SELECT CustomerID, SUM(OrderValue)
FROM Orders 
GROUP BY CustomerID;

With ANY_VALUE (Optimized):

-- Get CustomerName without adding to GROUP BY
SELECT CustomerID, ANY_VALUE(CustomerName), SUM(OrderValue)
FROM Orders 
GROUP BY CustomerID;

Behavior

  • Returns: Any arbitrary value from the column for each group
  • Null Handling: Supports both null-handling enabled and disabled modes
  • Data Types: Works with all Pinot data types (INT, LONG, FLOAT, DOUBLE, STRING, BYTES, BIG_DECIMAL)
  • Performance: More efficient than adding the column to GROUP BY when there's a 1:1 mapping

Perf testing

Without Any-value

Example : Query latency without any-value for the above query 3000ms Screenshot 2025-09-01 at 9 23 23 PM Screenshot 2025-09-01 at 9 43 54 PM Screenshot 2025-09-01 at 9 49 06 PM

With Any-value

Example : Query latency with any-value ~ 700ms

Screenshot 2025-09-01 at 9 24 02 PM Screenshot 2025-09-01 at 9 44 22 PM Screenshot 2025-09-01 at 9 49 51 PM

praveenc7 avatar Aug 24 '25 00:08 praveenc7

Codecov Report

:x: Patch coverage is 30.40541% with 103 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 63.26%. Comparing base (95d43c0) to head (a6b4666). :warning: Report is 9 commits behind head on master.

Files with missing lines Patch % Lines
...regation/function/AnyValueAggregationFunction.java 31.38% 89 Missing and 5 partials :warning:
...erator/blocks/results/AggregationResultsBlock.java 0.00% 6 Missing :warning:
...aggregation/function/AggregationFunctionUtils.java 0.00% 3 Missing :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #16678      +/-   ##
============================================
- Coverage     63.28%   63.26%   -0.03%     
  Complexity     1474     1474              
============================================
  Files          3154     3155       +1     
  Lines        188007   188088      +81     
  Branches      28782    28784       +2     
============================================
+ Hits         118977   118990      +13     
- Misses        59807    59891      +84     
+ Partials       9223     9207      -16     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.23% <30.40%> (-0.04%) :arrow_down:
java-21 63.21% <30.40%> (-0.01%) :arrow_down:
temurin 63.26% <30.40%> (-0.03%) :arrow_down:
unittests 63.25% <30.40%> (-0.03%) :arrow_down:
unittests1 55.69% <30.40%> (+0.01%) :arrow_up:
unittests2 33.88% <0.00%> (-0.05%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov-commenter avatar Aug 25 '25 01:08 codecov-commenter

@praveenc7 : could we target end of month for getting this merged?

ankitsultana avatar Sep 20 '25 04:09 ankitsultana

@praveenc7 : could we target end of month for getting this merged?

@ankitsultana Was side-tracked by other issue, planning to open for review next week and we can target this by end of month

praveenc7 avatar Sep 26 '25 20:09 praveenc7

@ankitsultana appreciate, If you could review this?

praveenc7 avatar Sep 30 '25 05:09 praveenc7

@xiangfu0 / @somandal Can you help merge as well.

praveenc7 avatar Dec 24 '25 19:12 praveenc7