[Query Engine] ANY VALUE Support
Overview
The ANY_VALUE aggregation function returns any arbitrary value from the specified column for each group in a GROUP BY query. This function is particularly useful for optimizing queries where you need to include a column in the SELECT clause that has a 1:1 mapping with the GROUP BY columns, without forcing that column into the GROUP BY clause.
Syntax
ANY_VALUE(column_name)
Use Cases
Primary Use Case: Avoiding Unnecessary GROUP BY Columns
Standard SQL requires that all non-aggregation expressions in the SELECT clause must also be part of the GROUP BY clause. However, there are scenarios where:
- Functional Dependency: A column has a 1:1 mapping with the grouped columns
-
Performance Optimization: Adding the column to
GROUP BYwould increase computation without changing the logical result - Query Simplification: You want to display related data without affecting grouping behavior
Example Scenario
Consider an ORDERS table with CustomerID, CustomerName, and OrderValue:
Without ANY_VALUE (Standard SQL):
-- Option 1: Add CustomerName to GROUP BY (more computation)
SELECT CustomerID, CustomerName, SUM(OrderValue)
FROM Orders
GROUP BY CustomerID, CustomerName;
-- Option 2: Remove CustomerName from SELECT (lose information)
SELECT CustomerID, SUM(OrderValue)
FROM Orders
GROUP BY CustomerID;
With ANY_VALUE (Optimized):
-- Get CustomerName without adding to GROUP BY
SELECT CustomerID, ANY_VALUE(CustomerName), SUM(OrderValue)
FROM Orders
GROUP BY CustomerID;
Behavior
- Returns: Any arbitrary value from the column for each group
- Null Handling: Supports both null-handling enabled and disabled modes
- Data Types: Works with all Pinot data types (INT, LONG, FLOAT, DOUBLE, STRING, BYTES, BIG_DECIMAL)
-
Performance: More efficient than adding the column to
GROUP BYwhen there's a 1:1 mapping
Perf testing
Without Any-value
Example : Query latency without any-value for the above query 3000ms
With Any-value
Example : Query latency with any-value ~ 700ms