pinot icon indicating copy to clipboard operation
pinot copied to clipboard

[Query Engine] ANY VALUE Support

Open praveenc7 opened this issue 4 months ago • 4 comments

Overview

The ANY_VALUE aggregation function returns any arbitrary value from the specified column for each group in a GROUP BY query. This function is particularly useful for optimizing queries where you need to include a column in the SELECT clause that has a 1:1 mapping with the GROUP BY columns, without forcing that column into the GROUP BY clause.

Syntax

ANY_VALUE(column_name)

Use Cases

Primary Use Case: Avoiding Unnecessary GROUP BY Columns

Standard SQL requires that all non-aggregation expressions in the SELECT clause must also be part of the GROUP BY clause. However, there are scenarios where:

  1. Functional Dependency: A column has a 1:1 mapping with the grouped columns
  2. Performance Optimization: Adding the column to GROUP BY would increase computation without changing the logical result
  3. Query Simplification: You want to display related data without affecting grouping behavior

Example Scenario

Consider an ORDERS table with CustomerID, CustomerName, and OrderValue:

Without ANY_VALUE (Standard SQL):

-- Option 1: Add CustomerName to GROUP BY (more computation)
SELECT CustomerID, CustomerName, SUM(OrderValue)
FROM Orders 
GROUP BY CustomerID, CustomerName;

-- Option 2: Remove CustomerName from SELECT (lose information)
SELECT CustomerID, SUM(OrderValue)
FROM Orders 
GROUP BY CustomerID;

With ANY_VALUE (Optimized):

-- Get CustomerName without adding to GROUP BY
SELECT CustomerID, ANY_VALUE(CustomerName), SUM(OrderValue)
FROM Orders 
GROUP BY CustomerID;

Behavior

  • Returns: Any arbitrary value from the column for each group
  • Null Handling: Supports both null-handling enabled and disabled modes
  • Data Types: Works with all Pinot data types (INT, LONG, FLOAT, DOUBLE, STRING, BYTES, BIG_DECIMAL)
  • Performance: More efficient than adding the column to GROUP BY when there's a 1:1 mapping

Perf testing

Without Any-value

Example : Query latency without any-value for the above query 3000ms Screenshot 2025-09-01 at 9 23 23 PM Screenshot 2025-09-01 at 9 43 54 PM Screenshot 2025-09-01 at 9 49 06 PM

With Any-value

Example : Query latency with any-value ~ 700ms

Screenshot 2025-09-01 at 9 24 02 PM Screenshot 2025-09-01 at 9 44 22 PM Screenshot 2025-09-01 at 9 49 51 PM

praveenc7 avatar Aug 24 '25 00:08 praveenc7