Support ANALYZE stats composed of expressions
A connector may ask the engine to collect any statistic defined by the ColumnStatisticType SPI enum. This is convenient, but sometimes a connector needs to provide its own way of calculating statistics.
For example, Iceberg statistics include:
apache-datasketches-theta-v1 blob type: A serialized form of a "compact" Theta sketch produced by the Apache DataSketches library. The sketch is obtained by constructing Alpha family sketch with default seed, and feeding it with individual distinct values converted to bytes using Iceberg's single-value serialization.
This definition has two components that are not supported today:
- a new data sketch with a specific configuration (so that results can be shared with different query engines)
- well-defined input pre-processing that relies on existing Iceberg concepts, which are alien to the Trino engine
This PR addresses the second limitation, building on top of https://github.com/trinodb/trino/pull/14220
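
To make the input pre-processing concrete, here is a minimal sketch of what "converted to bytes using Iceberg's single-value serialization" means for a few primitive types. It assumes the encodings from the Iceberg spec (int and long as little-endian fixed-width bytes, string as UTF-8 bytes without a length prefix). The class SingleValueBytes and its methods are hypothetical names used only for illustration; they are not part of Trino or Iceberg, and this is not the code added by this PR. An exact distinct count over the serialized bytes stands in for the Theta sketch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not a Trino or Iceberg class.
final class SingleValueBytes
{
    private SingleValueBytes() {}

    // Assumed Iceberg encoding for int: 4 bytes, little-endian.
    static byte[] ofInt(int value)
    {
        return ByteBuffer.allocate(Integer.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(value)
                .array();
    }

    // Assumed Iceberg encoding for long: 8 bytes, little-endian.
    static byte[] ofLong(long value)
    {
        return ByteBuffer.allocate(Long.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(value)
                .array();
    }

    // Assumed Iceberg encoding for string: UTF-8 bytes, no length prefix.
    static byte[] ofString(String value)
    {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Stand-in for the sketch: an exact distinct count over the serialized
    // values. A real implementation would feed these bytes to a Theta sketch.
    static int distinctCount(Iterable<byte[]> serializedValues)
    {
        Set<ByteBuffer> seen = new HashSet<>();
        for (byte[] bytes : serializedValues) {
            // ByteBuffer compares and hashes by content, unlike byte[].
            seen.add(ByteBuffer.wrap(bytes));
        }
        return seen.size();
    }
}
```

The point of the sketch is that the bytes fed to the sketch are defined by the table format, not by the engine's own value representation, which is why the connector has to describe the input as an expression over the column.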