
Support ANALYZE stats composed of expressions

Open: findepi opened this issue 3 years ago (0 comments)

A connector may ask the engine to collect any statistic defined by the ColumnStatisticType SPI enum. This is convenient, but sometimes a connector needs to define its own way of calculating a statistic.
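To make the distinction concrete, here is a toy model (not Trino's SPI) of the two kinds of requests a connector could make. A built-in request names a fixed, engine-defined statistic. A connector-defined request supplies a per-value expression plus an aggregation over the results. All names here (BuiltinStatistic, ExpressionStatistic, collect) are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Iterable, Union


# Hypothetical stand-in for a few engine-defined statistic kinds
# (loosely modeled on the ColumnStatisticType idea; not Trino's actual SPI).
class BuiltinStatistic(Enum):
    MIN_VALUE = "min_value"
    MAX_VALUE = "max_value"
    NUMBER_OF_NON_NULL_VALUES = "number_of_non_null_values"


@dataclass(frozen=True)
class ExpressionStatistic:
    """Hypothetical connector-defined statistic: a preprocessing expression
    applied to each non-null value, followed by an aggregation."""
    name: str
    expression: Callable[[Any], Any]
    aggregation: Callable[[Iterable[Any]], Any]


StatisticRequest = Union[BuiltinStatistic, ExpressionStatistic]


def collect(values: list, request: StatisticRequest) -> Any:
    """Toy 'engine': evaluates a statistic request over one column's values."""
    non_null = [v for v in values if v is not None]
    if isinstance(request, BuiltinStatistic):
        # The engine knows how to compute each built-in kind itself.
        if request is BuiltinStatistic.MIN_VALUE:
            return min(non_null, default=None)
        if request is BuiltinStatistic.MAX_VALUE:
            return max(non_null, default=None)
        return len(non_null)
    # Connector-defined: evaluate the expression per value, then aggregate.
    return request.aggregation(request.expression(v) for v in non_null)
```

For example, a connector could count distinct values after converting each one to a connector-specific byte encoding: `ExpressionStatistic("distinct_encoded", lambda v: v.to_bytes(4, "little", signed=True), lambda xs: len(set(xs)))` applied to `[3, None, 1, 3, 7]` yields 3. The engine never needs to understand the encoding; it only evaluates the expression and runs the aggregation.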

For example, Iceberg statistics include the apache-datasketches-theta-v1 blob type, which the Iceberg specification defines as follows:

A serialized form of a "compact" Theta sketch produced by the Apache DataSketches library. The sketch is obtained by constructing Alpha family sketch with default seed, and feeding it with individual distinct values converted to bytes using Iceberg's single-value serialization.
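The general idea behind a Theta sketch can be illustrated with a simplified K-Minimum-Values (KMV) estimator: hash every input, keep only the k smallest distinct hash values, and estimate the number of distinct values from the k-th smallest one. This is a minimal sketch, not the Apache DataSketches Alpha family sketch. It uses SHA-256 instead of DataSketches' hashing and seed, has no compact serialized form, and its output is not compatible with the Iceberg blob. The class and function names are hypothetical.

```python
import hashlib
import heapq


def _hash_to_unit(data: bytes) -> float:
    # Map bytes to a pseudo-uniform value in [0, 1) using the first
    # 8 bytes of SHA-256 (a stand-in for the hash DataSketches uses).
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KmvSketch:
    """Keeps the k smallest distinct hash values seen so far."""

    def __init__(self, k: int = 1024):
        self.k = k
        self._heap = []        # max-heap of retained hashes, stored negated
        self._members = set()  # retained hashes, for duplicate detection

    def update(self, data: bytes) -> None:
        h = _hash_to_unit(data)
        if h in self._members:
            return  # duplicate input: already counted
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # Replace the largest retained hash with the new, smaller one.
            evicted = -heapq.heapreplace(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self) -> float:
        if len(self._heap) < self.k:
            # Fewer than k distinct hashes: the count is exact.
            return float(len(self._heap))
        theta = -self._heap[0]  # k-th smallest hash value
        return (self.k - 1) / theta
```

The sketch consumes bytes, not typed values. That is why the specification pins down how values are converted to bytes: two engines that hash different byte encodings of the same value would build sketches that cannot be merged or compared.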

This involves two components that Trino does not support today:

  • a new data sketch with a specific configuration (so that results can be shared across query engines)
  • a well-defined input preprocessing step that relies on Iceberg concepts, which are alien to the Trino engine
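The input preprocessing in the second point can be sketched as a function that converts a value to bytes according to its Iceberg type. This covers only a few primitive types, based on our reading of the single-value serialization appendix of the Iceberg table specification (int and long as little-endian integers, double as little-endian IEEE 754, string as UTF-8 without a length prefix). The function name is hypothetical, and other types are deliberately left unhandled.

```python
import struct


def iceberg_single_value_bytes(value, iceberg_type: str) -> bytes:
    """Hypothetical sketch of Iceberg single-value serialization for a
    few primitive types; consult the Iceberg spec for the full rules."""
    if iceberg_type == "int":
        return struct.pack("<i", value)  # 4-byte little-endian
    if iceberg_type == "long":
        return struct.pack("<q", value)  # 8-byte little-endian
    if iceberg_type == "double":
        return struct.pack("<d", value)  # 8-byte little-endian IEEE 754
    if iceberg_type == "string":
        return value.encode("utf-8")     # UTF-8 bytes, no length prefix
    raise NotImplementedError(f"type not covered by this sketch: {iceberg_type}")
```

Note that the same logical value 1 produces 4 bytes as an Iceberg int and 8 bytes as an Iceberg long, so it hashes differently. The preprocessing therefore depends on the Iceberg type of the column. This is exactly the kind of connector-specific expression the engine would evaluate before feeding values into the sketch, without the engine itself knowing Iceberg's encoding rules.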

This PR addresses the second limitation, building on top of https://github.com/trinodb/trino/pull/14220

findepi, Sep 20 '22 21:09