
Snapshot-Level Metrics and Statistics

Open omervk opened this issue 7 years ago • 0 comments

Assume a table with the following field:

id int

There are n data files, each of which records the statistics min(id) and max(id). All ids are positive integers.

Querying by id < 0 would require an O(n) pass over every data file listed in the manifest, checking each file's min(id) to decide whether it could contain a matching row (it can only if min(id) < 0).
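A minimal sketch of that per-file pruning, assuming each data file's bounds are available as plain integers. `DataFileStats` and `files_matching_id_less_than` are hypothetical names for illustration, not Iceberg's API:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFileStats:
    # Hypothetical per-file column statistics for the `id` column.
    path: str
    min_id: int
    max_id: int


def files_matching_id_less_than(files: List[DataFileStats], bound: int) -> List[DataFileStats]:
    """Return the files that may contain rows with id < bound.

    A file can be skipped only if every id in it is >= bound, i.e. min_id >= bound.
    Every file is inspected once, so this is O(n) in the number of data files.
    """
    return [f for f in files if f.min_id < bound]


files = [
    DataFileStats("a.parquet", 1, 100),
    DataFileStats("b.parquet", 101, 200),
    DataFileStats("c.parquet", 201, 300),
]
print(files_matching_id_less_than(files, 0))          # []
print(len(files_matching_id_less_than(files, 150)))   # 2
```

Even when no file matches (as with id < 0 here), the loop still has to visit all n entries to reach that answer.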

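Aggregating metrics and/or statistics at the Snapshot level (for example, a snapshot-wide min(id), which is simply the minimum of the per-file min(id) values) would let a query like this be answered with a single O(1) check instead of an O(n) scan over the n data files.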
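A sketch of what snapshot-level aggregation could look like, assuming the summary is computed once when a snapshot is committed. `SnapshotIdStats`, `summarize`, and `snapshot_may_match_id_less_than` are hypothetical names; this is an illustration of the proposal, not an existing Iceberg structure:

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class SnapshotIdStats:
    # Hypothetical snapshot-level summary: bounds over all data files in the snapshot.
    min_id: int
    max_id: int
    file_count: int


def summarize(file_bounds: Iterable[Tuple[int, int]]) -> SnapshotIdStats:
    """Fold per-file (min_id, max_id) pairs into one snapshot-level summary.

    This is O(n), but it runs once at commit time rather than on every query.
    """
    bounds = list(file_bounds)
    if not bounds:
        raise ValueError("snapshot has no data files")
    return SnapshotIdStats(
        min_id=min(lo for lo, _ in bounds),
        max_id=max(hi for _, hi in bounds),
        file_count=len(bounds),
    )


def snapshot_may_match_id_less_than(stats: SnapshotIdStats, bound: int) -> bool:
    # O(1): if even the smallest id in the snapshot is >= bound, no file can match.
    return stats.min_id < bound


snapshot = summarize([(1, 100), (101, 200), (201, 300)])
print(snapshot_may_match_id_less_than(snapshot, 0))    # False: skip all files
print(snapshot_may_match_id_less_than(snapshot, 150))  # True: fall back to per-file pruning
```

The snapshot-level check only short-circuits queries that exclude the whole snapshot; when it returns True, the per-file scan is still needed. If files are later removed without recomputing the summary, the bounds become wider than the real data but remain safe for pruning.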

omervk, Nov 13 '18 22:11