
Include bloom filter statistics when reading parquet metadata with ClickHouse

Open • Selfeer opened this issue 1 year ago • 1 comment

Describe the new feature

We need a way to determine whether a bloom filter is present in a Parquet file when inspecting its metadata with ClickHouse via SELECT * FROM file('output.parquet', ParquetMetadata). Currently, the output makes no mention of bloom_filter_offset.
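For reference, a minimal sketch of the inspection query, narrowed to the per-row-group metadata where column chunk details are reported. The column names num_row_groups and row_groups are taken from the documented ParquetMetadata output and may differ between ClickHouse versions; output.parquet is the example file from above.

```sql
-- Inspect row-group metadata of a Parquet file.
-- Assumption: num_row_groups and row_groups are columns of the
-- ParquetMetadata format in the ClickHouse version being used.
SELECT
    num_row_groups,
    row_groups
FROM file('output.parquet', ParquetMetadata)
FORMAT PrettyJSONEachRow;
```

The nested column entries in this output carry sizes and statistics, but nothing that indicates whether a bloom filter exists for a column chunk.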

Use case

This would let us check whether a bloom filter is present in a Parquet file as a QA check directly in ClickHouse, without relying on third-party tools such as parquet-tools.

Selfeer, Oct 02 '24 11:10

This might be useful: https://github.com/apache/iceberg/issues/9898#issuecomment-2375857223

arthurpassos, Oct 04 '24 13:10