
Support bloom filter table indexes

Open houqp opened this issue 3 years ago • 10 comments

Description

Use Case

This could help speed up table scans. This feature is not documented in the official spec yet; see https://docs.databricks.com/delta/optimizations/bloom-filters.html for more details.
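To illustrate why a per-file bloom filter can speed up scans, here is a minimal sketch. It is not the Databricks on-disk format (which is undocumented) and not a delta-rs API; `BloomFilter`, `build_index`, and `files_to_scan` are hypothetical names, and the file names are made up. The idea is only that a filter answering "definitely absent" lets a reader skip a file without opening it.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: str):
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )


def build_index(files: dict) -> dict:
    """Build one filter per data file (assumed layout: path -> column values)."""
    index = {}
    for path, values in files.items():
        bloom = BloomFilter()
        for value in values:
            bloom.add(value)
        index[path] = bloom
    return index


def files_to_scan(index: dict, value: str) -> list:
    """Return only the files whose filter cannot rule the value out."""
    return [path for path, bloom in index.items() if bloom.might_contain(value)]


# Hypothetical table with two data files and one indexed column.
index = build_index({
    "part-0.parquet": ["alice", "bob"],
    "part-1.parquet": ["carol"],
})
candidates = files_to_scan(index, "alice")
```

A point lookup then reads only the files in `candidates`. Because false positives are possible, the reader still has to apply the real predicate to the rows it reads; the filter only prunes files.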

houqp avatar May 05 '21 06:05 houqp

Any update on this? I would really love to be able to filter on the tables.

kesavkolla avatar Nov 17 '21 08:11 kesavkolla

Databricks still hasn't open-sourced this feature, I believe.

houqp avatar Nov 18 '21 06:11 houqp

So this is only doable after Databricks open-sources the feature or at least releases the official spec, right?

viirya avatar Nov 18 '21 20:11 viirya

welp, we can always reverse engineer the format if anyone is interested in doing that :D

houqp avatar Nov 18 '21 21:11 houqp

While trying to fix parquet2 builds in a PR, I realized that parquet2 has at least some support for bloom filters: https://github.com/jorgecarleitao/parquet2/tree/main/src/bloom_filter

Just leaving this here for reference 😄.

roeap avatar Oct 19 '22 20:10 roeap