parquet-go

Bloom filter support?

Open celer opened this issue 4 years ago • 3 comments

Hi,

I was curious whether you plan to support Bloom filters:

https://github.com/apache/parquet-format/blob/master/BloomFilter.md

thanks!

celer avatar May 07 '20 02:05 celer

Hi @celer, I need some time to investigate this.

xitongsys avatar May 09 '20 07:05 xitongsys

I would also be interested in this. If I understand correctly, the schema embedded here already supports adding Bloom filter headers: https://github.com/xitongsys/parquet-go/blob/master/parquet/parquet.go#L6012. However, I struggle to understand how to store the header in a Parquet file. Any pointers here?

Otherwise, there is a fairly popular Go Bloom filter library already available, https://github.com/bits-and-blooms/bloom, which could be added to support this.

johanneswuerbach avatar Feb 28 '22 23:02 johanneswuerbach

I've created an early PoC here: https://github.com/xitongsys/parquet-go/pull/448. It is currently not usable because the Go Bloom filter library uses a hash function that the Parquet spec does not allow, but I would definitely appreciate feedback :-)

johanneswuerbach avatar Mar 11 '22 21:03 johanneswuerbach
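
For context on the hashing constraint mentioned in the last comment: the Parquet Bloom filter spec linked at the top of the thread describes a split-block Bloom filter that is fed 64-bit hashes produced by xxHash (XXH64, seed 0). The following is only a rough sketch of that structure under those assumptions. The names `SplitBlockFilter`, `Insert`, and `Check` are hypothetical and are not part of parquet-go or the PoC, and the sketch takes an already computed hash because the Go standard library has no xxHash implementation.

```go
package main

// salt holds the eight odd constants listed in the Parquet Bloom filter spec.
// Each one selects a single bit in one 32-bit word of a block.
var salt = [8]uint32{
	0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
	0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31,
}

// block is one 256-bit unit of the filter: eight 32-bit words.
type block [8]uint32

// SplitBlockFilter is a hypothetical in-memory split-block Bloom filter.
type SplitBlockFilter struct {
	blocks []block
}

// NewSplitBlockFilter allocates a filter with numBlocks blocks (at least one).
func NewSplitBlockFilter(numBlocks int) *SplitBlockFilter {
	if numBlocks < 1 {
		numBlocks = 1
	}
	return &SplitBlockFilter{blocks: make([]block, numBlocks)}
}

// mask derives one set bit per word from the lower 32 bits of the hash.
func mask(x uint32) block {
	var m block
	for i, s := range salt {
		// The top 5 bits of the (wrapping) product pick a bit position 0..31.
		m[i] = uint32(1) << ((x * s) >> 27)
	}
	return m
}

// blockIndex maps the upper 32 bits of the hash onto [0, number of blocks).
func (f *SplitBlockFilter) blockIndex(h uint64) int {
	return int(((h >> 32) * uint64(len(f.blocks))) >> 32)
}

// Insert records a 64-bit hash, assumed to come from XXH64 with seed 0.
func (f *SplitBlockFilter) Insert(h uint64) {
	b := &f.blocks[f.blockIndex(h)]
	m := mask(uint32(h))
	for i := range b {
		b[i] |= m[i]
	}
}

// Check reports whether the hash may have been inserted.
// False positives are possible; false negatives are not.
func (f *SplitBlockFilter) Check(h uint64) bool {
	b := &f.blocks[f.blockIndex(h)]
	m := mask(uint32(h))
	for i := range b {
		if b[i]&m[i] == 0 {
			return false
		}
	}
	return true
}
```

In real use, the hash passed to `Insert` and `Check` would come from an xxHash implementation applied to the plain-encoded column value. Serializing the blocks and writing the Bloom filter header into the file, which is the open question in the thread, is not covered by this sketch.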