parquet-go
Bloom filter support?
Hi,
I was curious whether you intend to support Bloom filters:
https://github.com/apache/parquet-format/blob/master/BloomFilter.md
thanks!
Hi @celer, I need some time to investigate this.
I would also be interested in this. If I understand correctly, the schema embedded at https://github.com/xitongsys/parquet-go/blob/master/parquet/parquet.go#L6012 already supports adding Bloom filter headers, but I struggle to understand how to store the header in a Parquet file. Any pointers?
Otherwise, there is already a fairly popular Go Bloom filter library, https://github.com/bits-and-blooms/bloom, which could be added to support this.
I've created an early PoC here: https://github.com/xitongsys/parquet-go/pull/448. It is not usable yet because the Go Bloom library uses a hash function that the Parquet spec does not allow, but I would definitely appreciate feedback :-)
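For reference, here is a rough sketch of the split block Bloom filter that the spec linked above describes, to show why a general-purpose Bloom library does not fit as-is. This is not code from the PoC or from parquet-go. The type and function names (`SplitBlockFilter`, `mask`, `Insert`, `Check`) are made up for illustration, and the salt constants are copied from my reading of BloomFilter.md, so please check them against the spec. The sketch assumes the caller already has a 64-bit hash of the value, which as far as I understand must be xxHash (XXH64) with seed 0, computed by an external library that is not shown here.

```go
package main

// salt holds the eight odd constants from the Parquet BloomFilter.md spec
// (assumption: copied by hand, verify against the spec before relying on them).
var salt = [8]uint32{
	0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
	0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31,
}

// block is one 256-bit block: eight 32-bit words.
type block [8]uint32

// SplitBlockFilter is a hypothetical in-memory split block Bloom filter.
type SplitBlockFilter struct {
	blocks []block
}

// NewSplitBlockFilter allocates a filter with numBlocks blocks (at least one).
func NewSplitBlockFilter(numBlocks int) *SplitBlockFilter {
	if numBlocks < 1 {
		numBlocks = 1
	}
	return &SplitBlockFilter{blocks: make([]block, numBlocks)}
}

// mask derives one bit per word from the lower 32 bits of the hash.
// The multiplication wraps around modulo 2^32; the top 5 bits pick the bit.
func mask(x uint32) block {
	var m block
	for i := range m {
		m[i] = uint32(1) << ((x * salt[i]) >> 27)
	}
	return m
}

// blockIndex maps the upper 32 bits of the hash onto [0, len(blocks)).
func (f *SplitBlockFilter) blockIndex(h uint64) int {
	return int(((h >> 32) * uint64(len(f.blocks))) >> 32)
}

// Insert adds a precomputed 64-bit hash to the filter.
func (f *SplitBlockFilter) Insert(h uint64) {
	b := &f.blocks[f.blockIndex(h)]
	m := mask(uint32(h))
	for i := range b {
		b[i] |= m[i]
	}
}

// Check reports whether the hash may have been inserted.
// False positives are possible, false negatives are not.
func (f *SplitBlockFilter) Check(h uint64) bool {
	b := &f.blocks[f.blockIndex(h)]
	m := mask(uint32(h))
	for i := range b {
		if b[i]&m[i] == 0 {
			return false
		}
	}
	return true
}
```

Since both the hash function and the bit layout are fixed by the spec, the hashing and layout of an existing Bloom library cannot simply be reused. Serializing the blocks and writing the Bloom filter header into the file is a separate step that this sketch does not cover.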