Jim Apple
Looking more carefully at the code, I don't think it supports filters larger than 2^32 bytes. I'll take this as a feature request.
When I run the following, the correct amount of memory is allocated on my machine: `BlockFilter::CreateWithNdvFpp(1ul
I had to do something similar, but even during `./bootstrap.sh` and with the file locations specified in the apparmor files as those in the directory I was building in.
> Another thing that might be considered is that in particular for testing if sets are disjoint is considering using partitioned bloom filters as described in these papers [Understanding Bloom...
> . . .performance stays more or less constant (close to SOL GUPS throughput) up to the point where the `window_size` exceeds the size of a single sector. Would it...
> Is this design valid or can it be improved? One change I might make: To put `extent_` in a _policy_ seems unusual to me. I'd prefer it as a...
> > Another question: what is the pattern_bits parameter in parquet_filter_policy::pattern_word()?
> > That's the bit cardinality of a key's signature aka the number of bits set in a block....
LGTM! :+1:
Hm, I don't see anything in the history in https://github.com/lancedb/lance/commits/main/python/src/scanner.rs that looks related. Maybe an Arrow change? I'll find someone to take a look.
Thanks for the feature request, @npip99! I'll find someone to take a look.