datafusion-comet
feat: Implement bloom_filter_agg
Which issue does this PR close?
Closes #846.
Rationale for this change
What changes are included in this PR?
- Native implementation (`bloom_filter_agg.rs`) that uses DataFusion's `Accumulator` trait. We do not have a `GroupsAccumulator` implementation and leave it as a possible future optimization.
- Serde logic (`planner.rs`, `QueryPlanSerde.scala`)
- Serialization and merging logic for underlying data structures (`spark_bloom_filter.rs`, `spark_bit_array.rs`)
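For readers unfamiliar with how two partial Bloom filters are combined, here is a minimal sketch of the idea behind the merging and serialization logic. It is not the code from this PR: the type `BitArraySketch`, its methods, and the big-endian word layout are all hypothetical names and assumptions chosen for illustration. The only technique it relies on is that two bit arrays of equal size built with the same hash functions can be merged with a bitwise OR.

```rust
/// Hypothetical fixed-size bit array backed by 64-bit words.
/// Illustration only; not the type defined in spark_bit_array.rs.
#[derive(Debug, Clone, PartialEq)]
pub struct BitArraySketch {
    words: Vec<u64>,
}

impl BitArraySketch {
    /// Creates an all-zero bit array with `num_words * 64` bits.
    pub fn new(num_words: usize) -> Self {
        assert!(num_words > 0, "bit array needs at least one word");
        Self { words: vec![0; num_words] }
    }

    /// Sets the bit at `index`; returns true if it was previously unset.
    pub fn set(&mut self, index: usize) -> bool {
        let word = index / 64;
        let mask = 1u64 << (index % 64);
        let changed = (self.words[word] & mask) == 0;
        self.words[word] |= mask;
        changed
    }

    /// Returns true if the bit at `index` is set.
    pub fn get(&self, index: usize) -> bool {
        (self.words[index / 64] & (1u64 << (index % 64))) != 0
    }

    /// Number of set bits across all words.
    pub fn cardinality(&self) -> usize {
        self.words.iter().map(|w| w.count_ones() as usize).sum()
    }

    /// Merges another bit array of the same size into this one via bitwise OR.
    pub fn merge(&mut self, other: &Self) {
        assert_eq!(
            self.words.len(),
            other.words.len(),
            "cannot merge bit arrays of different sizes"
        );
        for (a, b) in self.words.iter_mut().zip(&other.words) {
            *a |= *b;
        }
    }

    /// Serializes the words as big-endian bytes (an assumed layout).
    pub fn to_bytes(&self) -> Vec<u8> {
        self.words.iter().flat_map(|w| w.to_be_bytes()).collect()
    }

    /// Rebuilds a bit array from bytes produced by `to_bytes`.
    pub fn from_bytes(bytes: &[u8]) -> Self {
        assert!(
            !bytes.is_empty() && bytes.len() % 8 == 0,
            "byte length must be a positive multiple of 8"
        );
        let words = bytes
            .chunks_exact(8)
            .map(|chunk| {
                let mut buf = [0u8; 8];
                buf.copy_from_slice(chunk);
                u64::from_be_bytes(buf)
            })
            .collect();
        Self { words }
    }
}
```

In a partial/final aggregation split, each partial aggregate would build its own bit array, serialize it as intermediate state, and the final aggregate would deserialize and `merge` the states. The bitwise OR is what makes the result independent of how rows were partitioned.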
How are these changes tested?
- New test in `CometExecSuite`
- Spark tests in CI exercise this aggregation
- Scala benchmark to compare against Spark code path
- Native benchmark for partial and final aggregation modes
- Native tests for new bit array logic in `spark_bit_array.rs`