Add support for bloom_filter_agg

Open andygrove opened this issue 1 year ago • 6 comments

What is the problem the feature request solves?

Some TPC-H queries use bloom_filter_agg, and Comet does not have a native implementation yet.

A workaround is to set spark.sql.optimizer.runtime.bloomFilter.enabled=false.

Describe the potential solution

No response

Additional context

No response

andygrove avatar Aug 19 '24 16:08 andygrove

I can take this up if more details are provided :)

vaibhawvipul avatar Aug 24 '24 07:08 vaibhawvipul

> I can take this up if more details are provided :)

We need to implement an equivalent of Spark's org.apache.spark.sql.catalyst.expressions.aggregate.BloomFilterAggregate.
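
For orientation, here is a rough sketch of the state such an aggregate would carry, written in Rust since Comet's native side is Rust. It is not Spark's or Comet's actual code: the `BloomFilterAggState` type, the `mix` and `positions` helpers, and the SplitMix64-style hashing are all assumptions made for illustration (Spark's own filter uses Murmur3 and its own serialized layout, which a real implementation would have to match). It only shows the three pieces an aggregate needs: update with a value, merge two partial states, and probe the result.

```rust
/// Hypothetical, simplified Bloom filter aggregation state.
/// Illustrative only: not bit-compatible with Spark's BloomFilter.
#[derive(Clone, Debug, PartialEq)]
pub struct BloomFilterAggState {
    bits: Vec<u64>,
    num_bits: u64,
    num_hashes: u32,
}

/// SplitMix64-style mixer, used here as a stand-in hash (assumption;
/// Spark uses Murmur3).
fn mix(mut x: u64) -> u64 {
    x = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    x ^ (x >> 31)
}

/// Bit positions for a value, derived by double hashing: h1 + i * h2.
fn positions(value: i64, num_bits: u64, num_hashes: u32) -> impl Iterator<Item = u64> {
    let h1 = mix(value as u64);
    let h2 = mix(h1) | 1; // force an odd step so successive probes differ
    (0..u64::from(num_hashes)).map(move |i| h1.wrapping_add(i.wrapping_mul(h2)) % num_bits)
}

impl BloomFilterAggState {
    pub fn new(num_bits: u64, num_hashes: u32) -> Self {
        assert!(num_bits > 0 && num_hashes > 0);
        let words = ((num_bits + 63) / 64) as usize;
        Self { bits: vec![0; words], num_bits, num_hashes }
    }

    /// Per-row update. Nulls are skipped in this sketch.
    pub fn update(&mut self, value: Option<i64>) {
        if let Some(v) = value {
            for p in positions(v, self.num_bits, self.num_hashes) {
                self.bits[(p / 64) as usize] |= 1u64 << (p % 64);
            }
        }
    }

    /// Combine two partial states: the union of two filters with the
    /// same parameters is the bitwise OR of their bit arrays.
    pub fn merge(&mut self, other: &Self) {
        assert_eq!(
            (self.num_bits, self.num_hashes),
            (other.num_bits, other.num_hashes),
            "filters must share parameters to be merged"
        );
        for (a, b) in self.bits.iter_mut().zip(&other.bits) {
            *a |= *b;
        }
    }

    /// Probe: false means definitely absent, true means possibly present.
    pub fn might_contain(&self, value: i64) -> bool {
        positions(value, self.num_bits, self.num_hashes)
            .all(|p| (self.bits[(p / 64) as usize] & (1u64 << (p % 64))) != 0)
    }
}
```

In this sketch each partition would build its own state with `update`, the partial states would be combined with `merge`, and the final bit array would be what the join side probes with `might_contain`.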

andygrove avatar Aug 24 '24 11:08 andygrove

ok, taking this up.

vaibhawvipul avatar Aug 25 '24 04:08 vaibhawvipul

Accumulating some notes for this. Here's the Spark design doc on the feature: https://docs.google.com/document/d/16IEuyLeQlubQkH8YuVuXWKo2-grVIoDJqQpHZrE7q04/

mbutrovich avatar Sep 24 '24 16:09 mbutrovich

It looks like we already have the filter support thanks to https://github.com/apache/datafusion-comet/pull/179.

mbutrovich avatar Sep 24 '24 21:09 mbutrovich

In Spark, BloomFilterAggregate is only supported by ObjectHashAggregate or SortAggregate, but Comet only supports HashAggregate so far.

viirya avatar Oct 01 '24 18:10 viirya