arrow icon indicating copy to clipboard operation
arrow copied to clipboard

[C++] Implement support for using selection vectors in scalar aggregate function kernels

Open asfimport opened this issue 6 years ago • 7 comments

The aggregate kernels don't support mask (the result of a filter). Add the the following method to AggregateFunction.


virtual Status ConsumeWithFilter(const Array& input, const Array& mask, void* state) const = 0;

The goal is to add support for AST similar to:


SELECT AGG(x) FROM table WHERE pred;

Reporter: Francois Saint-Jacques / @fsaintjacques

Related issues:

Note: This issue was originally created as ARROW-5005. Please see the migration documentation for further details.

asfimport avatar Mar 25 '19 12:03 asfimport

Ben Kietzman / @bkietz: Should this be added to UnaryKernel/BinaryKernel as CallMasked or so? It seems like it might be generally useful and fairly straightforward to provide a naive default impl

asfimport avatar Mar 25 '19 18:03 asfimport

Francois Saint-Jacques / @fsaintjacques: The default implementation could work for some operations by using Take, but it's preferable to have each aggregate provide a native implementation. Note that masking is not just applicable to aggregates.

asfimport avatar Mar 25 '19 18:03 asfimport

Wes McKinney / @wesm: @bkietz the use of filters to improve performance is fairly specific to each class of kernels – it is especially impactful in the case of aggregates. I do not think we should generalize prematurely in this case

asfimport avatar Mar 28 '19 22:03 asfimport

Wes McKinney / @wesm: I believe the best approach right now is to use selection vectors for this (see arrow::compute::ExecBatch::selection_vector)

asfimport avatar May 25 '20 15:05 asfimport

Niranda Perera / @nirandaperera: I will take this up. From what I understand, we need to enable masked scalar aggregates using the ExecBatch::selection_vector.

I have a couple of questions.

  1. Where/ at what point would the selection_vector be set to an ExecBatch?
  2. Should I change the kernels in aggregate_* files, so that they are selection-vector-aware, or should we manage those as separate implementations?

asfimport avatar Jun 08 '21 14:06 asfimport

Todd Farmer / @toddfarmer: This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.

asfimport avatar Jul 12 '22 14:07 asfimport

This issue has been marked as stale because it has had no activity in the past 365 days. Please remove the stale label or comment below, or this issue will be closed in 14 days. If this improvement is still desired but has no current owner, please add the 'Status: needs champion' label.

github-actions[bot] avatar Dec 14 '25 11:12 github-actions[bot]