[C++] Implement support for using selection vectors in scalar aggregate function kernels
The aggregate kernels don't support mask (the result of a filter). Add the the following method to AggregateFunction.
virtual Status ConsumeWithFilter(const Array& input, const Array& mask, void* state) const = 0;
The goal is to add support for AST similar to:
SELECT AGG(x) FROM table WHERE pred;
Reporter: Francois Saint-Jacques / @fsaintjacques
Related issues:
Note: This issue was originally created as ARROW-5005. Please see the migration documentation for further details.
Ben Kietzman / @bkietz:
Should this be added to UnaryKernel/BinaryKernel as CallMasked or so? It seems like it might be generally useful and fairly straightforward to provide a naive default impl
Francois Saint-Jacques / @fsaintjacques: The default implementation could work for some operations by using Take, but it's preferable to have each aggregate provide a native implementation. Note that masking is not just applicable to aggregates.
Wes McKinney / @wesm: @bkietz the use of filters to improve performance is fairly specific to each class of kernels – it is especially impactful in the case of aggregates. I do not think we should generalize prematurely in this case
Wes McKinney / @wesm:
I believe the best approach right now is to use selection vectors for this (see arrow::compute::ExecBatch::selection_vector)
Niranda Perera / @nirandaperera:
I will take this up. From what I understand, we need to enable masked scalar aggregates using the ExecBatch::selection_vector.
I have a couple of questions.
- Where/ at what point would the selection_vector be set to an ExecBatch?
- Should I change the kernels in aggregate_* files, so that they are selection-vector-aware, or should we manage those as separate implementations?
Todd Farmer / @toddfarmer: This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.
This issue has been marked as stale because it has had no activity in the past 365 days. Please remove the stale label or comment below, or this issue will be closed in 14 days. If this improvement is still desired but has no current owner, please add the 'Status: needs champion' label.