[C++][Compute] "Scatter" vector functions

Open zanmato1984 opened this issue 1 year ago • 0 comments

Describe the enhancement requested

We discussed the solution for #41094, and the conclusion is that a "special form" is the way to go. The comment https://github.com/apache/arrow/issues/41094#issuecomment-2087716483 gives a thorough description of how special forms work.

To summarize: a special form evaluates some of its subexpressions under masks obtained from its other subexpressions. For example, in if cond then expr1 else expr2, the result of cond is the mask that controls which rows go to expr1 and which go to expr2. Another example is logical and/or, where the result of each subexpression contributes to the mask under which the remaining subexpressions are evaluated (boolean short-circuit).
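To make the short-circuit case concrete, here is a minimal sketch in plain C++ (no Arrow types) of a masked logical and. The names EvalAndShortCircuit and RowPredicate are hypothetical and only for illustration, and nulls are ignored for brevity:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a boolean subexpression evaluated on one row.
using RowPredicate = std::function<bool(std::size_t)>;

// Evaluates `lhs AND rhs` row by row, running `rhs` only on the rows where
// `lhs` was true. The result of `lhs` acts as the mask for `rhs`.
std::vector<bool> EvalAndShortCircuit(std::size_t num_rows,
                                      const RowPredicate& lhs,
                                      const RowPredicate& rhs) {
  // The left-hand side runs on every row and produces the mask.
  std::vector<bool> mask(num_rows);
  for (std::size_t i = 0; i < num_rows; ++i) mask[i] = lhs(i);

  // The right-hand side runs only on the masked-in rows; masked-out rows
  // are already known to be false.
  std::vector<bool> out(num_rows, false);
  for (std::size_t i = 0; i < num_rows; ++i) {
    if (mask[i]) out[i] = rhs(i);
  }
  return out;
}
```

For example, with values {0, 2, 5}, lhs as "value != 0" and rhs as "10 / value > 3", rhs is never invoked on the row holding 0, so the division by zero is avoided.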

One way to implement special forms is to have every expression execute its kernel selectively, respecting a selection vector (which rows the kernel should execute on) or an equivalent boolean mask. Unfortunately this isn't practical, because we can't afford to change every (scalar) compute function to support a selection vector/mask all at once. So we must take an adaptive approach that allows functions to be selection vector/mask agnostic. To do so, a special form should 1) take the rows specific to each branch; 2) invoke the function of each branch on its group of rows; 3) combine the results of all the branches by scattering each row back to its original position in the input.
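The three steps can be sketched as follows, using plain std::vector instead of Arrow arrays. TakeRows, ScatterInto, IfElse and Kernel are hypothetical names for illustration, not existing Arrow APIs, and nulls are ignored. The point is that the branch kernels stay mask agnostic: they only ever see a dense batch of rows.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a mask-agnostic scalar kernel: it maps a dense
// batch of rows to a result batch of the same length.
using Kernel =
    std::function<std::vector<int64_t>(const std::vector<int64_t>&)>;

// Step 1: gather the rows selected by `indices` (analogous to "take").
std::vector<int64_t> TakeRows(const std::vector<int64_t>& values,
                              const std::vector<std::size_t>& indices) {
  std::vector<int64_t> out;
  out.reserve(indices.size());
  for (std::size_t i : indices) out.push_back(values[i]);
  return out;
}

// Step 3: write each branch result back to its original row position.
void ScatterInto(const std::vector<int64_t>& branch_result,
                 const std::vector<std::size_t>& indices,
                 std::vector<int64_t>* out) {
  for (std::size_t k = 0; k < indices.size(); ++k) {
    (*out)[indices[k]] = branch_result[k];
  }
}

std::vector<int64_t> IfElse(const std::vector<bool>& cond,
                            const std::vector<int64_t>& input,
                            const Kernel& then_kernel,
                            const Kernel& else_kernel) {
  // Split the row indices by the mask produced by `cond`.
  std::vector<std::size_t> then_rows, else_rows;
  for (std::size_t i = 0; i < cond.size(); ++i) {
    (cond[i] ? then_rows : else_rows).push_back(i);
  }

  std::vector<int64_t> out(input.size());
  // Step 2 happens inside each call: the kernel runs on its own rows only.
  ScatterInto(then_kernel(TakeRows(input, then_rows)), then_rows, &out);
  ScatterInto(else_kernel(TakeRows(input, else_rows)), else_rows, &out);
  return out;
}
```

With cond = {true, false, true, false}, input = {1, 2, 3, 4}, a then-kernel that multiplies by 10 and an else-kernel that negates, the result is {10, -2, 30, -4}.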

So far we have the vector functions filter/take for 1), but there is no handy utility for 3).
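As a rough illustration of what such a utility might do, here is a sketch of a mask-based scatter, the inverse of filter. ScatterByMask is a hypothetical name and signature, not a proposal for the final API; it assumes rows not selected by the mask become null, modeled here with std::optional:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Places values[k] at the position of the k-th true entry in `mask`.
// Positions where the mask is false are left as null (std::nullopt).
std::vector<std::optional<int64_t>> ScatterByMask(
    const std::vector<int64_t>& values, const std::vector<bool>& mask) {
  std::vector<std::optional<int64_t>> out(mask.size());  // all null at first
  std::size_t next = 0;
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (!mask[i]) continue;
    if (next == values.size()) {
      throw std::invalid_argument("fewer values than selected rows");
    }
    out[i] = values[next++];
  }
  if (next != values.size()) {
    throw std::invalid_argument("more values than selected rows");
  }
  return out;
}
```

Scattering values {7, 9} with mask {false, true, false, true} yields {null, 7, null, 9}, which is exactly the layout that filtering by the same mask would reduce back to {7, 9}.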

Component(s)

C++

zanmato1984, Oct 13 '24 16:10