
Should we add an `is_in` function?

Open westonpace opened this issue 1 year ago • 5 comments

Currently, in Acero, we've been mapping `is_in` to `SingularOrList`. The latter is more generic, so this is safe, but it doesn't round trip well, and `is_in` is more efficient.

In other words, given something like:

`is_in(f0, [7, 3, 4, 6])`, we round trip to `f0 == 7 || f0 == 3 || f0 == 4 || f0 == 6`.

It is possible to recognize that an or-list collapses to `is_in`, but then everyone needs to repeat this optimization, and there is some tricky nuance when someone provides something like `f0 == 7 || 3 == f0` (having `f0` on the left in one term and on the right in another is fine, but it can confuse a simplistic optimizer routine).
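To make the nuance concrete, here is a rough Python sketch of the collapse that every consumer would have to repeat. The `Field`, `Literal`, `Eq`, `Or`, and `IsIn` classes and the `collapse_or_list` helper are hypothetical stand-ins invented for illustration; they are not Substrait or Acero types, and the sketch ignores null semantics and type coercion entirely.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical, minimal expression nodes (not Substrait or Acero types).
@dataclass(frozen=True)
class Field:
    name: str


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class Eq:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class IsIn:
    field: Field
    options: tuple


Expr = Union[Field, Literal, Eq, Or, IsIn]


def _flatten_or(expr):
    """Turn a nested chain of Or nodes into a flat list of disjuncts."""
    if isinstance(expr, Or):
        return _flatten_or(expr.left) + _flatten_or(expr.right)
    return [expr]


def _field_and_literal(expr):
    """Return (field, value) for `field == literal` in either operand order."""
    if not isinstance(expr, Eq):
        return None
    if isinstance(expr.left, Field) and isinstance(expr.right, Literal):
        return expr.left, expr.right.value
    if isinstance(expr.left, Literal) and isinstance(expr.right, Field):
        return expr.right, expr.left.value
    return None


def collapse_or_list(expr):
    """Return IsIn if every disjunct compares the same field to a literal.

    Returns None when the expression does not have that shape.
    """
    field = None
    options = []
    for term in _flatten_or(expr):
        pair = _field_and_literal(term)
        if pair is None:
            return None
        term_field, value = pair
        if field is None:
            field = term_field
        elif term_field != field:
            return None  # disjuncts refer to different fields
        if value not in options:
            options.append(value)  # keep first-seen order, drop duplicates
    return IsIn(field, tuple(options))
```

The operand-order check in `_field_and_literal` is exactly the part a simplistic routine tends to miss: without the second branch, `3 == f0` would cause the whole or-list to be left uncollapsed.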

It seems like having `is_in` is the same thing as having both `JoinRel` and `HashJoinRel`. It's just that we aren't used to seeing the logical/physical relationship play out across expressions in addition to relations.

westonpace, Jul 07 '23 04:07