datafusion
datafusion copied to clipboard
feat: support for AnyExpression
Which issue does this PR close?
https://github.com/apache/arrow-datafusion/issues/2548
Rationale for this change
This PR implements partial support for ANY
operator, only for =
& <>
.
Are there any user-facing changes?
This PR doesn't introduce any breaking changes, only new functionality
I added the api change
label since this introduces a new AnyExpr
variant in the Expr
enum, which is a breaking change.
@alamb @andygrove is it required to add a new expression for Ballista? I did that in 218a917, but I don't use it. Thanks
@alamb @andygrove is it required to add a new expression for Ballista? I did that in 218a917, but I don't use it. Thanks
I assume you had to add that to fix compilation issues because ballista has exhaustive matching of all expressions? The changes look reasonable to me.
Note that we also have a datafusion.proto
(in fact, we have two copies, as noted in https://github.com/apache/arrow-datafusion/issues/2514) that we should ideally also add new expressions to (but we have not been doing this consistently).
I plan to review this carefully today -- thanks @ovr
@ovr could you merge latest from master into this branch to pick up https://github.com/apache/arrow-datafusion/pull/2567
Thanks -- I'll try and find some more time to review this PR.
On Wed, May 18, 2022 at 5:54 PM Dmitry Patsura @.***> wrote:
@.**** commented on this pull request.
In datafusion/physical-expr/src/expressions/any.rs https://github.com/apache/arrow-datafusion/pull/2549#discussion_r876413237 :
- fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
let value = match self.value.evaluate(batch)? {
ColumnarValue::Array(array) => array,
ColumnarValue::Scalar(scalar) => scalar.to_array(),
};
I haven't reviewed this code super carefully but I wonder what you think about reusing more of the code for InList?
Probably, the nearest operator will be ALL, but it's not implemented right now. I tried to reuse the code of InList, but come to the decision that it will be easier/correct to combine it with ANY in the future.
It seems like the implementations of x IN (, ) and x ANY () are almost exactly the same except for the fact that the the x = ANY() has the list as a single ?
Nope, because it supports more operators: <, <=, >, >= (it's not implemented in this PR, but anyway).
I mention this as it is likely the IN / NOT IN are more optimized (use a hashset for testing, for example) as well as already written.
Yes, but IN supports only the vector of scalars in DF. It's not the same as ANY, because it supports List (real column) or scalar.
For example SELECT left, values FROM table:
left | right LIst(Int64) - another logic is required
| [1,2,3] - ArrayRef = PrimitiveArray<Int64> - another logic is requied
|. [2,3,4]
|. [4,5,6]
| [4,5,6]
|. [4,5,6]
In the case of using the whole column, it is required to use another handling of interacting on values ☝️ . Probably here, will be more correct to go by steps:
- Support more operators
- Introduce ALL operator + extract the same logic with ANY.
Thanks
— Reply to this email directly, view it on GitHub https://github.com/apache/arrow-datafusion/pull/2549#discussion_r876413237, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADXZMLLPQHWVVSZ2KIMKULVKVRKVANCNFSM5WCC6J2Q . You are receiving this because you were mentioned.Message ID: @.***>
Codecov Report
Merging #2549 (2d32cb2) into master (4b3eb1c) will decrease coverage by
0.09%
. The diff coverage is69.00%
.
@@ Coverage Diff @@
## master #2549 +/- ##
==========================================
- Coverage 84.64% 84.54% -0.10%
==========================================
Files 267 268 +1
Lines 46926 47239 +313
==========================================
+ Hits 39719 39937 +218
- Misses 7207 7302 +95
Impacted Files | Coverage Δ | |
---|---|---|
datafusion/core/src/datasource/listing/helpers.rs | 95.34% <ø> (ø) |
|
...ion/core/src/optimizer/common_subexpr_eliminate.rs | 87.81% <0.00%> (-0.74%) |
:arrow_down: |
...afusion/core/src/optimizer/simplify_expressions.rs | 81.90% <0.00%> (-0.10%) |
:arrow_down: |
datafusion/core/src/optimizer/utils.rs | 31.77% <0.00%> (-0.49%) |
:arrow_down: |
datafusion/expr/src/utils.rs | 91.79% <ø> (ø) |
|
datafusion/physical-expr/src/expressions/mod.rs | 100.00% <ø> (ø) |
|
datafusion/proto/src/from_proto.rs | 33.94% <0.00%> (-0.16%) |
:arrow_down: |
datafusion/proto/src/to_proto.rs | 55.74% <0.00%> (-0.59%) |
:arrow_down: |
datafusion/sql/src/utils.rs | 50.19% <0.00%> (-0.79%) |
:arrow_down: |
datafusion/expr/src/expr_schema.rs | 66.88% <16.66%> (-2.04%) |
:arrow_down: |
... and 11 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 4b3eb1c...2d32cb2. Read the comment docs.
Rebased ✅ cC @alamb @tustvold
I reviewed the logical plan, SQL parsing, and optimizer changes and LGTM. I did not review the execution part.
It might make reviews easier for future work like this to have separate PRs for logical versus physical parts. I would have been more comfortable reviewing and approving the logical plan parts of this.
Hi @ovr What do you think about breaking this up into smaller PRs and starting with the logical plan changes and SQL parsing? I think this might make it easier to start getting parts of this merged in.
This PR is more than 6 month old, so closing it down for now to clean up the PR list. Please reopen if this is a mistake and you plan to work on it more