datafusion
datafusion copied to clipboard
remove type coercion in the binary physical expr
Which issue does this PR close?
Closes https://github.com/apache/arrow-datafusion/issues/3388
depend on https://github.com/apache/arrow-datafusion/issues/3421 https://github.com/apache/arrow-datafusion/issues/3289 https://github.com/apache/arrow-datafusion/pull/3459 https://github.com/apache/arrow-datafusion/pull/3472
fixed the row filter predication for the data type of NULL
value https://github.com/apache/arrow-datafusion/pull/3470
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?
after rebase https://github.com/apache/arrow-datafusion/pull/3380, the test case evolved_schema_intersection_filter_with_filter_pushdown
and evolved_schema_disjoint_schema_filter_with_pushdown
failed
this pr depend on the https://github.com/apache/arrow-datafusion/pull/3470 and https://github.com/apache/arrow-datafusion/pull/3472
I meet so many issue when remove the binary type coercion which is caused from these pr https://github.com/apache/arrow-datafusion/pull/3301 https://github.com/apache/arrow-datafusion/pull/3246
For example is TRUE
, these is a logical expr Expr::IsTrue
but convert it to the physical binary op in https://github.com/andygrove/arrow-datafusion/blob/92b1a341aca334c54235ef65737c88037010e30b/datafusion/physical-expr/src/planner.rs#L86
@andygrove why not generate the logical binary op when create the Expr::IsTrue
logical expr.
I meet so many issue when remove the binary type coercion which is caused from these pr #3301 #3246 For example
is TRUE
, these is a logical exprExpr::IsTrue
but convert it to the physical binary op in https://github.com/andygrove/arrow-datafusion/blob/92b1a341aca334c54235ef65737c88037010e30b/datafusion/physical-expr/src/planner.rs#L86@andygrove why not generate the logical binary op when create the
Expr::IsTrue
logical expr.
cc @alamb
I create the issue https://github.com/apache/arrow-datafusion/issues/3509 to support other logical expr for the typ coercion.
If we can convert such expr like istrue, isfalse,like
to binary op
in the logical phase, we don't need to do this by adding more case in the TypeCoercion optimizer rule.
I don't know why we don't do this when implement the is true, is false
in the https://github.com/apache/arrow-datafusion/pull/3301/files @andygrove , is there any optimization should be done in the logical phase with the Expr:IsTrue
?
is there any optimization should be done in the logical phase with the Expr:IsTrue?
I think the only coercion it might need is that its argument is DataType::Boolean
Codecov Report
Merging #3396 (3062cd5) into master (b54a56f) will increase coverage by
0.05%
. The diff coverage is98.52%
.
@@ Coverage Diff @@
## master #3396 +/- ##
==========================================
+ Coverage 86.03% 86.08% +0.05%
==========================================
Files 300 300
Lines 56253 56313 +60
==========================================
+ Hits 48395 48477 +82
+ Misses 7858 7836 -22
Impacted Files | Coverage Δ | |
---|---|---|
datafusion/core/src/physical_plan/planner.rs | 78.03% <ø> (+0.56%) |
:arrow_up: |
datafusion/core/tests/sql/aggregates.rs | 99.38% <ø> (+<0.01%) |
:arrow_up: |
datafusion/core/tests/sql/decimal.rs | 100.00% <ø> (ø) |
|
datafusion/optimizer/src/type_coercion.rs | 96.01% <92.30%> (+3.61%) |
:arrow_up: |
datafusion/physical-expr/src/expressions/binary.rs | 97.42% <98.33%> (-0.21%) |
:arrow_down: |
datafusion/core/src/execution/context.rs | 79.38% <100.00%> (+0.07%) |
:arrow_up: |
...ore/src/physical_optimizer/aggregate_statistics.rs | 100.00% <100.00%> (ø) |
|
...sion/core/src/physical_plan/file_format/parquet.rs | 94.68% <100.00%> (+0.02%) |
:arrow_up: |
datafusion/core/tests/sql/predicates.rs | 100.00% <100.00%> (ø) |
|
datafusion/core/tests/sql/select.rs | 99.78% <100.00%> (ø) |
|
... and 28 more |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
I've started reviewing this, but there are many changes and referenced issues I need to go read. I will continue with the review tomorrow.
I also plan to review this PR, but I may not get to it today
I try to explain what I do and why i do in the description What changes are included in this PR?
I hope this works for your review.
cc @andygrove @alamb
I plan to get this PR ready to merge by: add my suggested comments, fix clippy, and merge to master
(since I already have it checked out this will be a simple thing for me)
Benchmark runs are scheduled for baseline = b6252771248da117dd3a5976d134a66a4bb60431 and contender = d7c0e420ce6362a253e1f55a873a0732052e0e2f. d7c0e420ce6362a253e1f55a873a0732052e0e2f is a master commit associated with this PR. Results will be available as each benchmark for each run completes. Conbench compare runs links: [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2 [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q Buildkite builds: Supported benchmarks: ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True test-mac-arm: Supported benchmark langs: C++, Python, R ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java