datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

remove type coercion in the binary physical expr

Open liukun4515 opened this issue 2 years ago • 7 comments

Which issue does this PR close?

Closes https://github.com/apache/arrow-datafusion/issues/3388

depend on https://github.com/apache/arrow-datafusion/issues/3421 https://github.com/apache/arrow-datafusion/issues/3289 https://github.com/apache/arrow-datafusion/pull/3459 https://github.com/apache/arrow-datafusion/pull/3472

fixed the row filter predication for the data type of NULL value https://github.com/apache/arrow-datafusion/pull/3470

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

liukun4515 avatar Sep 08 '22 12:09 liukun4515

after rebase https://github.com/apache/arrow-datafusion/pull/3380, the test case evolved_schema_intersection_filter_with_filter_pushdown and evolved_schema_disjoint_schema_filter_with_pushdown failed

liukun4515 avatar Sep 13 '22 22:09 liukun4515

this pr depend on the https://github.com/apache/arrow-datafusion/pull/3470 and https://github.com/apache/arrow-datafusion/pull/3472

liukun4515 avatar Sep 14 '22 07:09 liukun4515

I meet so many issue when remove the binary type coercion which is caused from these pr https://github.com/apache/arrow-datafusion/pull/3301 https://github.com/apache/arrow-datafusion/pull/3246 For example is TRUE, these is a logical expr Expr::IsTrue but convert it to the physical binary op in https://github.com/andygrove/arrow-datafusion/blob/92b1a341aca334c54235ef65737c88037010e30b/datafusion/physical-expr/src/planner.rs#L86

@andygrove why not generate the logical binary op when create the Expr::IsTrue logical expr.

liukun4515 avatar Sep 16 '22 06:09 liukun4515

I meet so many issue when remove the binary type coercion which is caused from these pr #3301 #3246 For example is TRUE, these is a logical expr Expr::IsTrue but convert it to the physical binary op in https://github.com/andygrove/arrow-datafusion/blob/92b1a341aca334c54235ef65737c88037010e30b/datafusion/physical-expr/src/planner.rs#L86

@andygrove why not generate the logical binary op when create the Expr::IsTrue logical expr.

cc @alamb

I create the issue https://github.com/apache/arrow-datafusion/issues/3509 to support other logical expr for the typ coercion. If we can convert such expr like istrue, isfalse,like to binary op in the logical phase, we don't need to do this by adding more case in the TypeCoercion optimizer rule. I don't know why we don't do this when implement the is true, is false in the https://github.com/apache/arrow-datafusion/pull/3301/files @andygrove , is there any optimization should be done in the logical phase with the Expr:IsTrue?

liukun4515 avatar Sep 16 '22 06:09 liukun4515

is there any optimization should be done in the logical phase with the Expr:IsTrue?

I think the only coercion it might need is that its argument is DataType::Boolean

alamb avatar Sep 17 '22 10:09 alamb

Codecov Report

Merging #3396 (3062cd5) into master (b54a56f) will increase coverage by 0.05%. The diff coverage is 98.52%.

@@            Coverage Diff             @@
##           master    #3396      +/-   ##
==========================================
+ Coverage   86.03%   86.08%   +0.05%     
==========================================
  Files         300      300              
  Lines       56253    56313      +60     
==========================================
+ Hits        48395    48477      +82     
+ Misses       7858     7836      -22     
Impacted Files Coverage Δ
datafusion/core/src/physical_plan/planner.rs 78.03% <ø> (+0.56%) :arrow_up:
datafusion/core/tests/sql/aggregates.rs 99.38% <ø> (+<0.01%) :arrow_up:
datafusion/core/tests/sql/decimal.rs 100.00% <ø> (ø)
datafusion/optimizer/src/type_coercion.rs 96.01% <92.30%> (+3.61%) :arrow_up:
datafusion/physical-expr/src/expressions/binary.rs 97.42% <98.33%> (-0.21%) :arrow_down:
datafusion/core/src/execution/context.rs 79.38% <100.00%> (+0.07%) :arrow_up:
...ore/src/physical_optimizer/aggregate_statistics.rs 100.00% <100.00%> (ø)
...sion/core/src/physical_plan/file_format/parquet.rs 94.68% <100.00%> (+0.02%) :arrow_up:
datafusion/core/tests/sql/predicates.rs 100.00% <100.00%> (ø)
datafusion/core/tests/sql/select.rs 99.78% <100.00%> (ø)
... and 28 more

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov-commenter avatar Sep 22 '22 06:09 codecov-commenter

I've started reviewing this, but there are many changes and referenced issues I need to go read. I will continue with the review tomorrow.

andygrove avatar Sep 22 '22 21:09 andygrove

I also plan to review this PR, but I may not get to it today

alamb avatar Sep 23 '22 12:09 alamb

I try to explain what I do and why i do in the description What changes are included in this PR?

I hope this works for your review.

cc @andygrove @alamb

liukun4515 avatar Sep 23 '22 12:09 liukun4515

I plan to get this PR ready to merge by: add my suggested comments, fix clippy, and merge to master

alamb avatar Sep 24 '22 10:09 alamb

(since I already have it checked out this will be a simple thing for me)

alamb avatar Sep 24 '22 10:09 alamb

Benchmark runs are scheduled for baseline = b6252771248da117dd3a5976d134a66a4bb60431 and contender = d7c0e420ce6362a253e1f55a873a0732052e0e2f. d7c0e420ce6362a253e1f55a873a0732052e0e2f is a master commit associated with this PR. Results will be available as each benchmark for each run completes. Conbench compare runs links: [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2 [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q Buildkite builds: Supported benchmarks: ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True test-mac-arm: Supported benchmark langs: C++, Python, R ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

ursabot avatar Sep 24 '22 12:09 ursabot