flink icon indicating copy to clipboard operation
flink copied to clipboard

[FLINK-35098][ORC] Fix incorrect results with literal first expressions

Open empathy87 opened this issue 1 year ago • 7 comments

What is the purpose of the change

Fixes an issue with SQL queries containing expressions like 10 >= y when using Filesystem connector and ORC format.

Brief change log

  • In OrcFilters fixed filters push down of GREATER_THAN, GREATER_THAN_OR_EQUAL, LESS_THAN, LESS_THAN_OR_EQUAL functions

Verifying this change

Added tests:

  • OrcFileSystemITCase::testOrcFilterPushDownLiteralFirst verifies a query result.
  • OrcFileSystemFilterTest::testApplyPredicateReverse verifies expression conversions to ORC filter predicates.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no

empathy87 avatar Apr 14 '24 03:04 empathy87

CI report:

  • da15efe3712509d106abe02549c79356ea0f3ada Azure: SUCCESS
Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

flinkbot avatar Apr 14 '24 03:04 flinkbot

Hi @empathy87 thanks a lot for driving this PR. It looks good to me. Could you please add tests for each changelog in 'OrcFilters.java' ? Thanks!

@jeyhunkarimov Sure! I reworked the tests to cover all changes in OrcFilters.java.

Please, note that expressions like 10 > y could not be tested by executing SQL queries. This is because Filesystem connector predicate push down works in a best effort manner. And if extra rows are returned from the ORC file, they will be further filtered out.

Expressions like 10 > y are tested in OrcFileSystemFilterTest::testApplyPredicateReverse to assert that it is converted to y < 10 ORC filer predicate.

empathy87 avatar Apr 14 '24 15:04 empathy87

Thanks for the contribution @empathy87 thanks for the review @jeyhunkarimov It looks ok to me I left a minor comment

snuyanzin avatar May 10 '24 07:05 snuyanzin

Thanks, let's wait for ci

btw, @empathy87 could you please create backport PRs with this fix for 1.19 and 1.18?

snuyanzin avatar May 10 '24 09:05 snuyanzin

Thanks, let's wait for ci

btw, @empathy87 could you please create backport PRs with this fix for 1.19 and 1.18?

Yeh, sure, I will do backports!

empathy87 avatar May 10 '24 12:05 empathy87

@empathy87 please let me know once backports are ready, then we can merge all of them

snuyanzin avatar May 13 '24 07:05 snuyanzin

@empathy87 please let me know once backports are ready, then we can merge all of them

@snuyanzin, backport PRs are done, CI succeeded. https://github.com/apache/flink/pull/24778 https://github.com/apache/flink/pull/24779

empathy87 avatar May 13 '24 16:05 empathy87