presto icon indicating copy to clipboard operation
presto copied to clipboard

[Iceberg]Support non-identity partition predicate for thoroughly push down

Open hantangwangd opened this issue 1 year ago • 5 comments

Description

This PR extends Iceberg capabilities to pushdown thoroughly predicates on partitioning columns. Besides pushdown thoroughly predicates on identity partitioning columns, it also pushdown thoroughly predicates if they align with partitioning boundaries.

It is also helpful for DELETE, as it allows to do metadata-only delete in more cases, where we don't really needing to do a row-level delete.

Test Plan

  • Added test cases for pushing down predicates on columns of various partition transforms

Contributor checklist

  • [ ] Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • [ ] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • [ ] Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • [ ] If release notes are required, they follow the release notes guidelines.
  • [ ] Adequate tests were added if applicable.
  • [ ] CI passed.

Release Notes

== NO RELEASE NOTE ==

hantangwangd avatar Feb 24 '24 17:02 hantangwangd

Convert to draft as this PR depending on #21959. So it should be reviewed after #21959 is merged.

hantangwangd avatar Feb 24 '24 18:02 hantangwangd

Is this a cherry pick? At first glance, I don't see substantial similarities with the referenced PR.

tdcmeehan avatar Feb 26 '24 15:02 tdcmeehan

Is this a cherry pick? At first glance, I don't see substantial similarities with the referenced PR.

I referenced the implementation of trino, which may evolve gradually through a series of PRs. Is it suitable to be described as a "cherry-pick"?

hantangwangd avatar Feb 26 '24 16:02 hantangwangd

Based on my reading of the code, this wouldn't qualify as a cherry pick as it's substantially different, and even the way it's performed is different between the two systems. I think we can remove the cherry pick reference.

tdcmeehan avatar Feb 26 '24 17:02 tdcmeehan

Based on my reading of the code, this wouldn't qualify as a cherry pick as it's substantially different, and even the way it's performed is different between the two systems. I think we can remove the cherry pick reference.

OK, fixed! Thanks for the suggestion.

hantangwangd avatar Feb 26 '24 23:02 hantangwangd

Instead of making changes directly to Type, can we just have a new utility method in the Iceberg connector which takes in the Type, the Object, and from those two things infers the primtive type of the Object from the SQL Type? The helper methods on Type don't directly relate to Block usage, which is what we expect for Types.

Very reasonable, I agree that we should not modify common Type just for some specific uses.

Should we move the method getPreviousValue and getNextValue to a utility class as well? And should we place the utility methods in iceberg connector or common TypeUtils? If we place them in iceberg connector, then should we just need to support the types needed by iceberg's truncate/year/month/day/hour transform?

hantangwangd avatar Mar 06 '24 23:03 hantangwangd

Instead of making changes directly to Type, can we just have a new utility method in the Iceberg connector which takes in the Type, the Object, and from those two things infers the primtive type of the Object from the SQL Type? The helper methods on Type don't directly relate to Block usage, which is what we expect for Types.

Very reasonable, I agree that we should not modify common Type just for some specific uses.

Should we move the method getPreviousValue and getNextValue to a utility class as well? And should we place the utility methods in iceberg connector or common TypeUtils? If we place them in iceberg connector, then should we just need to support the types needed by iceberg's truncate/year/month/day/hour transform?

I think we can start in the Iceberg connector, and extract if or when an additional use case comes along.

tdcmeehan avatar Mar 08 '24 16:03 tdcmeehan

I think we can start in the Iceberg connector, and extract if or when an additional use case comes along.

Good idea! Have done this, please take a look, thanks a lot!

hantangwangd avatar Mar 08 '24 18:03 hantangwangd

Hi @ZacBlanco, I have combined getPreviousValue() and getNextValue() to a single function getAdjacentValue(), and used anyNot(...) to extract functions thoroughlyPushdown(...) and notThoroughlyPushdown(...) which could have a more clearer meaning. Then refactored test cases based on these functions, and moved the left-unwrap TODOs to separate @Ignore test functions. Please take a look when convenient, thanks!

hantangwangd avatar Mar 20 '24 04:03 hantangwangd

@tdcmeehan @ZacBlanco just resolved the conflict in IcebergPlanOptimizer.

hantangwangd avatar Apr 12 '24 02:04 hantangwangd

Don't forget to squash your commits

ZacBlanco avatar Apr 12 '24 15:04 ZacBlanco

Don't forget to squash your commits

OK, done!

hantangwangd avatar Apr 12 '24 18:04 hantangwangd