presto
presto copied to clipboard
[Iceberg]Support non-identity partition predicate for thoroughly push down
Description
This PR extends Iceberg capabilities to pushdown thoroughly predicates on partitioning columns. Besides pushdown thoroughly predicates on identity partitioning columns, it also pushdown thoroughly predicates if they align with partitioning boundaries.
It is also helpful for DELETE, as it allows to do metadata-only delete in more cases, where we don't really needing to do a row-level delete.
Test Plan
- Added test cases for pushing down predicates on columns of various partition transforms
Contributor checklist
- [ ] Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
- [ ] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
- [ ] Documented new properties (with its default value), SQL syntax, functions, or other functionality.
- [ ] If release notes are required, they follow the release notes guidelines.
- [ ] Adequate tests were added if applicable.
- [ ] CI passed.
Release Notes
== NO RELEASE NOTE ==
Convert to draft as this PR depending on #21959. So it should be reviewed after #21959 is merged.
Is this a cherry pick? At first glance, I don't see substantial similarities with the referenced PR.
Is this a cherry pick? At first glance, I don't see substantial similarities with the referenced PR.
I referenced the implementation of trino, which may evolve gradually through a series of PRs. Is it suitable to be described as a "cherry-pick"?
Based on my reading of the code, this wouldn't qualify as a cherry pick as it's substantially different, and even the way it's performed is different between the two systems. I think we can remove the cherry pick reference.
Based on my reading of the code, this wouldn't qualify as a cherry pick as it's substantially different, and even the way it's performed is different between the two systems. I think we can remove the cherry pick reference.
OK, fixed! Thanks for the suggestion.
Instead of making changes directly to
Type
, can we just have a new utility method in the Iceberg connector which takes in the Type, the Object, and from those two things infers the primtive type of the Object from the SQL Type? The helper methods on Type don't directly relate to Block usage, which is what we expect for Types.
Very reasonable, I agree that we should not modify common Type
just for some specific uses.
Should we move the method getPreviousValue
and getNextValue
to a utility class as well? And should we place the utility methods in iceberg connector or common TypeUtils
? If we place them in iceberg connector, then should we just need to support the types needed by iceberg's truncate/year/month/day/hour transform?
Instead of making changes directly to
Type
, can we just have a new utility method in the Iceberg connector which takes in the Type, the Object, and from those two things infers the primtive type of the Object from the SQL Type? The helper methods on Type don't directly relate to Block usage, which is what we expect for Types.Very reasonable, I agree that we should not modify common
Type
just for some specific uses.Should we move the method
getPreviousValue
andgetNextValue
to a utility class as well? And should we place the utility methods in iceberg connector or commonTypeUtils
? If we place them in iceberg connector, then should we just need to support the types needed by iceberg's truncate/year/month/day/hour transform?
I think we can start in the Iceberg connector, and extract if or when an additional use case comes along.
I think we can start in the Iceberg connector, and extract if or when an additional use case comes along.
Good idea! Have done this, please take a look, thanks a lot!
Hi @ZacBlanco, I have combined getPreviousValue()
and getNextValue()
to a single function getAdjacentValue()
, and used anyNot(...)
to extract functions thoroughlyPushdown(...)
and notThoroughlyPushdown(...)
which could have a more clearer meaning. Then refactored test cases based on these functions, and moved the left-unwrap TODOs to separate @Ignore
test functions. Please take a look when convenient, thanks!
@tdcmeehan @ZacBlanco just resolved the conflict in IcebergPlanOptimizer
.
Don't forget to squash your commits
Don't forget to squash your commits
OK, done!