API, Core: Add scan planning apis to REST Catalog
This pr adds the new scan planning apis based on the recent REST spec changes made in the following https://github.com/apache/iceberg/pull/9695.
In addition there is an additional spec change to loadTable that https://github.com/apache/iceberg/pull/11156/files that allows for the service to notify the client when to call these new apis.
cc @amogh-jahagirdar @nastra @rdblue @danielcweeks @jackye1995 @singhpk234 @geruh
@rahil-c can you also please add respective JSON parser(s) with tests for those parser(s) as those request/response classes won't be serialized/deserialized
Thanks @rahil-c for kicking off the implementation! I wonder what others think but IMO this can be combined with serializer implementations that get registered in
RESTSerializers. Then we can have more robust tests which exercise all the validation logic in the model.
can you also please add respective JSON parser(s) with tests for those parser(s) as those request/response classes won't be serialized/deserialized
Thanks @amogh-jahagirdar and @nastra for revieiwing, will follow up on this pr with serializers, parsers, and the tests for them.
@rdblue @danielcweeks @amogh-jahagirdar @nastra @jackye1995 @singhpk234 When implementing the parsers, one thing that I noticed was that not having a partition-spec sent back from the service during the time of several response serialization/deserialization causes several issues from client implementation standpoint.
I have marked these areas with TODO comments so its easier for reviewers to see the complexities behind this and would recommend that we have a discussion around the pros/cons of the service sending back a parition-spec in the FileScanTask model.
The spec-id from the DataFile alone is not enough to find the correct the partition-spec because we do not have the TableMetadata at the time of serialization/deserialization to obtain the relevant partition spec.
@rdblue @danielcweeks @amogh-jahagirdar @nastra @jackye1995 @singhpk234
Added a new commit Add support for scan planning apis in REST Catalog which invokes the new apis and introduces some new classes regarding a RESTTable and RESTTableScan. Let me know what you guys think and will try to address the recent feedback from @amogh-jahagirdar (thank you for the consistent reviews!)
Finally once we have worked most of the major implementation bits will look into adding tests on this pr. Would also appreciate if people can take a look at this spec pr as well https://github.com/apache/iceberg/pull/11156/files regarding a change to loadTable for planning-mode.
Please see the following PR: https://github.com/apache/iceberg/pull/11369 which just contains the changes only for the model and parsers.
@amogh-jahagirdar @rdblue @danielcweeks @nastra
Pushed the following commit Utilize ParallelIterable, and use table prop instead of rest planning, which addressed some of the planning logic feedback.
Moving on adding to adding tests to check RESTTableScan.planFiles() logic.
cc @singhpk234 @jackye1995
It looks like this won't be ready this week so I'll move it out to the next release milestone.
@rdblue @danielcweeks Was wondering if you guys can take a look whenever you get a chance?
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
Commenting to avoid this being marked as stale. cc @rdblue @danielcweeks
superceded by https://github.com/apache/iceberg/pull/11180