iceberg icon indicating copy to clipboard operation
iceberg copied to clipboard

API, Core: Add scan planning apis to REST Catalog

Open rahil-c opened this issue 1 year ago • 4 comments

This pr adds the new scan planning apis based on the recent REST spec changes made in the following https://github.com/apache/iceberg/pull/9695.

In addition there is an additional spec change to loadTable that https://github.com/apache/iceberg/pull/11156/files that allows for the service to notify the client when to call these new apis.

cc @amogh-jahagirdar @nastra @rdblue @danielcweeks @jackye1995 @singhpk234 @geruh

rahil-c avatar Sep 20 '24 23:09 rahil-c

@rahil-c can you also please add respective JSON parser(s) with tests for those parser(s) as those request/response classes won't be serialized/deserialized

nastra avatar Sep 23 '24 12:09 nastra

Thanks @rahil-c for kicking off the implementation! I wonder what others think but IMO this can be combined with serializer implementations that get registered in RESTSerializers. Then we can have more robust tests which exercise all the validation logic in the model.

can you also please add respective JSON parser(s) with tests for those parser(s) as those request/response classes won't be serialized/deserialized

Thanks @amogh-jahagirdar and @nastra for revieiwing, will follow up on this pr with serializers, parsers, and the tests for them.

rahil-c avatar Sep 23 '24 15:09 rahil-c

@rdblue @danielcweeks @amogh-jahagirdar @nastra @jackye1995 @singhpk234 When implementing the parsers, one thing that I noticed was that not having a partition-spec sent back from the service during the time of several response serialization/deserialization causes several issues from client implementation standpoint.

I have marked these areas with TODO comments so its easier for reviewers to see the complexities behind this and would recommend that we have a discussion around the pros/cons of the service sending back a parition-spec in the FileScanTask model.

The spec-id from the DataFile alone is not enough to find the correct the partition-spec because we do not have the TableMetadata at the time of serialization/deserialization to obtain the relevant partition spec.

rahil-c avatar Oct 09 '24 23:10 rahil-c

@rdblue @danielcweeks @amogh-jahagirdar @nastra @jackye1995 @singhpk234

Added a new commit Add support for scan planning apis in REST Catalog which invokes the new apis and introduces some new classes regarding a RESTTable and RESTTableScan. Let me know what you guys think and will try to address the recent feedback from @amogh-jahagirdar (thank you for the consistent reviews!)

Finally once we have worked most of the major implementation bits will look into adding tests on this pr. Would also appreciate if people can take a look at this spec pr as well https://github.com/apache/iceberg/pull/11156/files regarding a change to loadTable for planning-mode.

rahil-c avatar Oct 11 '24 07:10 rahil-c

Please see the following PR: https://github.com/apache/iceberg/pull/11369 which just contains the changes only for the model and parsers.

rahil-c avatar Oct 21 '24 18:10 rahil-c

@amogh-jahagirdar @rdblue @danielcweeks @nastra Pushed the following commit Utilize ParallelIterable, and use table prop instead of rest planning, which addressed some of the planning logic feedback.

Moving on adding to adding tests to check RESTTableScan.planFiles() logic.

cc @singhpk234 @jackye1995

rahil-c avatar Oct 22 '24 23:10 rahil-c

It looks like this won't be ready this week so I'll move it out to the next release milestone.

RussellSpitzer avatar Oct 28 '24 17:10 RussellSpitzer

@rdblue @danielcweeks Was wondering if you guys can take a look whenever you get a chance?

rahil-c avatar Dec 10 '24 20:12 rahil-c

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Jan 10 '25 00:01 github-actions[bot]

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

Commenting to avoid this being marked as stale. cc @rdblue @danielcweeks

rahil-c avatar Jan 10 '25 00:01 rahil-c

superceded by https://github.com/apache/iceberg/pull/11180

singhpk234 avatar May 08 '25 03:05 singhpk234