enable MaxScanStrategy when accessing iceberg datasource

Open zhaohehuhu opened this issue 1 year ago • 3 comments

:mag: Description

Issue References 🔗

Currently, MaxScanStrategy can limit the maximum scanned file size and the maximum number of scanned partitions for some data sources, such as Hive. We would like to enhance MaxScanStrategy to also support the Iceberg data source.
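
To make the idea concrete, here is a minimal sketch of the kind of guard described above: it rejects a scan when the number of scanned partitions or the total scanned file size goes over a configured limit. The names `MaxScanCheck` and `ScanLimitExceededException` are hypothetical and are not taken from Kyuubi; this models the behaviour only and is not the actual rule.

```java
import java.util.OptionalLong;

/** Hypothetical sketch of a max-scan guard; not Kyuubi's actual rule. */
final class MaxScanCheck {

    /** Raised when a planned scan goes over a configured limit (hypothetical type). */
    static final class ScanLimitExceededException extends RuntimeException {
        ScanLimitExceededException(String message) {
            super(message);
        }
    }

    // An empty OptionalLong means "no limit configured" for that dimension.
    private final OptionalLong maxPartitions;
    private final OptionalLong maxFileSizeBytes;

    MaxScanCheck(OptionalLong maxPartitions, OptionalLong maxFileSizeBytes) {
        this.maxPartitions = maxPartitions;
        this.maxFileSizeBytes = maxFileSizeBytes;
    }

    /** Throws if either configured limit is exceeded; values equal to a limit pass. */
    void check(long scannedPartitions, long scannedBytes) {
        if (maxPartitions.isPresent() && scannedPartitions > maxPartitions.getAsLong()) {
            throw new ScanLimitExceededException(
                "Scanned partitions " + scannedPartitions
                    + " exceed the limit " + maxPartitions.getAsLong());
        }
        if (maxFileSizeBytes.isPresent() && scannedBytes > maxFileSizeBytes.getAsLong()) {
            throw new ScanLimitExceededException(
                "Scanned bytes " + scannedBytes
                    + " exceed the limit " + maxFileSizeBytes.getAsLong());
        }
    }
}
```

Leaving a limit empty disables that dimension of the check, so a deployment could restrict file size without restricting partitions, or the other way round.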

Describe Your Solution 🔧

Get the statistics about the scanned files and partitions from the Iceberg DataSource V2 API.
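
One plausible reading is that the rule asks the scan for its own size estimate instead of listing files itself. The sketch below models that with two small stand-in interfaces shaped after Spark's DataSource V2 `SupportsReportStatistics` and `Statistics` (whose `sizeInBytes()` returns an `OptionalLong`). They are local stand-ins so the example compiles without Spark on the classpath, and `ScanStats`, `StatsReportingScan`, and `ScanSizeGuard` are hypothetical names. Whether the pull request obtains its numbers exactly this way is an assumption.

```java
import java.util.OptionalLong;

/** Local stand-in shaped after Spark's connector Statistics (only the size is modelled). */
interface ScanStats {
    OptionalLong sizeInBytes();
}

/** Local stand-in shaped after Spark's SupportsReportStatistics. */
interface StatsReportingScan {
    ScanStats estimateStatistics();
}

/** Hypothetical helper; not part of Kyuubi, Spark, or Iceberg. */
final class ScanSizeGuard {
    private ScanSizeGuard() {}

    /**
     * Returns true only when the scan reports a size and that size is above the limit.
     * A scan that reports no size is treated as within the limit.
     */
    static boolean exceedsMaxFileSize(StatsReportingScan scan, long maxFileSizeBytes) {
        OptionalLong size = scan.estimateStatistics().sizeInBytes();
        return size.isPresent() && size.getAsLong() > maxFileSizeBytes;
    }
}
```

Treating an unknown size as acceptable is a design choice in this sketch: it avoids rejecting queries against sources that cannot estimate their size, at the cost of letting those scans through unchecked.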

Types of changes :bookmark:

  • [ ] Bugfix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)

Test Plan 🧪

Behavior Without This Pull Request :coffin:

Behavior With This Pull Request :tada:

Related Unit Tests


Checklists

๐Ÿ“ Author Self Checklist

  • [ ] My code follows the style guidelines of this project
  • [ ] I have performed a self-review
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] My changes generate no new warnings
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] New and existing unit tests pass locally with my changes
  • [ ] This patch was not authored or co-authored using Generative Tooling

๐Ÿ“ Committer Pre-Merge Checklist

  • [ ] Pull request title is okay.
  • [ ] No license issues.
  • [ ] Milestone correctly set?
  • [ ] Test coverage is ok
  • [ ] Assignees are selected.
  • [ ] Minimum number of approvals
  • [ ] No changes are requested

Be nice. Be informative.

zhaohehuhu avatar Dec 13 '23 06:12 zhaohehuhu

Please make sure that the Kyuubi Spark extension also works well on iceberg-free Spark runtime.
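
A common way to meet this kind of requirement is to avoid linking against the optional library and to probe for it at runtime instead. The sketch below shows that general technique only; `OptionalDependency` and `isClassPresent` are hypothetical names, the class name to probe is left to the caller, and nothing here is claimed to be how the Kyuubi extension actually does it.

```java
/** Hypothetical helper: detect an optional dependency without linking against it. */
final class OptionalDependency {
    private OptionalDependency() {}

    /** Returns true if the named class can be located by this helper's class loader. */
    static boolean isClassPresent(String className) {
        try {
            // initialize = false: only locate the class, do not run its static initializers.
            Class.forName(className, false, OptionalDependency.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class, or something it needs in order to load, is not on the classpath.
            return false;
        }
    }
}
```

Code that depends on the optional library would then run only when the probe returns true, so a runtime without that library never touches the missing classes.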

pan3793 avatar Dec 14 '23 11:12 pan3793

> Please make sure that the Kyuubi Spark extension also works well on iceberg-free Spark runtime.

good point. Thanks

zhaohehuhu avatar Dec 15 '23 03:12 zhaohehuhu

> Please make sure that the Kyuubi Spark extension also works well on iceberg-free Spark runtime.

Fixed. Please review again.

zhaohehuhu avatar Dec 15 '23 03:12 zhaohehuhu

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 58.40%. Comparing base (67f099a) to head (3c5b0c2). Report is 23 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #5852      +/-   ##
============================================
- Coverage     58.58%   58.40%   -0.19%     
  Complexity       24       24              
============================================
  Files           649      651       +2     
  Lines         39379    39513     +134     
  Branches       5415     5441      +26     
============================================
+ Hits          23070    23076       +6     
- Misses        13841    13955     +114     
- Partials       2468     2482      +14     

codecov-commenter avatar Mar 14 '24 13:03 codecov-commenter

@zhaohehuhu Could you add a unit test?

wForget avatar Mar 15 '24 07:03 wForget

> @zhaohehuhu Could you add a unit test?

Sure. I will add it. Thanks!

zhaohehuhu avatar Mar 15 '24 07:03 zhaohehuhu

Thanks @wForget @pan3793

zhaohehuhu avatar Mar 22 '24 09:03 zhaohehuhu

Disabled the rule that checks maxPartitions for DSv2. @wForget

zhaohehuhu avatar Apr 07 '24 04:04 zhaohehuhu

Thanks, merged to master/1.9

pan3793 avatar Apr 17 '24 08:04 pan3793