[HUDI-6129] Support rate limit for Spark streaming source
Change Logs
- Add `InstantOffset` to allow reading an instant partially
- Add `InstantOffsetRange` to support filtering files between startOffset (exclusive) and endOffset
- Spark3 implements `SupportsAdmissionControl` to support rate limiting based on `maxFilesPerTrigger` and `maxRowsPerTrigger`
- Move `HoodieStreamSource` to the Spark2 package, and add `HoodieSpark3StreamSource` for Spark 3+
- Refactor `IncrementalRelation` and `MergeOnReadIncrementalRelation` to filter files from `InstantOffsetRange`
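To make the admission-control idea concrete, here is a minimal, self-contained sketch of how an end offset could be chosen under file and row limits. It is not the code in this PR: the classes `RateLimitSketch`, `InstantOffset`, and `FileEntry`, the `fileIndex` field, and the `latestOffset` helper are all hypothetical names used for illustration, and the rule of always admitting at least one file is an assumption made so the stream keeps progressing.

```java
import java.util.List;

// Illustrative only: hypothetical types, not the classes added in this PR.
final class RateLimitSketch {

    /** Hypothetical offset: a commit instant plus the index of the last file consumed within it. */
    static final class InstantOffset {
        final String instantTime;
        final int fileIndex;

        InstantOffset(String instantTime, int fileIndex) {
            this.instantTime = instantTime;
            this.fileIndex = fileIndex;
        }
    }

    /** Hypothetical file descriptor: the instant that wrote the file, its position, and its row count. */
    static final class FileEntry {
        final String instantTime;
        final int fileIndex;
        final long numRows;

        FileEntry(String instantTime, int fileIndex, long numRows) {
            this.instantTime = instantTime;
            this.fileIndex = fileIndex;
            this.numRows = numRows;
        }
    }

    /**
     * Walks files (assumed sorted by instant, then file index) that lie after
     * `start` (exclusive) and returns the offset of the last admitted file.
     * Stops once admitting another file would exceed maxFiles or maxRows.
     * Assumption: the first eligible file is always admitted, even if it alone
     * exceeds a limit, so a batch is never empty while data is available.
     */
    static InstantOffset latestOffset(List<FileEntry> files, InstantOffset start,
                                      int maxFiles, long maxRows) {
        InstantOffset end = start;
        int admittedFiles = 0;
        long admittedRows = 0;
        for (FileEntry f : files) {
            if (!isAfter(f, start)) {
                continue; // already consumed by an earlier batch
            }
            boolean overFiles = admittedFiles + 1 > maxFiles;
            boolean overRows = admittedRows + f.numRows > maxRows;
            if (admittedFiles > 0 && (overFiles || overRows)) {
                break;
            }
            admittedFiles++;
            admittedRows += f.numRows;
            end = new InstantOffset(f.instantTime, f.fileIndex);
        }
        return end;
    }

    /** True when the file sorts strictly after the given offset. */
    static boolean isAfter(FileEntry f, InstantOffset o) {
        int cmp = f.instantTime.compareTo(o.instantTime);
        return cmp > 0 || (cmp == 0 && f.fileIndex > o.fileIndex);
    }
}
```

In this sketch an end offset may land in the middle of an instant, which is the "partial instant" case the change log describes; the next batch then resumes from that offset (exclusive).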
Impact
Describe any public API or user-facing feature change or any performance impact.
None; compatible with the previous behavior.
Risk level (write none, low medium or high below)
If medium or high, explain what verification was done to mitigate the risks.
none
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change
- The config description must be updated if new configs are added or the default value of the configs are changed
- Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the instruction to make changes to the website.
Contributor's checklist
- [ ] Read through contributor's guide
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
@yihua @danny0405 this is a new implementation for https://github.com/apache/hudi/pull/8796. I'm still working on the relevant tests, but I'm not sure whether this implementation is suitable, so I'm putting it up now so that you can share your thoughts in advance. Thanks
CI report:
- 11b712bb0ee49dc663bdba8217fb6e84efbfed92 Azure: FAILURE
Bot commands
@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Gentle ping @danny0405 @yihua @xushiyan: is it suitable to allow reading commits partially? Any more thoughts about this feature? Both Delta and Iceberg support `maxFilesPerTrigger` and `maxRowsPerTrigger`, so I think it would be good to add this to Hudi as well.