
[HUDI-6129] Support rate limit for Spark streaming source

Open boneanxs opened this issue 1 year ago • 3 comments

Change Logs

  1. Add InstantOffset to allow an instant to be read partially
  2. Add InstantOffsetRange to support filtering files between startOffset (exclusive) and endOffset
  3. For Spark 3, implement SupportsAdmissionControl to support rate limiting based on maxFilesPerTrigger and maxRowsPerTrigger
  4. Move HoodieStreamSource to the Spark2 package, and add HoodieSpark3StreamSource for Spark 3+
  5. Refactor IncrementalRelation and MergeOnReadIncrementalRelation to filter files by InstantOffsetRange
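
To make the offset semantics in the list above concrete, here is a minimal Python sketch of how a per-trigger limit could pick the next end offset. It is not the code in this PR: the `InstantOffset` fields, the `next_end_offset` and `files_in_range` helpers, the timeline layout, and the choices that endOffset is inclusive and that at least one file is always admitted per trigger are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple

# Hypothetical timeline layout: ordered (instant, [(file_name, row_count), ...]).
Timeline = List[Tuple[str, List[Tuple[str, int]]]]


@dataclass(frozen=True, order=True)
class InstantOffset:
    """Illustrative offset: an instant plus the index of the last file read in it."""
    instant: str      # fixed-width commit timestamp, so string order is time order
    file_index: int   # index of the last consumed file within that instant


def _flatten(timeline: Timeline) -> Iterator[Tuple[InstantOffset, str, int]]:
    # Yield every file together with the offset that "reading up to it" would produce.
    for instant, files in timeline:
        for idx, (name, rows) in enumerate(files):
            yield InstantOffset(instant, idx), name, rows


def next_end_offset(timeline: Timeline,
                    start: Optional[InstantOffset],
                    max_files: Optional[int] = None,
                    max_rows: Optional[int] = None) -> Optional[InstantOffset]:
    """Advance from `start` (exclusive) until a file or row limit would be exceeded."""
    end, n_files, n_rows = start, 0, 0
    for off, _, rows in _flatten(timeline):
        if start is not None and off <= start:
            continue  # already consumed by an earlier trigger
        over_files = max_files is not None and n_files + 1 > max_files
        over_rows = max_rows is not None and n_rows + rows > max_rows
        # Assumption: always admit one file so the stream keeps making progress.
        if n_files > 0 and (over_files or over_rows):
            break
        end, n_files, n_rows = off, n_files + 1, n_rows + rows
    return end


def files_in_range(timeline: Timeline,
                   start: Optional[InstantOffset],
                   end: InstantOffset) -> List[str]:
    """Files with start < offset <= end (end inclusive is an assumption here)."""
    return [name for off, name, _ in _flatten(timeline)
            if (start is None or off > start) and off <= end]
```

With a timeline of two instants holding three and two files, a limit of two files per trigger stops in the middle of the first instant. The next trigger then resumes from that partial position, which is the case a whole-instant offset cannot express.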

Impact

Describe any public API or user-facing feature change or any performance impact. None; compatible with existing behavior.

Risk level (write none, low medium or high below)

If medium or high, explain what verification was done to mitigate the risks. none

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default values of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the instruction to make changes to the website.

Contributor's checklist

  • [ ] Read through contributor's guide
  • [ ] Change Logs and Impact were stated clearly
  • [ ] Adequate tests were added if applicable
  • [ ] CI passed

boneanxs avatar Dec 14 '23 03:12 boneanxs

@yihua @danny0405 this is a new implementation of https://github.com/apache/hudi/pull/8796. I'm still working on the relevant tests, but I'm not sure whether this approach is suitable, so I'm putting it up now so that you can share your thoughts in advance. Thanks

boneanxs avatar Dec 14 '23 03:12 boneanxs

CI report:

  • 11b712bb0ee49dc663bdba8217fb6e84efbfed92 Azure: FAILURE
Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

hudi-bot avatar Dec 14 '23 08:12 hudi-bot

Gentle ping @danny0405 @yihua @xushiyan: is it suitable to allow reading commits partially? Any more thoughts about this feature? Both Delta and Iceberg support maxFilesPerTrigger and maxRowsPerTrigger, so I think it would be good to add this to Hudi as well.

boneanxs avatar Dec 20 '23 11:12 boneanxs