[HUDI-7416] Add interface for StreamProfile to be used in StreamSync for reading and writing data

Open vinishjail97 opened this issue 5 months ago • 1 comments

Change Logs

The original PR had test failures and had to be reverted. This PR brings the change back with the tests fixed. https://github.com/apache/hudi/pull/10687

This PR introduces a new class, SourceProfile, which contains details about how the next sync round in StreamSync should be consumed. For example:

  • KafkaSourceProfile contains the number of events to consume in this sync round.
  • S3SourceProfile contains the list of files to consume in this sync round.
  • HudiIncrementalSourceProfile contains the beginInstant and endInstant commit times to consume in this sync round.

In the future we can add methods for choosing the writeOperationType and indexType as well. For now, sourceProfile.getSourceSpecificContext() will be used to consume the data from the source.
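The shape described above can be illustrated with a minimal Java sketch. It is not the code from this PR: only the method name getSourceSpecificContext() and the three profile class names come from the description. The generic type parameter, the constructors, the field types, and the InstantRange helper are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: only getSourceSpecificContext() is named in the PR description.
interface SourceProfile<T> {
    // Source-specific details about what the next sync round should consume.
    T getSourceSpecificContext();
}

// Assumed shape: the context is the number of events to consume in this round.
final class KafkaSourceProfile implements SourceProfile<Long> {
    private final long numEvents;
    KafkaSourceProfile(long numEvents) { this.numEvents = numEvents; }
    @Override public Long getSourceSpecificContext() { return numEvents; }
}

// Assumed shape: the context is the list of files to consume in this round.
final class S3SourceProfile implements SourceProfile<List<String>> {
    private final List<String> files;
    S3SourceProfile(List<String> files) {
        this.files = Collections.unmodifiableList(new ArrayList<>(files));
    }
    @Override public List<String> getSourceSpecificContext() { return files; }
}

// Hypothetical helper holding the begin and end commit times.
final class InstantRange {
    final String beginInstant;
    final String endInstant;
    InstantRange(String beginInstant, String endInstant) {
        this.beginInstant = beginInstant;
        this.endInstant = endInstant;
    }
}

// Assumed shape: the context is the commit-time range to consume in this round.
final class HudiIncrementalSourceProfile implements SourceProfile<InstantRange> {
    private final InstantRange range;
    HudiIncrementalSourceProfile(String beginInstant, String endInstant) {
        this.range = new InstantRange(beginInstant, endInstant);
    }
    @Override public InstantRange getSourceSpecificContext() { return range; }
}

class SourceProfileDemo {
    public static void main(String[] args) {
        SourceProfile<Long> kafka = new KafkaSourceProfile(500L);
        SourceProfile<List<String>> s3 =
            new S3SourceProfile(Arrays.asList("s3://bucket/a.parquet", "s3://bucket/b.parquet"));
        System.out.println("Kafka events this round: " + kafka.getSourceSpecificContext());
        System.out.println("S3 files this round: " + s3.getSourceSpecificContext().size());
    }
}
```

Typing the context per source keeps each reader free to interpret its own profile without casts, which is one plausible reason to expose a single source-specific accessor.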

Impact

No change to public APIs. The new field is passed to the constructors as an Option, and the previous constructors remain backward compatible.
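The backward-compatibility pattern mentioned here can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: EventSource, ProfileSupplier, maxEvents(), and the default value are hypothetical names, and java.util.Optional stands in for the Option type the description refers to.

```java
import java.util.Optional;

// Hypothetical supplier of per-round limits; not a name taken from the PR.
interface ProfileSupplier {
    long maxEvents();
}

// Hypothetical source class showing an optional field added without breaking old callers.
final class EventSource {
    static final long DEFAULT_MAX_EVENTS = 1000L; // assumed default for illustration

    private final String topic;
    private final Optional<ProfileSupplier> profileSupplier;

    // Pre-existing constructor: old call sites keep compiling and get the old behavior.
    EventSource(String topic) {
        this(topic, Optional.empty());
    }

    // New constructor: the extra dependency is optional.
    EventSource(String topic, Optional<ProfileSupplier> profileSupplier) {
        this.topic = topic;
        this.profileSupplier = profileSupplier;
    }

    // Use the profile when present, otherwise fall back to the default.
    long eventsForNextRound() {
        return profileSupplier.map(ProfileSupplier::maxEvents).orElse(DEFAULT_MAX_EVENTS);
    }

    String topic() {
        return topic;
    }
}
```

An existing caller using `new EventSource("orders")` is unaffected, while a new caller can pass `Optional.of(supplier)` to control how much the next round consumes.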

Risk level (write none, low, medium or high below)

Low

Documentation Update

None. This only adds an optional interface that can be used to consume and write data in the StreamSync utility.

Contributor's checklist

  • [x] Read through contributor's guide
  • [x] Change Logs and Impact were stated clearly
  • [x] Adequate tests were added if applicable
  • [ ] CI passed

vinishjail97, Feb 23 '24 04:02

CI report:

  • 707fc464da051e02301c730b5b5402bbe3bf3a05 Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

  • @hudi-bot run azure: re-run the last Azure build

hudi-bot, Feb 24 '24 10:02