hudi
[HUDI-7416] Add interface for StreamProfile to be used in StreamSync for reading and writing data
Change Logs
The original PR had test failures and had to be reverted. This PR brings the change back with the tests fixed. Original PR: https://github.com/apache/hudi/pull/10687
Introduces SourceProfile, which describes how the next sync round in StreamSync should be consumed. For example:
- KafkaSourceProfile contains the number of events to consume in this sync round.
- S3SourceProfile contains the list of files to consume in this sync round.
- HudiIncrementalSourceProfile contains the beginInstant and endInstant commit times to consume in this sync round.
In the future we can also add methods for choosing the writeOperationType and indexType. For now, sourceProfile.getSourceSpecificContext() will be used to consume the data from the source.
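The profiles listed above could be sketched roughly as follows. This is an illustration only, not the code in this PR: the only method named in the description is getSourceSpecificContext(), so the generic type parameter, the constructors, the field types, and the InstantRange helper are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical shape: only getSourceSpecificContext() is named in the PR description.
interface SourceProfile<T> {
    // Source-specific details describing what to consume in the next sync round.
    T getSourceSpecificContext();
}

// Kafka: the context is the number of events to consume (assumed to be a long).
final class KafkaSourceProfile implements SourceProfile<Long> {
    private final long numEvents;
    KafkaSourceProfile(long numEvents) { this.numEvents = numEvents; }
    @Override public Long getSourceSpecificContext() { return numEvents; }
}

// S3: the context is the list of files to consume (assumed to be path strings).
final class S3SourceProfile implements SourceProfile<List<String>> {
    private final List<String> files;
    S3SourceProfile(List<String> files) {
        // Defensive copy so the profile cannot change mid-round.
        this.files = Collections.unmodifiableList(new ArrayList<>(files));
    }
    @Override public List<String> getSourceSpecificContext() { return files; }
}

// Hypothetical holder for the begin/end commit times.
final class InstantRange {
    final String beginInstant;
    final String endInstant;
    InstantRange(String beginInstant, String endInstant) {
        this.beginInstant = beginInstant;
        this.endInstant = endInstant;
    }
}

// Hudi incremental: the context is the commit-time range to consume.
final class HudiIncrementalSourceProfile implements SourceProfile<InstantRange> {
    private final InstantRange range;
    HudiIncrementalSourceProfile(String beginInstant, String endInstant) {
        this.range = new InstantRange(beginInstant, endInstant);
    }
    @Override public InstantRange getSourceSpecificContext() { return range; }
}
```

Under these assumptions, each source reads its own context type from the profile, so one interface can carry very different per-source details without the sync loop knowing about them.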
Impact
No changes to public APIs. The new field is defined as an Option in the constructors, and the previous constructors remain backwards compatible.
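The backwards-compatibility pattern described here can be sketched as below. This is a minimal illustration under stated assumptions: java.util.Optional stands in for Hudi's Option type, and StreamReaderSketch and ProfileSupplierSketch are hypothetical names that do not come from the PR.

```java
import java.util.Optional;

// Hypothetical stand-in for whatever supplies the profile to the sync.
interface ProfileSupplierSketch {
    String describe();
}

// Hypothetical consumer class; shows only the constructor pattern.
final class StreamReaderSketch {
    private final String sourceName;
    private final Optional<ProfileSupplierSketch> profileSupplier;

    // Previous constructor signature: still compiles for existing callers
    // and delegates with an empty optional.
    StreamReaderSketch(String sourceName) {
        this(sourceName, Optional.empty());
    }

    // New constructor: the extra field is optional, so it is never required.
    StreamReaderSketch(String sourceName, Optional<ProfileSupplierSketch> profileSupplier) {
        this.sourceName = sourceName;
        this.profileSupplier = profileSupplier;
    }

    // Falls back to default behavior when no profile supplier was given.
    String plan() {
        return profileSupplier
            .map(p -> sourceName + " using " + p.describe())
            .orElse(sourceName + " using defaults");
    }
}
```

Existing call sites that use the one-argument constructor keep their old behavior, while new call sites can opt in by passing a supplier.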
Risk level (write none, low, medium or high below)
Low
Documentation Update
None. This only adds an optional interface that can be used to consume and write data in the StreamSync utility.
Contributor's checklist
- [x] Read through contributor's guide
- [x] Change Logs and Impact were stated clearly
- [x] Adequate tests were added if applicable
- [ ] CI passed
CI report:
- 707fc464da051e02301c730b5b5402bbe3bf3a05 Azure: SUCCESS
Bot commands
@hudi-bot supports the following commands:
- @hudi-bot run azure: re-run the last Azure build