hudi
[HUDI-7501] Use source profile for S3 and GCS sources
Change Logs
Use SourceProfile in the S3/GCS sources to determine the sourceLimit and numPartitions. Instead of relying on the sourceLimit passed in from static config, the sources use the value provided by the source profile.
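As a rough illustration of the change described above, the following minimal sketch shows one way a source could prefer profile-provided values over a statically configured limit. All type and method names here (`SketchSourceProfile`, `SketchStreamContext`, `ReadPlan`, `SourceLimitSketch.resolve`) are hypothetical stand-ins written for this example, not the actual Hudi interfaces. The fallback to the static limit when no profile is available is also an assumption, made to match the backward-compatibility claim.

```java
import java.util.Optional;

// Hypothetical stand-in for a source profile; not the actual Hudi interface.
interface SketchSourceProfile {
    long maxSourceBytes();
    int sourcePartitions();
}

// Hypothetical stand-in for the context passed to the new source constructor.
interface SketchStreamContext {
    Optional<SketchSourceProfile> sourceProfile();
}

// Simple fixed-value profile used only for illustration.
final class FixedProfile implements SketchSourceProfile {
    private final long maxBytes;
    private final int partitions;

    FixedProfile(long maxBytes, int partitions) {
        this.maxBytes = maxBytes;
        this.partitions = partitions;
    }

    @Override public long maxSourceBytes() { return maxBytes; }
    @Override public int sourcePartitions() { return partitions; }
}

// The values a source would use for one read: a byte limit and an optional partition count.
final class ReadPlan {
    final long sourceLimit;
    final Optional<Integer> numPartitions;

    ReadPlan(long sourceLimit, Optional<Integer> numPartitions) {
        this.sourceLimit = sourceLimit;
        this.numPartitions = numPartitions;
    }
}

final class SourceLimitSketch {
    // Prefer the profile's values when a profile is present;
    // otherwise fall back to the statically configured limit (assumed behavior).
    static ReadPlan resolve(SketchStreamContext context, long staticSourceLimit) {
        return context.sourceProfile()
            .map(p -> new ReadPlan(p.maxSourceBytes(), Optional.of(p.sourcePartitions())))
            .orElseGet(() -> new ReadPlan(staticSourceLimit, Optional.empty()));
    }

    public static void main(String[] args) {
        SketchStreamContext withProfile = () -> Optional.of(new FixedProfile(5000L, 8));
        ReadPlan plan = resolve(withProfile, 1000L);
        System.out.println(plan.sourceLimit + " bytes, partitions=" + plan.numPartitions);
    }
}
```

In this sketch the static limit only matters when the context carries no profile, which is how existing callers that never supply one would keep their current behavior.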
Impact
No impact on existing public constructors; the change is backward compatible.
Risk level (write none, low medium or high below)
Medium
Documentation Update
None. This change only adds a new public constructor that takes StreamContext as a parameter in the S3/GCS sources.
Contributor's checklist
- [x] Read through contributor's guide
- [x] Change Logs and Impact were stated clearly
- [x] Adequate tests were added if applicable
- [ ] CI passed
Is it possible to get a test coverage report for the source classes touched and share it?
Generally, GcsEventsHoodieIncrSource and S3EventsHoodieIncrSource do not have any functional tests, right? So we rely heavily on mocked UTs.
GcsEventsHoodieIncrSource, S3EventsHoodieIncrSource, CloudDataFetcher and CloudObjectsSelectorCommon have decent coverage.
@vinishjail97 could you rebase this PR on the latest master?
CI report:
- 2c6eb9de69f80fbc5cbd83c8e2faa4ed93bf0980 Azure: SUCCESS