
[HUDI-7501] Use source profile for S3 and GCS sources

Open vinishjail97 opened this issue 11 months ago • 3 comments

Change Logs

Use SourceProfile in the S3/GCS sources to determine the sourceLimit and numPartitions. Instead of relying on the sourceLimit passed in from static config, use the value provided by the source profile.
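To illustrate the idea, here is a minimal, self-contained Java sketch of the selection logic. When a source profile is available, its byte limit and partition count are used. Otherwise the statically configured values apply. The fallback behavior is an assumption suggested by the backwards-compatibility note, not confirmed from the PR diff. SourceProfile and SourceProfileSupplier here are simplified hypothetical stand-ins, not Hudi's actual classes. The names resolve, ReadLimits and fixedProfile are illustrative only.

```java
import java.util.Optional;

public class SourceProfileSketch {

    // Hypothetical, simplified stand-in for a source profile; Hudi's real
    // interface may use different names and signatures.
    interface SourceProfile {
        long getMaxSourceBytes();   // byte budget for one fetch
        int getSourcePartitions();  // number of partitions to read into
    }

    // Hypothetical supplier; it may have no profile for a given batch.
    interface SourceProfileSupplier {
        Optional<SourceProfile> getSourceProfile();
    }

    // The limits an S3/GCS-style source would apply to a single fetch.
    static final class ReadLimits {
        final long sourceLimit;
        final int numPartitions;

        ReadLimits(long sourceLimit, int numPartitions) {
            this.sourceLimit = sourceLimit;
            this.numPartitions = numPartitions;
        }

        @Override
        public String toString() {
            return "ReadLimits{sourceLimit=" + sourceLimit + ", numPartitions=" + numPartitions + "}";
        }
    }

    // Prefer the profile's values; fall back to the statically configured ones
    // when there is no supplier or the supplier has no profile.
    static ReadLimits resolve(Optional<SourceProfileSupplier> supplier,
                              long configuredSourceLimit,
                              int configuredNumPartitions) {
        return supplier
                .flatMap(SourceProfileSupplier::getSourceProfile)
                .map(p -> new ReadLimits(p.getMaxSourceBytes(), p.getSourcePartitions()))
                .orElseGet(() -> new ReadLimits(configuredSourceLimit, configuredNumPartitions));
    }

    // Hypothetical helper that always returns the same profile (illustration only).
    static SourceProfileSupplier fixedProfile(long maxBytes, int partitions) {
        SourceProfile profile = new SourceProfile() {
            @Override
            public long getMaxSourceBytes() { return maxBytes; }

            @Override
            public int getSourcePartitions() { return partitions; }
        };
        return () -> Optional.of(profile);
    }

    public static void main(String[] args) {
        // Profile present: its values take precedence over the static config.
        System.out.println(resolve(Optional.of(fixedProfile(512L * 1024 * 1024, 8)), 1_000_000L, 4));
        // No profile supplier: the static config is used unchanged.
        System.out.println(resolve(Optional.empty(), 1_000_000L, 4));
    }
}
```

Keeping the static values as the fallback means callers that never supply a profile see the same limits as before. That is consistent with the claim below that existing public constructors are unaffected.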

Impact

No impact on existing public constructors; the change is backwards compatible.

Risk level (write none, low, medium or high below)

Medium

Documentation Update

None. This change only adds a new public constructor to the S3/GCS sources that takes StreamContext as a parameter.

Contributor's checklist

  • [x] Read through contributor's guide
  • [x] Change Logs and Impact were stated clearly
  • [x] Adequate tests were added if applicable
  • [ ] CI passed

vinishjail97 (Mar 13 '24 19:03)

Is it possible to get test coverage for the source classes touched and share the report?

Generally, GcsEventsHoodieIncrSource and S3EventsHoodieIncrSource do not have any functional tests, right? So we rely heavily on mocked unit tests.

GcsEventsHoodieIncrSource, S3EventsHoodieIncrSource, CloudDataFetcher and CloudObjectsSelectorCommon have decent coverage.


vinishjail97 (Mar 18 '24 14:03)

@vinishjail97 could you rebase this PR on the latest master?

yihua (May 03 '24 02:05)

CI report:

  • 2c6eb9de69f80fbc5cbd83c8e2faa4ed93bf0980 Azure: SUCCESS
Bot commands
@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

hudi-bot (May 13 '24 01:05)