[HUDI-4630] Add transformer capability to individual feeds in MultiTableDeltaStreamer

Open yesemsanthoshkumar opened this issue 3 years ago • 1 comments

Change Logs

Context: https://apache-hudi.slack.com/archives/C4D716NPQ/p1660215517081789

MultiTableDeltaStreamer currently supports a single transformer class for all of the data being synced, and it can only be enabled or disabled as a whole. There is no support for enabling transformers for selected feeds only, or for using different transformers for different feeds. This PR adds that support.

The same kind of per-table override is already available for the schema provider class, through the hoodie.deltastreamer.schemaprovider.class property in table-level configs.
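To illustrate the intended behavior, here is a minimal sketch of how a per-table transformer setting could be resolved. It is not the code in this PR: the class and method names are hypothetical, plain java.util.Properties stands in for Hudi's TypedProperties, the transformer class names are made up, and the fallback to the common config when a table sets nothing is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Hudi.
class PerTableTransformerSketch {
  // Property key mentioned in this thread.
  static final String TRANSFORMER_CLASS = "hoodie.deltastreamer.transformer.class";

  // Assumption: a table-level value wins; otherwise the common value is used.
  // An empty table-level value disables transformers for that table.
  static List<String> resolveTransformers(Properties common, Properties table) {
    String value = table.getProperty(TRANSFORMER_CLASS, common.getProperty(TRANSFORMER_CLASS));
    if (value == null || value.trim().isEmpty()) {
      return Collections.emptyList();
    }
    return Arrays.stream(value.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Properties common = new Properties();
    common.setProperty(TRANSFORMER_CLASS, "com.example.CommonTransformer"); // made-up class

    Properties ordersTable = new Properties();
    ordersTable.setProperty(TRANSFORMER_CLASS,
        "com.example.FlattenTransformer,com.example.MaskTransformer"); // made-up classes

    Properties usersTable = new Properties(); // no override, falls back to common

    System.out.println(resolveTransformers(common, ordersTable));
    System.out.println(resolveTransformers(common, usersTable));
  }
}
```

Whether an unset table-level value should inherit the common transformer or mean "no transformer" is a design choice for the PR to settle; the sketch picks inheritance only to keep the example short.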

Impact

The impact is confined to users of MultiTableDeltaStreamer, and only those who use transformers. Since this is a new feature, existing jobs should run as-is after this change is incorporated.

Risk level: none | low | medium | high

Low

Contributor's checklist

  • [x] Read through contributor's guide
  • [x] Change Logs and Impact were stated clearly
  • [ ] Adequate tests were added if applicable
  • [ ] CI passed

yesemsanthoshkumar avatar Sep 20 '22 14:09 yesemsanthoshkumar

@yesemsanthoshkumar : Once you have added tests, let us know. Please also rebase onto the latest master.

nsivabalan avatar Oct 19 '22 07:10 nsivabalan

CI report:

  • bf2c5548f6d483ff9c2a190076d9d64dea61610a UNKNOWN
  • 0f2ad047fa81f3e23c1e8a190379fabc81b6a3cb Azure: FAILURE
Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

hudi-bot avatar Nov 07 '22 21:11 hudi-bot

@yesemsanthoshkumar : Can you rebase the PR so that we can review it?

bvaradar avatar Feb 22 '23 07:02 bvaradar

CI report:

  • bf2c5548f6d483ff9c2a190076d9d64dea61610a UNKNOWN
  • c6a245dfaaf6f1bb6cce2d843360ffaa8d042619 Azure: FAILURE

hudi-bot avatar Feb 26 '23 20:02 hudi-bot

@bvaradar Rebased. Let me know of any changes.

yesemsanthoshkumar avatar Feb 27 '23 04:02 yesemsanthoshkumar

Just a quick heads up - I ran MultiTableDeltaStreamer on the latest master without the hoodie.deltastreamer.transformer.class config and got a NullPointerException from the .split() call on this line:

List<String> transformerClassNameOverride = Arrays.asList(typedProperties.getString(Constants.TRANSFORMER_CLASS, null).split(","));

because my TRANSFORMER_CLASS is null. I changed the function to the following with success:

  private void populateTransformerProps(HoodieDeltaStreamer.Config cfg, TypedProperties typedProperties) {
    String transformerClass = typedProperties.getString(Constants.TRANSFORMER_CLASS, null);
    if (transformerClass != null && !transformerClass.trim().isEmpty()) {
      List<String> transformerClassNameOverride = Arrays.asList(transformerClass.split(","));
      cfg.transformerClassNames = transformerClassNameOverride;
    }
  }
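For anyone who wants to check the null case outside of Hudi, the same guard can be sketched as a standalone class. This is an illustration only: the class and method names are hypothetical, java.util.Properties stands in for TypedProperties, and trimming whitespace around the commas is an extra assumption that the snippet above does not make.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical standalone illustration, not part of Hudi.
class NullSafeTransformerParsing {
  static final String TRANSFORMER_CLASS = "hoodie.deltastreamer.transformer.class";

  // Returns an empty list instead of throwing when the property is unset or blank.
  static List<String> parseTransformerClasses(Properties props) {
    String raw = props.getProperty(TRANSFORMER_CLASS); // null when the key is absent
    if (raw == null || raw.trim().isEmpty()) {
      return Collections.emptyList();
    }
    // Assumption: spaces around commas are tolerated and stripped.
    return Arrays.asList(raw.trim().split("\\s*,\\s*"));
  }

  public static void main(String[] args) {
    Properties unset = new Properties();
    System.out.println(parseTransformerClasses(unset).isEmpty());

    Properties set = new Properties();
    set.setProperty(TRANSFORMER_CLASS, "com.example.A, com.example.B"); // made-up classes
    System.out.println(parseTransformerClasses(set));
  }
}
```

A unit test for the null case requested in this thread could assert on the same three inputs: key absent, key blank, and key set to a comma-separated list.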

sydneyhoran avatar Mar 14 '23 15:03 sydneyhoran

Thanks @sydneyhoran for catching this issue. @yesemsanthoshkumar : I will go ahead and revert this commit in master. Can you please make the change as suggested and add tests for the null case?

bvaradar avatar Mar 14 '23 15:03 bvaradar

I will do it over the weekend. Thanks @sydneyhoran for catching this issue.

yesemsanthoshkumar avatar Mar 15 '23 09:03 yesemsanthoshkumar