hudi
[HUDI-4630] Add transformer capability to individual feeds in MultiTableDeltaStreamer
Change Logs
Context: https://apache-hudi.slack.com/archives/C4D716NPQ/p1660215517081789
MultiTableDeltaStreamer currently supports a single transformer class for all of the data being synced, and it can only be enabled or disabled as a whole. There is no support for enabling transformers for selected feeds only, or for using different transformers for different feeds. This PR adds that support.
The schema provider already offers the same capability through the hoodie.deltastreamer.schemaprovider.class property in table-level configs.
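To make the intent concrete, here is a minimal sketch of per-table transformer resolution. It is not the code in this PR: the class `PerTableTransformerSketch`, the method `transformersByTable`, and the `com.example.*` transformer names in the usage note are hypothetical, and plain `java.util.Properties` stands in for Hudi's `TypedProperties`. Only the property key `hoodie.deltastreamer.transformer.class` comes from the discussion. The sketch also assumes that a table without the key simply gets no transformer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Illustrative only: class and method names here are hypothetical, not Hudi APIs.
class PerTableTransformerSketch {
    // Property key quoted in this PR discussion.
    static final String TRANSFORMER_CLASS = "hoodie.deltastreamer.transformer.class";

    // Maps each table name to the transformer classes named in its own config.
    // Assumption: a table whose config omits the key gets an empty list (no transformer).
    static Map<String, List<String>> transformersByTable(Map<String, Properties> tableConfigs) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Properties> entry : tableConfigs.entrySet()) {
            String value = entry.getValue().getProperty(TRANSFORMER_CLASS);
            List<String> classes = (value == null || value.trim().isEmpty())
                ? Collections.emptyList()
                : Arrays.asList(value.trim().split("\\s*,\\s*"));
            result.put(entry.getKey(), classes);
        }
        return result;
    }
}
```

With an `orders` config that sets the key to `com.example.A, com.example.B` and a `users` config that omits it, the sketch would give `orders` two transformers and `users` none, which is the "select feed" behaviour described above.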
Impact
The impact is confined to users of MultiTableDeltaStreamer who use transformers. Since this is a new feature, existing jobs should run as-is once this change is incorporated.
Risk level: none | low | medium | high
Low
Contributor's checklist
- [x] Read through contributor's guide
- [x] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
@yesemsanthoshkumar: once you have added tests, let us know. Also, please rebase with the latest master.
CI report:
- bf2c5548f6d483ff9c2a190076d9d64dea61610a UNKNOWN
- 0f2ad047fa81f3e23c1e8a190379fabc81b6a3cb Azure: FAILURE
Bot commands
@hudi-bot supports the following commands:
- @hudi-bot run azure: re-run the last Azure build
@yesemsanthoshkumar: Can you rebase the PR for us to review?
CI report:
- bf2c5548f6d483ff9c2a190076d9d64dea61610a UNKNOWN
- c6a245dfaaf6f1bb6cce2d843360ffaa8d042619 Azure: FAILURE
@bvaradar Rebased. Let me know of any changes.
Just a quick heads up - I ran the latest master MultiTableDeltaStreamer without hoodie.deltastreamer.transformer.class config and I got a NullPointerException due to the .split() in line:
List<String> transformerClassNameOverride = Arrays.asList(typedProperties.getString(Constants.TRANSFORMER_CLASS, null).split(","));
because my TRANSFORMER_CLASS is null. I changed the function to the following with success:
    private void populateTransformerProps(HoodieDeltaStreamer.Config cfg, TypedProperties typedProperties) {
      String transformerClass = typedProperties.getString(Constants.TRANSFORMER_CLASS, null);
      if (transformerClass != null && !transformerClass.trim().isEmpty()) {
        List<String> transformerClassNameOverride = Arrays.asList(transformerClass.split(","));
        cfg.transformerClassNames = transformerClassNameOverride;
      }
    }
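For anyone who wants to reproduce the failure outside Hudi, the following standalone sketch isolates the two behaviours. `TransformerClassParser` and its methods are hypothetical names used only for illustration. Unlike the fix above, which leaves `cfg.transformerClassNames` untouched when the property is missing, the guarded variant here returns an empty list so that the result is easy to assert on in a test for the null case.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Standalone illustration; TransformerClassParser is a hypothetical name, not a Hudi class.
class TransformerClassParser {
    // Unguarded form: calling split() on a null value throws NullPointerException,
    // which is what happens when the transformer property is not configured.
    static List<String> parseUnguarded(String value) {
        return Arrays.asList(value.split(","));
    }

    // Guarded form: an absent or blank property yields an empty list instead of failing.
    static List<String> parseGuarded(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(value.split(","));
    }
}
```

A null-case test along these lines would assert that `parseGuarded(null)` is empty and that `parseUnguarded(null)` throws.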
Thanks @sydneyhoran for catching this issue. @yesemsanthoshkumar : I will go ahead and revert this commit in master. Can you please make the change as suggested and add tests for null case.
Will do it over the weekend. Thanks @sydneyhoran for catching this issue.