
[HUDI-1936] Introduce an optional property for conditional upsert

Open fanaticjo opened this issue 4 years ago • 5 comments

Tips

  • Thank you very much for contributing to Apache Hudi.
  • Please review https://hudi.apache.org/contributing.html before opening a pull request.

What is the purpose of the pull request

Anyone who wants custom upsert logic currently has to override the latest Avro payload class, which is only possible in Java or Scala.

Python developers have no such option.

This PR introduces a new payload class and a new configuration key that work from Java, Scala, and Python.

The class implements the custom upsert logic, and the new key, hoodie.update.keys, accepts the comma-separated list of columns that should be updated.

"hoodie.update.keys": "admission_date,name",  # comma-separated columns to update
"hoodie.datasource.write.payload.class": "com.hudiUpsert.hudiCustomUpsert"  # custom upsert payload class

With this configuration, only the admission_date and name columns are updated in the target table.
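The intended merge semantics can be sketched in plain Python. This is only an illustration of the behavior described above, not the payload class from this PR: the helper names `parse_update_keys` and `conditional_upsert`, the sample record, the use of `KeyError` for a missing column, and the choice to insert the whole record when no stored row exists are all assumptions.

```python
from typing import Any, Dict, List, Optional


def parse_update_keys(raw: str) -> List[str]:
    """Split a comma-separated value such as "admission_date,name" into column names."""
    return [key.strip() for key in raw.split(",") if key.strip()]


def conditional_upsert(
    stored: Optional[Dict[str, Any]],
    incoming: Dict[str, Any],
    update_keys: List[str],
) -> Dict[str, Any]:
    """Return the record to keep after merging `incoming` into `stored`.

    Only the columns named in `update_keys` are taken from `incoming`;
    every other column keeps its stored value.
    """
    if stored is None:
        # Assumption: a record with no existing row is inserted as-is.
        return dict(incoming)

    merged = dict(stored)
    for key in update_keys:
        if key not in incoming:
            # Assumption: a listed column missing from the input is an error.
            raise KeyError(f"update key {key!r} not found in incoming record")
        merged[key] = incoming[key]
    return merged


# Hypothetical sample data for illustration only.
stored = {"id": 1, "name": "Asha", "admission_date": "2020-01-01", "fees": 100}
incoming = {"id": 1, "name": "Asha K", "admission_date": "2021-06-05", "fees": 0}

keys = parse_update_keys("admission_date,name")
result = conditional_upsert(stored, incoming, keys)
print(result)
```

In this sketch, `fees` keeps its stored value because it is not listed in the update keys, while `name` and `admission_date` take the incoming values.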

Brief change log

(for example:)

  • added hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithCustomAvroPayload.java

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Manually verified the change by running a job locally.

Committer checklist

  • [x] Has a corresponding JIRA in PR title & commit

  • [ ] Commit message is descriptive of the change

  • [ ] CI is green

  • [ ] Necessary doc changes done or have another open PR

fanaticjo avatar Jun 05 '21 06:06 fanaticjo

Codecov Report

Attention: Patch coverage is 8.57143% with 32 lines in your changes missing coverage. Please review.

Project coverage is 55.04%. Comparing base (974b476) to head (26dadb6). Report is 4289 commits behind head on master.

Files with missing lines Patch % Lines
...i/common/model/OverwriteWithCustomAvroPayload.java 10.34% 24 Missing and 2 partials :warning:
...apache/hudi/exception/ColumnNotFoundException.java 0.00% 2 Missing :warning:
...che/hudi/exception/UpdateKeyNotFoundException.java 0.00% 2 Missing :warning:
...apache/hudi/exception/WriteOperationException.java 0.00% 2 Missing :warning:
Additional details and impacted files


@@             Coverage Diff              @@
##             master    #3035      +/-   ##
============================================
+ Coverage     55.01%   55.04%   +0.02%     
- Complexity     3850     3865      +15     
============================================
  Files           485      491       +6     
  Lines         23467    23640     +173     
  Branches       2497     2535      +38     
============================================
+ Hits          12911    13012     +101     
- Misses         9405     9466      +61     
- Partials       1151     1162      +11     
Flag Coverage Δ
hudicli 39.55% <ø> (ø)
hudiclient ∅ <ø> (∅)
hudicommon 50.14% <8.57%> (-0.17%) :arrow_down:
hudiflink 63.25% <ø> (-0.38%) :arrow_down:
hudihadoopmr 51.43% <ø> (-0.11%) :arrow_down:
hudisparkdatasource 74.28% <ø> (+0.95%) :arrow_up:
hudisync 46.60% <ø> (+0.15%) :arrow_up:
huditimelineservice 64.36% <ø> (ø)
hudiutilities 70.83% <ø> (ø)

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
...apache/hudi/exception/ColumnNotFoundException.java 0.00% <0.00%> (ø)
...che/hudi/exception/UpdateKeyNotFoundException.java 0.00% <0.00%> (ø)
...apache/hudi/exception/WriteOperationException.java 0.00% <0.00%> (ø)
...i/common/model/OverwriteWithCustomAvroPayload.java 10.34% <10.34%> (ø)

... and 24 files with indirect coverage changes

codecov-commenter avatar Jun 05 '21 11:06 codecov-commenter

cc @vingov do you mind reviewing this, given it's a change that benefits Python users?

vinothchandar avatar Jun 19 '21 04:06 vinothchandar

In some sense, with the Spark SQL support now, Python users can do custom merges. Does that satisfy your requirements?

vinothchandar avatar Aug 10 '21 21:08 vinothchandar

CI report:

  • 26dadb6627c90c9f06e66fba0b8bd24e5579665f Azure: FAILURE
Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

hudi-bot avatar Nov 05 '21 02:11 hudi-bot

@fanaticjo: We landed partial payload support via https://github.com/apache/hudi/pull/4676. Let us know if we can close this patch, if it's possible to enhance 4676 instead, or if this patch addresses something different.

nsivabalan avatar Oct 20 '22 04:10 nsivabalan

Since there is no update on this PR for a while and Hudi already supports partial updates with a more general approach than the payload this PR proposes, closing this PR now. Feel free to reopen if needed.

yihua avatar Sep 10 '24 15:09 yihua