[HUDI-1936] Introduce an optional property for conditional upsert
Tips
- Thank you very much for contributing to Apache Hudi.
- Please review https://hudi.apache.org/contributing.html before opening a pull request.
What is the purpose of the pull request
Anyone who wants custom upsert logic currently has to override the Latest Avro payload class, which is only possible in Java or Scala.
Python developers have no such option.
This PR introduces a new payload class and a new key that work from Java, Scala, and Python.
The class is responsible for the custom upsert logic, and the new key, hoodie.update.key, accepts the columns that should be updated; all other columns are left unchanged.
"hoodie.update.keys": "admission_date,name",  # comma-separated columns to update
"hoodie.datasource.write.payload.class": "com.hudiUpsert.hudiCustomUpsert"  # custom upsert payload class
With these settings, only the admission_date and name columns are updated in the target table.
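The intended merge behaviour can be illustrated with a small, self-contained Python sketch. It is not the code of OverwriteWithCustomAvroPayload and it does not call Hudi; the helper names `parse_update_keys` and `merge_record`, the sample records, and the choice to raise an error when a listed column is missing are all assumptions made for illustration.

```python
def parse_update_keys(raw):
    """Split the comma-separated option value into a list of column names."""
    return [col.strip() for col in raw.split(",") if col.strip()]


def merge_record(existing, incoming, update_keys):
    """Return a copy of `existing` where only `update_keys` come from `incoming`.

    Hypothetical helper: raising on a missing column is an assumption here,
    not confirmed behaviour of the proposed payload class.
    """
    missing = [key for key in update_keys if key not in incoming]
    if missing:
        raise KeyError(f"update columns missing from incoming record: {missing}")
    merged = dict(existing)  # start from the stored record
    for key in update_keys:
        merged[key] = incoming[key]  # overwrite only the listed columns
    return merged


# Options as given in the PR description.
options = {
    "hoodie.update.keys": "admission_date,name",
    "hoodie.datasource.write.payload.class": "com.hudiUpsert.hudiCustomUpsert",
}

# Made-up sample rows, for illustration only.
existing = {"id": 1, "name": "Asha", "admission_date": "2020-01-01", "grade": "A"}
incoming = {"id": 1, "name": "Asha K", "admission_date": "2021-06-01", "grade": "C"}

update_keys = parse_update_keys(options["hoodie.update.keys"])
merged = merge_record(existing, incoming, update_keys)
print(merged)
```

In this sketch the incoming `grade` value is ignored because `grade` is not listed in the update keys, while `name` and `admission_date` take the incoming values.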
Brief change log
(for example:)
- added hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithCustomAvroPayload.java
Verify this pull request
(Please pick either of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
- Manually verified the change by running a job locally.
Committer checklist
- [x] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
Codecov Report
Attention: Patch coverage is 8.57143% with 32 lines in your changes missing coverage. Please review.
Project coverage is 55.04%. Comparing base (974b476) to head (26dadb6). Report is 4289 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #3035 +/- ##
============================================
+ Coverage 55.01% 55.04% +0.02%
- Complexity 3850 3865 +15
============================================
Files 485 491 +6
Lines 23467 23640 +173
Branches 2497 2535 +38
============================================
+ Hits 12911 13012 +101
- Misses 9405 9466 +61
- Partials 1151 1162 +11
| Flag | Coverage Δ | |
|---|---|---|
| hudicli | 39.55% <ø> (ø) | |
| hudiclient | ∅ <ø> (∅) | |
| hudicommon | 50.14% <8.57%> (-0.17%) | :arrow_down: |
| hudiflink | 63.25% <ø> (-0.38%) | :arrow_down: |
| hudihadoopmr | 51.43% <ø> (-0.11%) | :arrow_down: |
| hudisparkdatasource | 74.28% <ø> (+0.95%) | :arrow_up: |
| hudisync | 46.60% <ø> (+0.15%) | :arrow_up: |
| huditimelineservice | 64.36% <ø> (ø) | |
| hudiutilities | 70.83% <ø> (ø) | |

Flags with carried forward coverage won't be shown.
| Files with missing lines | Coverage Δ | |
|---|---|---|
| ...apache/hudi/exception/ColumnNotFoundException.java | 0.00% <0.00%> (ø) | |
| ...che/hudi/exception/UpdateKeyNotFoundException.java | 0.00% <0.00%> (ø) | |
| ...apache/hudi/exception/WriteOperationException.java | 0.00% <0.00%> (ø) | |
| ...i/common/model/OverwriteWithCustomAvroPayload.java | 10.34% <10.34%> (ø) | |
cc @vingov: do you mind reviewing this, given that it's a change that benefits Python users?
In some sense, with Spark SQL support now available, Python users can already do custom merges. Does that satisfy your requirements?
CI report:
- 26dadb6627c90c9f06e66fba0b8bd24e5579665f Azure: FAILURE
Bot commands
@hudi-bot supports the following commands:
- @hudi-bot run azure: re-run the last Azure build
@fanaticjo: We landed partial payload support via https://github.com/apache/hudi/pull/4676. Let us know whether we can close this patch, whether it's possible to enhance #4676 instead, or whether this patch addresses something different.
Since there has been no update on this PR for a while, and Hudi already supports partial updates with a more general approach than the payload this PR proposes, I am closing this PR now. Feel free to reopen if needed.