hudi
[HUDI-7240] Clean delete logic
Change Logs
- When we create HoodieRecord for a delete, we store the necessary information into the metadata field.
- When we need to merge delete records, we extract orderingVal from metadata field of HoodieRecord.
- Removed HoodieRecordTestPayload.
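
For illustration only, the first two items can be sketched as below. This is a simplified stand-in, not the code in this PR: the class `DeleteRecordSketch`, the metadata key `orderingVal`, and the default of 0 for a missing value are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a delete record; not a Hudi class. */
final class DeleteRecordSketch {
  // Assumed metadata key; the real key name is not stated in this description.
  static final String ORDERING_VAL_KEY = "orderingVal";

  final String recordKey;
  final Map<String, String> metadata;

  private DeleteRecordSketch(String recordKey, Map<String, String> metadata) {
    this.recordKey = recordKey;
    this.metadata = metadata;
  }

  /** Creates a delete record and stores its ordering value in the metadata map. */
  static DeleteRecordSketch forDelete(String recordKey, long orderingVal) {
    Map<String, String> metadata = new HashMap<>();
    metadata.put(ORDERING_VAL_KEY, Long.toString(orderingVal));
    return new DeleteRecordSketch(recordKey, metadata);
  }

  /** Extracts the ordering value from metadata; falls back to 0 if absent (assumption). */
  long orderingVal() {
    String raw = metadata.get(ORDERING_VAL_KEY);
    return raw == null ? 0L : Long.parseLong(raw);
  }

  /** Merges two delete records for the same key; the larger ordering value wins, ties go to the newer one. */
  static DeleteRecordSketch merge(DeleteRecordSketch older, DeleteRecordSketch newer) {
    return older.orderingVal() > newer.orderingVal() ? older : newer;
  }
}
```

With this shape, merging two deletes for the same key only needs the metadata map, so no payload-specific test class is required to carry the ordering value.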
Impact
Simplifies the logic for handling delete records.
Risk level (write none, low medium or high below)
Low.
Contributor's checklist
- [ ] Read through contributor's guide
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
@yihua @codope @danny0405
Will clean the failures.
CI report:
- 3d71d1c0e3220f0639b702d91539e1d070e93cca Azure: FAILURE
Thanks for raising this fix. I think it is a good chance to fix the event-time sequence comparison of delete records with payloads. I can see 2 mistakes in our code, where the processing-time sequence is used for deletes:
- OverwriteWithLatestAvroPayload#preCombine:
  public OverwriteWithLatestAvroPayload preCombine(OverwriteWithLatestAvroPayload oldValue) {
    if (oldValue.recordBytes.length == 0) {
      // use natural order for delete record
      return this;
    }
    ...
  }
- DefaultHoodieRecordPayload#combineAndGetUpdateValue:
  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema, Properties properties) throws IOException {
    if (recordBytes.length == 0) {
      return Option.empty();
    }
    ...
  }
In any case, the orderingVal should be set up correctly and we should utilize it as much as possible.
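
To make the point concrete, here is a minimal sketch of what an ordering-aware comparison for deletes could look like. It is not the Hudi implementation: `SketchPayload` is a hypothetical, simplified class, an empty byte array stands in for `recordBytes.length == 0`, and the stored record is modeled as another payload rather than an `IndexedRecord`.

```java
import java.util.Optional;

/** Hypothetical, simplified payload; not a Hudi class. Empty data models a delete. */
final class SketchPayload {
  final byte[] data;
  final long orderingVal;

  SketchPayload(byte[] data, long orderingVal) {
    this.data = data;
    this.orderingVal = orderingVal;
  }

  boolean isDelete() {
    return data.length == 0;
  }

  /**
   * Picks the payload with the larger ordering value, whether or not either
   * side is a delete. Ties go to this payload (the later arrival).
   */
  SketchPayload preCombine(SketchPayload oldValue) {
    return oldValue.orderingVal > this.orderingVal ? oldValue : this;
  }

  /**
   * Returns the data to keep, or empty if the record should be removed.
   * The ordering values are compared before the delete check, so a late
   * delete with a smaller ordering value does not remove a newer stored record.
   */
  Optional<byte[]> combineAndGetUpdateValue(SketchPayload currentValue) {
    if (currentValue.orderingVal > this.orderingVal) {
      // The stored record is newer in event time; keep it as-is.
      return currentValue.isDelete() ? Optional.empty() : Optional.of(currentValue.data);
    }
    return isDelete() ? Optional.empty() : Optional.of(data);
  }
}
```

In this sketch a delete with ordering value 5 that arrives after an update with ordering value 10 leaves the update in place, while a delete with ordering value 20 removes it. Both snippets quoted above would instead decide purely on arrival order once a delete is involved.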