[HUDI-7240] Clean delete logic

Open linliu-code opened this issue 1 year ago • 3 comments

Change Logs

  1. When we create a HoodieRecord for a delete, we store the necessary information (including the orderingVal used in item 2) in the metadata field.
  2. When we need to merge delete records, we extract the orderingVal from the metadata field of the HoodieRecord.
  3. Removed HoodieRecordTestPayload.
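
A minimal sketch of items 1 and 2, not the code from this PR. It assumes the metadata field can be modeled as a string map; the class `DeleteMetadataSketch`, the nested `SketchRecord`, the key `ORDERING_VAL_KEY`, and the tie-breaking rule (the newer record wins on equal ordering values) are all hypothetical names and choices made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class DeleteMetadataSketch {
    // Hypothetical metadata key; the key actually used in the PR is not shown in this thread.
    static final String ORDERING_VAL_KEY = "ordering.val";

    /** Simplified stand-in for a record: a key, an optional payload, and a metadata map. */
    static final class SketchRecord {
        final String key;
        final Optional<String> data;            // empty payload models a delete
        final Map<String, String> metadata;

        SketchRecord(String key, Optional<String> data, Map<String, String> metadata) {
            this.key = key;
            this.data = data;
            this.metadata = metadata;
        }

        boolean isDelete() {
            return !data.isPresent();
        }
    }

    /** Item 1: when creating a delete, keep the ordering value in the metadata. */
    static SketchRecord createDelete(String key, long orderingVal) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(ORDERING_VAL_KEY, Long.toString(orderingVal));
        return new SketchRecord(key, Optional.empty(), metadata);
    }

    /** Item 2: read the ordering value back; fall back to a default if it is absent. */
    static long orderingValOf(SketchRecord record, long defaultVal) {
        String raw = record.metadata.get(ORDERING_VAL_KEY);
        return raw == null ? defaultVal : Long.parseLong(raw);
    }

    /** Merge two records for the same key; the higher ordering value wins, newer wins ties. */
    static SketchRecord merge(SketchRecord older, SketchRecord newer) {
        return orderingValOf(newer, 0L) >= orderingValOf(older, 0L) ? newer : older;
    }

    public static void main(String[] args) {
        SketchRecord first = createDelete("k1", 10L);
        SketchRecord second = createDelete("k1", 5L);
        SketchRecord winner = merge(first, second);
        System.out.println("winner ordering value: " + orderingValOf(winner, 0L));
    }
}
```

Because the ordering value travels with the delete itself, the merge step does not need a payload to decide which record is newer.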

Impact

Simplifies the logic for handling delete records.

Risk level (write none, low medium or high below)

Low.

Contributor's checklist

  • [ ] Read through contributor's guide
  • [ ] Change Logs and Impact were stated clearly
  • [ ] Adequate tests were added if applicable
  • [ ] CI passed

linliu-code avatar Dec 22 '23 01:12 linliu-code

@yihua @codope @danny0405

linliu-code avatar Dec 22 '23 01:12 linliu-code

Will clean the failures.

linliu-code avatar Dec 22 '23 02:12 linliu-code

CI report:

  • 3d71d1c0e3220f0639b702d91539e1d070e93cca Azure: FAILURE
Bot commands
@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

hudi-bot avatar Jan 09 '24 03:01 hudi-bot

Thanks for raising this fix. I think this is a good chance to fix the event-time sequence comparison of delete records with payloads. I can see 2 mistakes in our code that use the processing-time sequence for deletes:

  1. OverwriteWithLatestAvroPayload#preCombine:
  public OverwriteWithLatestAvroPayload preCombine(OverwriteWithLatestAvroPayload oldValue) {
    if (oldValue.recordBytes.length == 0) {
      // use natural order for delete record
      return this;
    }
    ...
  }
  2. DefaultHoodieRecordPayload#combineAndGetUpdateValue:
  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema, Properties properties) throws IOException {
    if (recordBytes.length == 0) {
      return Option.empty();
    }

    ...
  }

In any case, the orderingVal should be set up correctly and we should utilize it as much as possible.
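
To make the difference concrete, here is a small sketch, not Hudi code, contrasting the two behaviors. `DeleteOrderingSketch`, `Rec`, `preCombineByArrival`, and `preCombineByOrderingVal` are hypothetical names; the comparison that follows the early return in `preCombineByArrival` is an assumption about the elided part of the snippet above, and an empty byte array stands in for a delete.

```java
final class DeleteOrderingSketch {
    /** Simplified stand-in for a payload: an ordering value plus serialized bytes. */
    static final class Rec {
        final long orderingVal;   // event-time ordering value
        final byte[] bytes;       // an empty array models a delete

        Rec(long orderingVal, byte[] bytes) {
            this.orderingVal = orderingVal;
            this.bytes = bytes;
        }

        boolean isDelete() {
            return bytes.length == 0;
        }
    }

    /**
     * Processing-time behavior: if the old value is a delete, the incoming
     * record always wins, regardless of ordering values.
     */
    static Rec preCombineByArrival(Rec incoming, Rec old) {
        if (old.isDelete()) {
            return incoming;
        }
        // Assumed fallback comparison; the original snippet elides this part.
        return incoming.orderingVal >= old.orderingVal ? incoming : old;
    }

    /** Event-time behavior: deletes are compared by ordering value like any other record. */
    static Rec preCombineByOrderingVal(Rec incoming, Rec old) {
        return incoming.orderingVal >= old.orderingVal ? incoming : old;
    }

    public static void main(String[] args) {
        Rec delete = new Rec(10L, new byte[0]);
        Rec lateUpdate = new Rec(5L, new byte[] {1});
        System.out.println("by arrival keeps delete: "
            + preCombineByArrival(lateUpdate, delete).isDelete());
        System.out.println("by ordering value keeps delete: "
            + preCombineByOrderingVal(lateUpdate, delete).isDelete());
    }
}
```

In this sketch, an update with ordering value 5 that arrives after a delete with ordering value 10 replaces the delete under the arrival-order rule, while the ordering-value rule keeps the delete.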

danny0405 avatar Jan 22 '24 05:01 danny0405