[HUDI-7610] Resolve issues for delete records

Open linliu-code opened this issue 1 year ago • 1 comments

Change Logs

The main problem with the delete logic is that some DeleteRecords do not have a valid orderingVal field. This is not a problem for processing-time-based merging, but it breaks event-time-based merging. The fundamental solution is to preserve the ordering value for every DeleteRecord, which may not be possible or easy in practice; we don't attempt that in this PR. Here we focus on making the delete logic reasonable and consistent across fg and non-fg readers and across Spark/Avro record types. The problem mainly affects MOR tables.

For a given record key RK, suppose we have a series of operations on it, such as insert, update, delete, update, delete, update, etc. That is, we have a series of records br1, lfr1, lfr2, lfr3, lfr4, etc. (a base file record followed by log file records). (1) If all records have the orderingVal field, we can merge based on event time, which is the happy path. (2) If lfr3 is a delete record without an ordering value, we don't have enough information to merge it with the other records based on event time. A reasonable assumption here is that all records committed before this delete record, i.e., those whose commit time is smaller than the delete's, are treated as processing-time based, so the delete overrides them. Records that are newer than the delete record keep merging based on event time. In this way we combine processing-time and event-time merging consistently, and the rule applies uniformly to Spark/Avro records, COW/MOR tables, and readers with or without FG.
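The rule above can be sketched as a fold over the records of one key in commit order. This is only an illustration of the intended semantics, not the code in this PR: `DeleteMergeSketch`, `Rec`, and `merge` are hypothetical names, and the tie-breaking choice (the later commit wins on equal ordering values) as well as the choice that a record following an orderless delete replaces it are assumptions.

```java
import java.util.List;

final class DeleteMergeSketch {
  // Hypothetical record model; not Hudi's HoodieRecord or DeleteRecord.
  static final class Rec {
    final String name;
    final boolean isDelete;
    final Long orderingVal; // null means "no valid ordering value"

    Rec(String name, boolean isDelete, Long orderingVal) {
      this.name = name;
      this.isDelete = isDelete;
      this.orderingVal = orderingVal;
    }
  }

  // Merges all records of one key; the list is in commit order, oldest first.
  static Rec merge(List<Rec> recordsInCommitOrder) {
    Rec current = null;
    for (Rec next : recordsInCommitOrder) {
      if (current == null) {
        current = next;
      } else if (next.orderingVal == null || current.orderingVal == null) {
        // No event time on one side: fall back to processing time,
        // so the later commit wins. This covers the orderless delete,
        // which overrides everything committed before it.
        current = next;
      } else if (next.orderingVal.compareTo(current.orderingVal) >= 0) {
        // Both sides carry an ordering value: event-time merge.
        // Assumption: on a tie, the later commit wins.
        current = next;
      }
    }
    return current;
  }
}
```

With br1 (ordering value 1), lfr1 (5), lfr2 (10), an orderless delete lfr3, then lfr4 (7) and lfr5 (4), the sketch returns lfr4: the delete cuts off br1 through lfr2 even though lfr2 has the largest ordering value, and lfr4 and lfr5 are then compared by event time.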

To implement this, we create a metadata entry "PROCESSING_TIME_BASED_DELETE_FOUND" to indicate that a processing-time-based delete has been found; any further merging should be skipped. (1) For the non-fg reader, we store the flag in the HoodieRecord.metadata field. The flag is kept through further merging and is used to skip merging with the base file record. (2) For the fg reader, we store the flag in the metadata field of the record buffer. All further merging should be skipped.
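A minimal sketch of how such a flag could work in the non-fg case, under one reading of the description: the flag is propagated through log record merges and then checked before merging with the base file record. Only the flag name comes from this PR; `DeleteFlagSketch`, `Rec`, `mergeLogRecords`, and `mergeWithBase` are hypothetical and do not mirror Hudi's actual classes or merger API.

```java
import java.util.HashMap;
import java.util.Map;

final class DeleteFlagSketch {
  static final String FLAG = "PROCESSING_TIME_BASED_DELETE_FOUND";

  // Hypothetical record model with a metadata map, standing in for HoodieRecord.metadata.
  static final class Rec {
    final String name;
    final boolean isDelete;
    final Long orderingVal; // null means "no valid ordering value"
    final Map<String, String> metadata = new HashMap<>();

    Rec(String name, boolean isDelete, Long orderingVal) {
      this.name = name;
      this.isDelete = isDelete;
      this.orderingVal = orderingVal;
    }
  }

  static boolean isOrderlessDelete(Rec r) {
    return r.isDelete && r.orderingVal == null;
  }

  // Merges two log records of the same key; `newer` was committed after `older`.
  static Rec mergeLogRecords(Rec older, Rec newer) {
    Rec winner;
    if (older.orderingVal == null || newer.orderingVal == null) {
      winner = newer; // processing-time fallback: later commit wins
    } else {
      winner = newer.orderingVal.compareTo(older.orderingVal) >= 0 ? newer : older;
    }
    // Keep the flag on the merged result once an orderless delete has been seen.
    if (isOrderlessDelete(older) || isOrderlessDelete(newer) || older.metadata.containsKey(FLAG)) {
      winner.metadata.put(FLAG, "true");
    }
    return winner;
  }

  // Merges the base file record with the already merged log record.
  static Rec mergeWithBase(Rec base, Rec mergedLog) {
    if (mergedLog.metadata.containsKey(FLAG) || isOrderlessDelete(mergedLog)) {
      return mergedLog; // skip merging: the base record was deleted by processing time
    }
    if (base.orderingVal == null || mergedLog.orderingVal == null) {
      return mergedLog;
    }
    return mergedLog.orderingVal.compareTo(base.orderingVal) >= 0 ? mergedLog : base;
  }
}
```

The flag matters when the base record has a larger ordering value than a log record written after the delete. With a base record at ordering value 100, an orderless delete, and a later update at ordering value 5, a plain event-time merge with the base would bring back the deleted base record; with the flag set, the merge with the base is skipped and the update is returned.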

Impact

Make the delete logic consistent across different record types, and fg and non-fg readers.

Risk level (write none, low medium or high below)

Medium.

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the instruction to make changes to the website.

Contributor's checklist

  • [ ] Read through contributor's guide
  • [ ] Change Logs and Impact were stated clearly
  • [ ] Adequate tests were added if applicable
  • [ ] CI passed

linliu-code, Oct 17 '24 19:10

CI report:

  • dde4aae60db7ba0693eeb983fdf0032d2165863b UNKNOWN
  • 24e7cebd6392f97fcd02e4e3bf561949ced383f5 UNKNOWN
  • 795905ffa1bc5ed7c04a2e27d75da95ff86c36f8 UNKNOWN
  • 742d7f38c4d60f8fc562dd03c394f2d4ef5fb956 Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

  • @hudi-bot run azure: re-run the last Azure build

hudi-bot, Oct 29 '24 02:10

Given the feedback, I think we should go over the design again. Since it is not a blocker for the 1.0.0 release, I will deprioritize it for now.

linliu-code, Nov 13 '24 23:11

This is already addressed by #12390 for bug fixes around deletes and #12452 for improving ordering value handling. Closing this PR.

yihua, Dec 11 '24 07:12