[HUDI-7236] Allow MIT to change partition path when using global index

Open jonvex opened this issue 1 year ago • 3 comments

Change Logs

ExpressionPayload is necessary for MIT, but it interferes when the partition path of a record changes. This PR fixes the feature by reading existing records with the default payload and waiting to add the meta fields until after merging is complete.

Also includes https://github.com/apache/hudi/pull/10318

Explanation of why this change is necessary:

HoodieMergeHelper reads the base file using:

HoodieRecord.HoodieRecordType recordType = table.getConfig().getRecordMerger().getRecordType();
HoodieFileReader baseFileReader = HoodieFileReaderFactory
    .getReaderFactory(recordType)
    .getFileReader(hadoopConf, mergeHandle.getOldFilePath());

The payload is determined by the record merger, not the payload config. This is why we need to override ExpressionPayload when calling getExistingRecords.
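
The override described above can be pictured as copying the write properties and swapping only the payload setting before existing records are read. This is a minimal sketch, not Hudi code: the class PayloadOverrideSketch, the key "example.payload.class", and the payload names used as values are placeholders invented for illustration, and the real change works on Hudi's own write config rather than on java.util.Properties.

```java
import java.util.Properties;

public class PayloadOverrideSketch {
    // Hypothetical config key; not a real Hudi option name.
    static final String PAYLOAD_KEY = "example.payload.class";

    /**
     * Returns a copy of the given properties with the payload class replaced.
     * The original properties are left untouched so the writer keeps its own payload.
     */
    static Properties withPayload(Properties original, String payloadClass) {
        Properties copy = new Properties();
        copy.putAll(original);
        copy.setProperty(PAYLOAD_KEY, payloadClass);
        return copy;
    }

    public static void main(String[] args) {
        Properties writeProps = new Properties();
        // Placeholder values standing in for the MIT payload and the default payload.
        writeProps.setProperty(PAYLOAD_KEY, "ExpressionPayload");
        writeProps.setProperty("example.table.name", "trips");

        // Read existing records with the default payload instead of the MIT payload.
        Properties readProps = withPayload(writeProps, "DefaultPayload");

        System.out.println("write payload: " + writeProps.getProperty(PAYLOAD_KEY));
        System.out.println("read payload:  " + readProps.getProperty(PAYLOAD_KEY));
    }
}
```

Copying instead of mutating keeps the override local to the read of existing records, so the rest of the write path still sees the payload it was configured with.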

In HoodieAppendHandle we check for SENTINEL and ignore the record since this means that the merge does not result in the record being modified. This is why we add similar logic in HoodieIndexUtils.
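
The sentinel check can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in HoodieIndexUtils or HoodieAppendHandle: Rec, SENTINEL, and mergeAndFilter are hypothetical stand-ins, and the merger passed in is a toy rule that reports "no change" when the incoming value equals the existing one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

public class SentinelSkipSketch {
    /** Hypothetical stand-in for a record; not Hudi's HoodieRecord. */
    static final class Rec {
        final String key;
        final String partition;
        final String value;

        Rec(String key, String partition, String value) {
            this.key = key;
            this.partition = partition;
            this.value = value;
        }
    }

    /** Hypothetical marker meaning "the merge did not modify the record"; compared by identity. */
    static final Rec SENTINEL = new Rec("__sentinel__", "", "");

    /**
     * Merges each incoming record with its existing version (if any) and
     * drops every result that is the SENTINEL marker.
     */
    static List<Rec> mergeAndFilter(List<Rec> incoming,
                                    Map<String, Rec> existingByKey,
                                    BinaryOperator<Rec> merger) {
        List<Rec> out = new ArrayList<>();
        for (Rec in : incoming) {
            Rec existing = existingByKey.get(in.key);
            Rec merged = (existing == null) ? in : merger.apply(existing, in);
            if (merged == SENTINEL) {
                continue; // unchanged by the merge, so nothing to write
            }
            out.add(merged);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Rec> existing = new HashMap<>();
        existing.put("k1", new Rec("k1", "p1", "a"));

        List<Rec> incoming = new ArrayList<>();
        incoming.add(new Rec("k1", "p2", "a")); // same value: toy merger returns SENTINEL
        incoming.add(new Rec("k2", "p1", "b")); // no existing record: kept as-is

        // Toy merge rule (an assumption for this sketch only).
        BinaryOperator<Rec> merger =
            (oldRec, newRec) -> oldRec.value.equals(newRec.value) ? SENTINEL : newRec;

        for (Rec r : mergeAndFilter(incoming, existing, merger)) {
            System.out.println(r.key + " -> " + r.partition + " / " + r.value);
        }
    }
}
```

Comparing against the marker by identity rather than by field values avoids dropping a real record that happens to carry the same contents.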

Impact

Changing the partition path of a record under a global index now works with MIT.

Risk level (write none, low, medium or high below)

low

Documentation Update

N/A

Contributor's checklist

  • [ ] Read through contributor's guide
  • [ ] Change Logs and Impact were stated clearly
  • [ ] Adequate tests were added if applicable
  • [ ] CI passed

jonvex Dec 15 '23 21:12

@hudi-bot run azure

jonvex Dec 16 '23 03:12

@nsivabalan The change I made is to use the config from keyGeneratorWriteConfigOpt when calling mergeIncomingWithExistingRecordWithExpressionPayload. We need to do that because otherwise wrapIntoHoodieRecordPayloadWithParams will wrap the record into an expression payload.

jonvex Dec 19 '23 17:12

CI report:

  • 09fcde8279de810695019a249ce4913c77668135 UNKNOWN
  • 82eaedd172c3336b9c18fbe2a25b7b8464334507 Azure: FAILURE

Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

hudi-bot Jan 27 '24 02:01