[HUDI-7236] Allow MIT to change partition path when using global index
Change Logs
ExpressionPayload is required for MERGE INTO (MIT), but it interferes when a record's partition path changes. This PR fixes the feature by reading existing records with the default payload and deferring the addition of meta fields until after merging is complete.
Also includes https://github.com/apache/hudi/pull/10318
Explanation of why this change is necessary:
HoodieRecord.HoodieRecordType recordType = table.getConfig().getRecordMerger().getRecordType();
HoodieFileReader baseFileReader = HoodieFileReaderFactory
    .getReaderFactory(recordType)
    .getFileReader(hadoopConf, mergeHandle.getOldFilePath());
The payload is determined by the record merger, not by the payload config. This is why ExpressionPayload has to be overridden when calling getExistingRecords.
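A minimal sketch of the override idea, not the code in this PR: copy the writer's properties and point the payload class at a default payload before reading existing records. The class PayloadOverrideSketch and its method are hypothetical, and the config key and payload class name are assumptions that should be checked against the Hudi version in use.

```java
import java.util.Properties;

// Hypothetical helper for illustration; not part of Hudi.
final class PayloadOverrideSketch {
    // Assumed config key and default payload class name.
    static final String PAYLOAD_CLASS_KEY = "hoodie.datasource.write.payload.class";
    static final String DEFAULT_PAYLOAD =
        "org.apache.hudi.common.model.DefaultHoodieRecordPayload";

    // Returns a copy of the write properties with the payload class overridden,
    // leaving the caller's properties untouched.
    static Properties forReadingExistingRecords(Properties writeProps) {
        Properties copy = new Properties();
        copy.putAll(writeProps);
        copy.setProperty(PAYLOAD_CLASS_KEY, DEFAULT_PAYLOAD);
        return copy;
    }
}
```

Working on a copy keeps ExpressionPayload in effect for the MIT write path itself, so only the read of existing records sees the default payload.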
HoodieAppendHandle checks for SENTINEL and ignores the record, because SENTINEL means the merge did not modify the record. This PR adds similar logic to HoodieIndexUtils.
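The sentinel check can be illustrated with a small, self-contained sketch. Everything here (SentinelMergeSketch, Rec, the toy merge rule) is a hypothetical stand-in rather than Hudi code; it only shows the pattern of returning a shared marker instance when a merge changes nothing and skipping that marker on the write path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are not Hudi classes.
final class SentinelMergeSketch {
    static final class Rec {
        final String key;
        final String partition;
        final String value;

        Rec(String key, String partition, String value) {
            this.key = key;
            this.partition = partition;
            this.value = value;
        }
    }

    // Shared marker instance meaning "the merge left the existing record unchanged".
    static final Rec SENTINEL = new Rec("__sentinel__", "", "");

    // Toy merge rule (an assumption): nothing to write if neither the value
    // nor the partition changed; otherwise the incoming record wins.
    static Rec merge(Rec existing, Rec incoming) {
        boolean unchanged = existing.value.equals(incoming.value)
            && existing.partition.equals(incoming.partition);
        return unchanged ? SENTINEL : incoming;
    }

    // Merges each incoming record with its existing version, if any, and
    // drops results that are the sentinel (compared by identity).
    static List<Rec> recordsToWrite(Map<String, Rec> existingByKey, List<Rec> incoming) {
        List<Rec> out = new ArrayList<>();
        for (Rec in : incoming) {
            Rec existing = existingByKey.get(in.key);
            Rec merged = (existing == null) ? in : merge(existing, in);
            if (merged == SENTINEL) {
                continue; // merge was a no-op, so there is nothing to write
            }
            out.add(merged);
        }
        return out;
    }
}
```

In this sketch a record whose partition changes is still written, while a record that merges to the sentinel is dropped before any further processing.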
Impact
Changing a record's partition path under a global index now works with MIT.
Risk level (write none, low, medium or high below)
low
Documentation Update
N/A
Contributor's checklist
- [ ] Read through contributor's guide
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
@hudi-bot run azure
@nsivabalan The change I made is to use the config from keyGeneratorWriteConfigOpt when calling mergeIncomingWithExistingRecordWithExpressionPayload. This is needed because otherwise wrapIntoHoodieRecordPayloadWithParams will wrap the record in an ExpressionPayload.
CI report:
- 09fcde8279de810695019a249ce4913c77668135 UNKNOWN
- 82eaedd172c3336b9c18fbe2a25b7b8464334507 Azure: FAILURE