[HUDI-9376] Archived timeline backwards compatibility

Open • alexr17 opened this issue 6 months ago • 1 comment

Change Logs

There are some backwards-compatibility issues with reading archived timeline instants written before 0.8.

The first issue is that Avro was upgraded in 0.12 to a version that does not allow the default of a union to be null. This breaks reading these old instants in HoodieDataBlock.

The second issue is that archived instants can have the same timestamp for different actions, especially because older versions used second-granularity instant times rather than milliseconds. The archived timeline does not account for this and assumes each instant has a unique timestamp.
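
To illustrate the second issue, here is a minimal sketch of how two actions that happen within the same second end up with identical instant timestamps under a seconds-only pattern. The class and method names are hypothetical, and the two patterns (`yyyyMMddHHmmss` and `yyyyMMddHHmmssSSS`) are my assumption about the old and new instant formats, not something taken from this patch.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, for illustration only; not part of the Hudi code base.
class InstantTimestampCollision {
    // Assumed second-granularity pattern used by older releases.
    static final DateTimeFormatter SECONDS = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
    // Assumed millisecond-granularity pattern used by newer releases.
    static final DateTimeFormatter MILLIS = DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    static String toSeconds(LocalDateTime t) {
        return t.format(SECONDS);
    }

    static String toMillis(LocalDateTime t) {
        return t.format(MILLIS);
    }

    public static void main(String[] args) {
        // Two actions (say, a commit and a clean) 750 ms apart within the same second.
        LocalDateTime commit = LocalDateTime.of(2020, 1, 15, 10, 30, 45, 120_000_000);
        LocalDateTime clean = LocalDateTime.of(2020, 1, 15, 10, 30, 45, 870_000_000);

        // Seconds-only timestamps collide; millisecond timestamps do not.
        System.out.println(toSeconds(commit).equals(toSeconds(clean))); // true
        System.out.println(toMillis(commit).equals(toMillis(clean)));   // false
    }
}
```

Any lookup keyed only by the seconds-based timestamp would therefore see the commit and the clean as the same instant.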

Impact

The archived timeline now stores a nested map: string (instant) -> string (action type) -> map<instant, byte[]>.

We also retry failed calls to getSchemaFromHeader with .setValidateDefaults(false) when an AvroTypeException is thrown.
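
As a rough sketch of the idea (not the actual implementation in this patch), keying instant details by timestamp and then by action type keeps two actions with the same timestamp from overwriting each other. The class, field, and method names below are hypothetical, and the structure is simplified to two levels.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the archived timeline's lookup structure.
class ArchivedInstantIndex {
    // timestamp -> action type -> serialized instant details
    private final Map<String, Map<String, byte[]>> detailsByTimestampAndAction = new HashMap<>();

    void put(String timestamp, String action, byte[] details) {
        detailsByTimestampAndAction
            .computeIfAbsent(timestamp, k -> new HashMap<>())
            .put(action, details);
    }

    // Returns null when no details are stored for this (timestamp, action) pair.
    byte[] get(String timestamp, String action) {
        Map<String, byte[]> byAction = detailsByTimestampAndAction.get(timestamp);
        return byAction == null ? null : byAction.get(action);
    }

    // Number of distinct actions recorded at the given timestamp.
    int actionCount(String timestamp) {
        Map<String, byte[]> byAction = detailsByTimestampAndAction.get(timestamp);
        return byAction == null ? 0 : byAction.size();
    }

    public static void main(String[] args) {
        ArchivedInstantIndex index = new ArchivedInstantIndex();
        // Same second-granularity timestamp, two different actions.
        index.put("20200115103045", "commit", "commit-meta".getBytes(StandardCharsets.UTF_8));
        index.put("20200115103045", "clean", "clean-meta".getBytes(StandardCharsets.UTF_8));
        // Both entries survive because the action type is part of the key.
        System.out.println(index.actionCount("20200115103045")); // 2
    }
}
```

With a flat map keyed only by timestamp, the second `put` would have replaced the first.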

Risk level (write none, low, medium or high below)

Low

Documentation Update

None

Contributor's checklist

  • [ ] Read through contributor's guide
  • [ ] Change Logs and Impact were stated clearly
  • [ ] Adequate tests were added if applicable
  • [ ] CI passed

alexr17, May 05 '25 20:05

CI report:

  • b54b0ae3b13b75b2d9d281d79785df0d534baabc Azure: FAILURE
  • 8e3efcd4b1ce3b30c7c7510ad02df8e9f5c816b0 UNKNOWN

Bot commands

@hudi-bot supports the following commands:

  • @hudi-bot run azure: re-run the last Azure build

hudi-bot, Aug 25 '25 23:08