hudi icon indicating copy to clipboard operation
hudi copied to clipboard

[HUDI-8076] Timeline - Support 0.x and 1.x implementation for timeline and related classes

Open bvaradar opened this issue 1 year ago • 1 comments

Change Logs

Major changes: As part of HUDI-8076 proposal,

  1. Create Factory Interface for instantiating HoodieInstant and Hoodie Timelines.
  2. Use TimelineLayout to determine what version to use for Timeline and instant specific logic.
  3. HoodieInstant has following new interfaces: (a) InstantFactory : Construct HoodieInstant (b)InstantFileNameFactory : Construct filenames from HoodieInstant (c) InstantFileNameParser : Encapsulates logic to extract timestamp,.. (d) InstantComparator : Encapsulates logic to compare instants (e:g completion time based) - Each of these interfaces has implementation for 0.15 and 1.x release versions.
  4. Make Timeline classes such as HoodieActiveTImeline, HoodieArchiveTImeline and DefaultTImeline as interfaces.
  5. Port 0.x timeline and HoodieInstant logic as one implementations of the above factory and timeline interfaces.
  6. Refactor existing timeline classes as 1.x implementation for the new timeline interfaces.

Still In progress:

  1. Getting Tests passing
  2. Handle 0.x commit and other metadata
  3. Create new interface for ArchiveTimelineWriter and migrate 0.x code.

Impact

Refactoring to allow supporting writing and reading in different version compatibility.

Risk level (write none, low medium or high below)

low

Documentation Update

N/A

Contributor's checklist

  • [ ] Read through contributor's guide
  • [ ] Change Logs and Impact were stated clearly
  • [ ] Adequate tests were added if applicable
  • [ ] CI passed

bvaradar avatar Sep 10 '24 23:09 bvaradar

From highlevel, I kind of think the invocation chains are way to long for instant creation and SE/DE, can we always use the meta client as the entrance and have some top APIs for these discrepancies:

  1. metaClient.getInstantGenerator.generateNewInstant() and metaClient.getInstantGenerator.getInstantFileName(xxx);
  2. metaClient.getInstantSeDe.deserialize(xxx),
  3. metaClient.getTimelineArchiver(...).archive

So that the caller does not need to always care about all different kinds of factories, the details should be wrapped just into the meta client impls.

danny0405 avatar Nov 06 '24 06:11 danny0405

I kind of feel we need to think through the top-level APIs around HoodieInstant and HoodieTimeline here. Because this is very related to the experience of Hudi developers.

It looks like we use the meta client as the entrance to fetch a timeline, and the timeline is in duty for instant creation and SE/DE. I’m okay with this if we want to keep this convention.

Another choice is we use specific class/interface as the top-level entrance, for example, we can have HoodieInstants as the unified entrance for instant related functionalities:

HoodieInstants.builder(metaClient).getInstantSeDe().serialize(...)
HoodieInstants.builder(metaClient).getInstantCreator().create(...)
HoodieInstants.getNameParser(...).getRequestedFileName(instant) ...

And for timeline, we can also have it’s own builder, just use HoodieTimeline as the entrance:

HoodieTimeline.builder(metaClient).active().build()
HoodieTimeline.builder(metaClient).archived().build()

And still, we should hide the details of different version impls for all of these. The specific factory impls should never be exposed as top-level API

danny0405 avatar Nov 07 '24 01:11 danny0405

@vinothchandar @danny0405 : This PR is ready for your review. Can you please take a final pass.

bvaradar avatar Nov 15 '24 01:11 bvaradar

Lets please stick to requestedTime, since we already have that as the state. We will rename everything consistently based on that going forward. Not the PR to be having that conversation IMHO

vinothchandar avatar Nov 15 '24 05:11 vinothchandar

@danny0405 : Thanks for your review. Finished the comments regarding renaming. PTAL

bvaradar avatar Nov 15 '24 07:11 bvaradar

CI report:

  • 280ec55ed1454dfef9564c62657055c942ffa344 Azure: SUCCESS
Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

hudi-bot avatar Nov 15 '24 17:11 hudi-bot

Tests failing are flaky. They are locally passing. Landing this PR. Also verified by locally rebasing against current master and compiling successfully.

bvaradar avatar Nov 15 '24 17:11 bvaradar