hudi icon indicating copy to clipboard operation
hudi copied to clipboard

[HUDI-9332] Pluggable Table Format Support with native Integration

Open bvaradar opened this issue 7 months ago • 3 comments

Change Logs

[HUDI-9332] Pluggable Table Format Support with native Integration

  1. Includes base interface for pluggable table format
  2. Includes native hudi integration for pluggable table format
  3. Includes the MDT side integration for puggability.

Impact

Initia Implementation of RFC - 93

Risk level (write none, low medium or high below)

none

Documentation Update

none

Contributor's checklist

  • [ ] Read through contributor's guide
  • [ ] Change Logs and Impact were stated clearly
  • [ ] Adequate tests were added if applicable
  • [ ] CI passed

bvaradar avatar Apr 24 '25 00:04 bvaradar

I don't understand costs in terms of performance, that we should pay for this change. There are a lot of work that was already done, and there is still a lot to do down the road. From my point of view, performance is a critical parameter for users when we are talking about Big Data processing.

Is there any benchmark results for this change?

geserdugarov avatar May 30 '25 08:05 geserdugarov

Another question, what will happen, if we wrote Hudi metadata successfully, but failed during writing of Iceberg or Delta metadata? How we will process this case, when Hudi table is fine, but interoperability support is broken at some commit?

geserdugarov avatar May 30 '25 09:05 geserdugarov

@hudi-bot run azure

vinothchandar avatar Jun 30 '25 18:06 vinothchandar

@hudi-bot run azure

vinothchandar avatar Jun 30 '25 21:06 vinothchandar

@geserdugarov sorry. missed your comments there.. does the RFC help address some of those

vinothchandar avatar Jul 01 '25 02:07 vinothchandar

CI report:

  • 222d7b08a976998e5416ed6bd7c90abb1d2aee0e Azure: SUCCESS
Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

hudi-bot avatar Jul 02 '25 04:07 hudi-bot

@geserdugarov sorry. missed your comments there.. does the RFC help address some of those

@vinothchandar, thanks for your reply. Unfortunately, the RFC doesn't address mentioned questions. My main concern is performance, I really hope we won’t lose it after these changes. Since there are no performance tests in the project, we’ll have to verify it manually. Another issue is the increasing code complexity. It keeps growing, and it’s becoming harder and harder to provide a stable solution. I hope some progress will be done under HUDI-9054.

geserdugarov avatar Jul 02 '25 06:07 geserdugarov

@geserdugarov what is the performance concern that it may induces in this PR? We should definitely fix it IMO, can you guide me the lines in this patch?

In general, all the changes in this path should not bring in performance issues, some hook/empty invocations shoud be fine and the cost could be negligible.

I agree we should limit the code complexity, it's the way we are evolving the project, we welcome new features always if it is valuable, then try to make the code in good shape as much as we can, but yeah, sometimes it is hard to give a elegant solution based on the amout of work it could take.

Appreciate your points.

danny0405 avatar Jul 02 '25 07:07 danny0405

@geserdugarov what is the performance concern that it may induces in this PR? We should definitely fix it IMO, can you guide me the lines in this patch?

In general, all the changes in this path should not bring in performance issues, some hook/empty invocations shoud be fine and the cost could be negligible.

I agree we should limit the code complexity, it's the way we are evolving the project, we welcome new features always if it is valuable, then try to make the code in good shape as much as we can, but yeah, sometimes it is hard to give a elegant solution based on the amout of work it could take.

Appreciate your points.

@danny0405 , from quick checking of this patch, you're right, looks like there should be no performance issues. I only see that we started to pass HoodieTableMetaClient way more, and it's a Serializable class, which contains a lot of not transient objects. If there are any frequently called place in the code, where we will start to use Kryo SerDe of this class, (which I don't actually see) it could potentially decrease performance. But again, it's just a guessing, and it would be great to have some continuous performance monitoring to be sure in all big changes. I also will try to think about it in my spare time.

geserdugarov avatar Jul 02 '25 08:07 geserdugarov

looks like there should be no performance issues. I only see that we started to pass HoodieTableMetaClient way more

@geserdugarov I also noticed this change, in most of the changes, the context already contains a meta client, and we just need it's light-weight member like "table format", the other logic are almost the same before the patch.

danny0405 avatar Jul 02 '25 11:07 danny0405

it's just a guessing, and it would be great to have some continuous performance monitoring to be sure in all big changes.

+1. this is a great idea in general.. for this PR, we had a bunch of conversations around this. and @bvaradar explicitly assured me and ensured that we ll not incur overheads on native path.

@danny0405 I plugged the one issue I saw around extra timeline calls. CI is green. I'll land this in few hours, until I hear otherwise.

vinothchandar avatar Jul 02 '25 14:07 vinothchandar