[Umbrella] Support Apache Hudi

Open dockerzhang opened this issue 3 years ago • 4 comments

Describe the proposal

Sort module supports Apache Hudi.

Task list

  • [ ] Add Apache Hudi Extract Node for Sort
  • [ ] Add Apache Hudi Load Node for Sort
  • [ ] Manage the Apache Hudi Extract/Load Node

InLong Component

Other for not specified component

Are you willing to submit PR?

  • [ ] Yes, I am willing to submit a PR!

Code of Conduct

dockerzhang avatar Jul 18 '22 08:07 dockerzhang

@dockerzhang Please assign to me, thank you!

Jellal-HT avatar Jul 21 '22 13:07 Jellal-HT

Motivation

Apache Hudi is a popular streaming data lake platform, so the Sort module should support it.

Design

The design will follow the Sort Plugin and Manager Plugin documents:

  1. Extend a new Extract Node for Apache Hudi
  2. Extend a new Load Node for Apache Hudi
  3. Implement the corresponding Flink connectors for Apache Hudi
  4. Extend the Extract Node and Load Node in the manager module for Apache Hudi

Modification

Load Node

  1. Add a new class HudiLoadNode, which inherits from the LoadNode class
  2. Add the Hudi load node to JsonSubTypes in LoadNode and Node
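
As a rough illustration of the Load Node change, here is a minimal sketch of what such a class could look like. The `LoadNode` base class below is a simplified stand-in written for this example, not InLong's real class, and the fields and the `tableOptions()` method are assumptions. The option keys `connector`, `path`, and `table.type` are the ones used by Apache Hudi's own Flink connector. The JsonSubTypes registration is only noted in a comment, because it needs Jackson on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for InLong's LoadNode base class (hypothetical shape, not the real API).
abstract class LoadNode {
    private final String id;
    private final String name;

    protected LoadNode(String id, String name) {
        this.id = id;
        this.name = name;
    }

    String getId() { return id; }

    String getName() { return name; }

    // Options that would end up in the WITH (...) clause of the generated Flink table.
    abstract Map<String, String> tableOptions();
}

// Hypothetical HudiLoadNode. In InLong this subtype would also be added to the
// @JsonSubTypes annotations on LoadNode and Node (omitted here to stay dependency-free).
final class HudiLoadNode extends LoadNode {
    private final String path;      // base path of the Hudi table
    private final String tableType; // "COPY_ON_WRITE" or "MERGE_ON_READ"

    HudiLoadNode(String id, String name, String path, String tableType) {
        super(id, name);
        this.path = path;
        this.tableType = tableType;
    }

    @Override
    Map<String, String> tableOptions() {
        // LinkedHashMap keeps insertion order, so generated DDL is stable.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("connector", "hudi");
        options.put("path", path);
        options.put("table.type", tableType);
        return options;
    }
}
```

A HudiExtractNode could follow the same pattern on top of ExtractNode.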

Extract Node

  1. Add a new class HudiExtractNode, which inherits from the ExtractNode class
  2. Add the Hudi extract node to JsonSubTypes in ExtractNode and Node

Flink Connector

Create a new directory called Hudi in inlong-sort/sort-connectors and add the following classes to it:

  • HudiTableSink

  • HudiTableSource

  • HudiTableFactory

  • ConfigOptions

  • HudiCatalog

  • HudiCatalogFactory

(Since Apache Hudi already integrates with Flink, this part will follow the Flink connector implementation in Apache Hudi.)
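
For context on what HudiTableFactory has to provide: Flink selects a table factory by comparing each factory's identifier with the `connector` option of the table, and the factory declares which options it requires. The sketch below mimics that lookup with plain Java. `SimpleTableFactory` and `FactoryResolver` are hypothetical stand-ins rather than Flink interfaces, and the identifier `hudi` and the required `path` option are assumptions borrowed from Apache Hudi's own connector.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for Flink's table factory contract (not the real Flink interface).
interface SimpleTableFactory {
    String factoryIdentifier();

    Set<String> requiredOptions();
}

// Hypothetical HudiTableFactory; identifier and required options are assumptions.
final class HudiTableFactory implements SimpleTableFactory {
    static final String IDENTIFIER = "hudi";

    @Override
    public String factoryIdentifier() {
        return IDENTIFIER;
    }

    @Override
    public Set<String> requiredOptions() {
        return Set.of("path");
    }
}

final class FactoryResolver {
    // Picks the factory whose identifier equals the 'connector' option,
    // then checks that all of its required options are present.
    static SimpleTableFactory resolve(List<SimpleTableFactory> factories,
                                      Map<String, String> options) {
        String connector = options.get("connector");
        for (SimpleTableFactory factory : factories) {
            if (factory.factoryIdentifier().equals(connector)) {
                for (String key : factory.requiredOptions()) {
                    if (!options.containsKey(key)) {
                        throw new IllegalArgumentException("Missing required option: " + key);
                    }
                }
                return factory;
            }
        }
        throw new IllegalArgumentException("No factory for connector: " + connector);
    }
}
```

In real Flink the candidate factories are discovered through Java's service loader mechanism rather than passed in as a list, so the new connector module would also need a matching service registration file.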

Manager plugin

Follow the Manager Plugin document to extend the extract node and load node.

Jellal-HT avatar Jul 28 '22 12:07 Jellal-HT

This is a good idea and the plan looks fine. Looking forward to your PR.

yunqingmoswu avatar Jul 29 '22 02:07 yunqingmoswu

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] avatar Sep 28 '22 02:09 github-actions[bot]

Duplicate of https://github.com/apache/inlong/issues/6781, closing it.

dockerzhang avatar Dec 09 '22 03:12 dockerzhang