[Umbrella] Support Apache Hudi
Describe the proposal
Add Apache Hudi support to the Sort module.
Task list
- [ ] Add Apache Hudi Extract Node for Sort
- [ ] Add Apache Hudi Load Node for Sort
- [ ] Manage the Apache Hudi Extract/Load Node
InLong Component
Other for not specified component
Are you willing to submit PR?
- [ ] Yes, I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
@dockerzhang Please assign to me, thank you!
Motivation
Apache Hudi is a popular streaming data lake platform, so the Sort module should support it.
Design
The design will follow the Sort Plugin and Manager Plugin documents:
- Extend a new Extract Node for Apache Hudi
- Extend a new Load Node for Apache Hudi
- Implement the corresponding Flink connectors for Apache Hudi
- Extend Extract Node and Load Node in the Manager module for Apache Hudi
Modification
Load Node
- Add the new class HudiLoadNode, which inherits from the LoadNode class
- Add the Load type for Hudi to JsonSubTypes in LoadNode and Node
Extract Node
- Add the new class HudiExtractNode, which inherits from the ExtractNode class
- Add the Extract type for Hudi to JsonSubTypes in ExtractNode and Node
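The two node changes above could look roughly like the sketch below. It is only an illustration: it uses plain Java so it runs without Jackson, and a small registry map stands in for the JsonSubTypes list. The type names `hudiExtract` and `hudiLoad`, the `NodeTypeRegistry` class, and the simplified shapes of `Node`, `ExtractNode`, and `LoadNode` are assumptions, not the actual InLong classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Simplified stand-ins for the existing Sort node hierarchy (assumed shapes).
abstract class Node {
    final String id;
    Node(String id) { this.id = id; }
    abstract String typeName();
}

abstract class ExtractNode extends Node {
    ExtractNode(String id) { super(id); }
}

abstract class LoadNode extends Node {
    LoadNode(String id) { super(id); }
}

// New class proposed in this issue: reads from a Hudi table.
class HudiExtractNode extends ExtractNode {
    static final String TYPE = "hudiExtract"; // hypothetical type name
    HudiExtractNode(String id) { super(id); }
    @Override String typeName() { return TYPE; }
}

// New class proposed in this issue: writes to a Hudi table.
class HudiLoadNode extends LoadNode {
    static final String TYPE = "hudiLoad"; // hypothetical type name
    HudiLoadNode(String id) { super(id); }
    @Override String typeName() { return TYPE; }
}

// Stand-in for the JsonSubTypes list: a node type that is not registered
// here cannot be created from its type name.
class NodeTypeRegistry {
    private static final Map<String, Function<String, Node>> SUBTYPES = new LinkedHashMap<>();
    static {
        SUBTYPES.put(HudiExtractNode.TYPE, HudiExtractNode::new);
        SUBTYPES.put(HudiLoadNode.TYPE, HudiLoadNode::new);
    }

    static Node create(String typeName, String id) {
        Function<String, Node> factory = SUBTYPES.get(typeName);
        if (factory == null) {
            throw new IllegalArgumentException("Unregistered node type: " + typeName);
        }
        return factory.apply(id);
    }
}
```

In the real module the registration step would be an extra entry in the existing JsonSubTypes annotations rather than a map, but the failure mode is the same: a subclass that is not registered cannot be resolved from its type name.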
Flink Connector
Create a new directory called Hudi in inlong-sort/sort-connectors and add the following classes to it:
- HudiTableSink
- HudiTableSource
- HudiTableFactory
- ConfigOptions
- HudiCatalog
- HudiCatalogFactory
(As Apache Hudi already integrates with Flink, this part will refer to the implementation of the Flink connector in Apache Hudi.)
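As a rough illustration of the option handling a table factory of this kind performs before it builds a source or sink, here is a plain-Java sketch with no Flink dependency. The `HudiOptionsSketch` class and its `resolve` method are hypothetical. The option keys `connector`, `path`, and `table.type`, and the `COPY_ON_WRITE` default, follow the upstream Hudi Flink connector. The InLong connector may use a different identifier or option set.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java stand-in for the validation a HudiTableFactory might do.
// This is not Flink's DynamicTableFactory API; it only mirrors the idea.
class HudiOptionsSketch {
    static final String CONNECTOR = "connector";
    static final String PATH = "path";
    static final String TABLE_TYPE = "table.type";
    static final String DEFAULT_TABLE_TYPE = "COPY_ON_WRITE";

    // Checks the required options and fills in defaults, returning a new map.
    static Map<String, String> resolve(Map<String, String> tableOptions) {
        if (!"hudi".equals(tableOptions.get(CONNECTOR))) {
            throw new IllegalArgumentException("Option 'connector' must be 'hudi'");
        }
        String path = tableOptions.get(PATH);
        if (path == null || path.isEmpty()) {
            throw new IllegalArgumentException("Option 'path' is required");
        }
        Map<String, String> resolved = new HashMap<>(tableOptions);
        resolved.putIfAbsent(TABLE_TYPE, DEFAULT_TABLE_TYPE);
        return resolved;
    }
}
```

A real factory would declare these keys as typed config options and let Flink's factory helper validate them. The sketch only shows which inputs are mandatory and which are defaulted.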
Manager plugin
Follow the Manager Plugin document to extend the extract node and load node.
This is a good idea and the plan looks fine. Looking forward to your PR.
This issue is stale because it has been open for 60 days with no activity.
Duplicate of https://github.com/apache/inlong/issues/6781, closing it.