
[FEATURE] It is very important to support the centralized management of metadata

Open Narcasserun opened this issue 4 years ago • 5 comments

As a stream computing platform, it is particularly important for streamx to support metadata management, and I will provide a design for it. Metadata management can improve development efficiency and let users focus only on business logic. If it can also support data governance, it will make streamx more powerful.

1. Architecture diagram

[Image: Metadata architecture]

2. Key design points

  • SPI-based metadata acquisition
  • Metadata can be acquired in real time, without external storage
  • Pluggable metadata module that can run independently
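To make the SPI point concrete, here is a minimal sketch of what provider discovery could look like. It is not the actual design: `MetadataProvider`, `MetadataRegistry`, and `InMemoryProvider` are hypothetical names, and the only real API used is the JDK's `java.util.ServiceLoader`. Each data source type would ship its own provider, and tables are listed on demand rather than read from a store.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.ServiceLoader;

// Hypothetical SPI contract: one implementation per data source type.
interface MetadataProvider {
    // Data source type this provider handles, e.g. "mysql" or "kafka".
    String type();

    // Fetch table names on demand; nothing is persisted.
    List<String> listTables(Map<String, String> connectionProps);
}

// Hypothetical stand-in provider, used only to exercise the registry.
final class InMemoryProvider implements MetadataProvider {
    private final String type;
    private final List<String> tables;

    InMemoryProvider(String type, List<String> tables) {
        this.type = type;
        this.tables = tables;
    }

    @Override
    public String type() {
        return type;
    }

    @Override
    public List<String> listTables(Map<String, String> connectionProps) {
        return tables;
    }
}

// Hypothetical registry that indexes discovered providers by type.
final class MetadataRegistry {
    private final Map<String, MetadataProvider> providers = new LinkedHashMap<>();

    MetadataRegistry(Iterable<? extends MetadataProvider> discovered) {
        for (MetadataProvider p : discovered) {
            providers.put(p.type(), p);
        }
    }

    // Discovers providers registered under META-INF/services on the classpath.
    static MetadataRegistry fromClasspath() {
        return new MetadataRegistry(ServiceLoader.load(MetadataProvider.class));
    }

    List<String> listTables(String type, Map<String, String> connectionProps) {
        MetadataProvider p = providers.get(type);
        if (p == null) {
            throw new IllegalArgumentException("No metadata provider for type: " + type);
        }
        return p.listTables(connectionProps);
    }
}
```

With this shape, adding support for a new data source would only mean dropping a jar with a new provider and its `META-INF/services` entry on the classpath, which is one way to get the "pluggable, runs independently" property.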

3. Integration with streamx / other platforms

  • One-click generation of streamx source and sink tables, so users only need to care about business logic
  • Metadata provides restful services for other platforms to use directly
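As a rough illustration of the one-click generation idea, the sketch below turns column metadata into a `CREATE TABLE ... WITH (...)` statement. It assumes the generated tables are Flink SQL DDL, which this issue does not state explicitly; `DdlGenerator` and its parameters are hypothetical, and escaping of quotes or backticks inside names is left out.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: renders table metadata as a Flink-SQL-style DDL statement.
final class DdlGenerator {
    // columns: column name -> SQL type, in declaration order.
    // options: connector options for the WITH clause, in declaration order.
    static String createTable(String name,
                              Map<String, String> columns,
                              Map<String, String> options) {
        StringJoiner cols = new StringJoiner(",\n  ", "  ", "");
        columns.forEach((col, type) -> cols.add("`" + col + "` " + type));

        StringJoiner opts = new StringJoiner(",\n  ", "  ", "");
        options.forEach((key, value) -> opts.add("'" + key + "' = '" + value + "'"));

        return "CREATE TABLE `" + name + "` (\n" + cols + "\n) WITH (\n" + opts + "\n)";
    }
}
```

The column map would come from the metadata service, so the user would only pick a data source and table and then write the business SQL against the generated source and sink tables.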

Narcasserun avatar Jul 12 '21 02:07 Narcasserun

@Narcasserun Good idea. Looking forward to your design for data governance.

Al-assad avatar Jul 12 '21 03:07 Al-assad

Metadata is already under development. I divide metadata management into two parts: data source management and metadata. I came across a decision-making problem: does data source management need to be persisted to DB?

Narcasserun avatar Jul 27 '21 02:07 Narcasserun

Hi @Narcasserun, I don't think persistent storage of metadata is necessary. The cost of loading Kafka, MySQL, and Hive meta info on demand is very low, whereas persisting metadata would introduce additional data consistency issues that need to be addressed.

Al-assad avatar Jul 27 '21 03:07 Al-assad

Hi @Narcasserun @Al-assad, I think persistent storage of metadata is necessary, because it is important for data lineage and metadata search.

BruceWong96 avatar Aug 05 '21 09:08 BruceWong96

Any update as of Aug 2023?

zfanswer avatar Sep 01 '23 08:09 zfanswer