[Improvement] Can Gravitino replace HMS?

Open melin opened this issue 8 months ago • 3 comments

Gravitino currently creates a catalog by relying on an external metastore. In contrast, creating a catalog in unitycatalog does not depend on any other metastore. For example, if Spark + HMS is replaced with Spark + Gravitino, it should be possible to create a Parquet table directly:

create table demos ( id long, name string ) using parquet

With Spark or Flink plus Gravitino, Gravitino itself would store the table metadata, instead of relying on other metadata services such as HMS, Glue Catalog, Aliyun DLF, or JDBC.

melin avatar Apr 09 '25 11:04 melin

cc @mchades

melin avatar Apr 09 '25 11:04 melin

Perhaps we can first add a generic catalog in the relational catalog to replace HMS. This generic catalog will persist metadata to Gravitino's backend storage, such as MySQL or PostgreSQL. Then, implement GravitinoGenericCatalog in Spark connector and Flink connector to store arbitrary table structure information. For example, Flink can store Kafka tables, filesystem tables, or other tables for which Flink does not natively provide a catalog.
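
To make the idea concrete, here is a minimal sketch, not Gravitino's actual API or design. It assumes a hypothetical `GenericCatalogStore` class and a hypothetical `table_meta` layout, and uses SQLite only as a stand-in for the MySQL or PostgreSQL backend. The point is that column definitions and free-form properties are persisted as-is, so a Parquet table and a Kafka-backed table can sit in the same catalog without an external metastore.

```python
import json
import sqlite3


class GenericCatalogStore:
    """Hypothetical store that keeps arbitrary table definitions in a relational backend."""

    def __init__(self, conn):
        self.conn = conn
        # Hypothetical layout: one row per table, with columns and properties kept as JSON.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS table_meta ("
            " schema_name TEXT NOT NULL,"
            " table_name TEXT NOT NULL,"
            " columns_json TEXT NOT NULL,"
            " properties_json TEXT NOT NULL,"
            " PRIMARY KEY (schema_name, table_name))"
        )

    def create_table(self, schema, name, columns, properties):
        # columns: list of (column_name, type_name) pairs; properties: free-form dict.
        self.conn.execute(
            "INSERT INTO table_meta VALUES (?, ?, ?, ?)",
            (schema, name, json.dumps(columns), json.dumps(properties)),
        )
        self.conn.commit()

    def load_table(self, schema, name):
        row = self.conn.execute(
            "SELECT columns_json, properties_json FROM table_meta"
            " WHERE schema_name = ? AND table_name = ?",
            (schema, name),
        ).fetchone()
        if row is None:
            raise KeyError(f"{schema}.{name}")
        return {
            "columns": [tuple(c) for c in json.loads(row[0])],
            "properties": json.loads(row[1]),
        }


# SQLite in memory stands in for the real backend (an assumption for this sketch).
store = GenericCatalogStore(sqlite3.connect(":memory:"))

# The Parquet table from the issue description.
store.create_table(
    "default", "demos", [("id", "long"), ("name", "string")], {"provider": "parquet"}
)

# An illustrative Kafka-backed table, as a Flink job might register it.
store.create_table(
    "default", "events", [("payload", "string")], {"connector": "kafka", "topic": "events"}
)
```

A Spark or Flink connector would then translate between the engine's table representation and these stored rows. How that mapping should look is left open here.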

hdygxsj avatar Apr 09 '25 13:04 hdygxsj

Yes, the default catalog should store the metadata itself. If Gravitino cannot replace HMS, Glue Catalog, etc., its use cases are limited and the maintenance cost increases.

melin avatar Apr 10 '25 01:04 melin

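unitycatalog stores table metadata itself and does not depend on an external metastore. If Gravitino did the same, it would cover a much wider range of use cases. I hope this feature can be supported soon.

We are a data middle platform vendor, and our customers use a variety of products: Aliyun DLF, Glue Catalog, HMS, Google Metastore, and so on. We would like a neutral, influential product that replaces these metastores, rather than another Gravitino layer on top of them. Layering makes integrating and operating our product too complex.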

https://github.com/unitycatalog/unitycatalog/blob/b141f19665c0c88ed6b72a04149881690ac29dcd/server/src/main/java/io/unitycatalog/server/persist/TableRepository.java#L108

@yuqi1129 @mchades @FANNG1

melin avatar Jun 19 '25 11:06 melin

So, can Gravitino replace HMS now, with the metadata maintained by Gravitino itself?

carl239 avatar Jun 20 '25 11:06 carl239

Not at the moment.

melin avatar Jun 20 '25 11:06 melin

Hi, @melin just a friendly reminder that according to the Apache CoC, all public discussions in Apache projects should be conducted in English. This helps ensure everyone in the community can participate and understand. Thank you for your understanding.

mchades avatar Jun 20 '25 15:06 mchades