
[EPIC] Add Paimon catalog support for Gravitino

Open SteNicholas opened this issue 2 years ago • 14 comments

Describe the proposal

Gravitino currently supports an Apache Iceberg catalog. Apache Paimon is a streaming data lake platform that supports high-speed data ingestion, change data tracking, and efficient real-time analytics. We could build a Paimon catalog to manage Paimon metadata.

Paimon exposes a pluggable Catalog interface and supports several implementations of it, such as FileSystemCatalog and HiveCatalog. It is recommended to build a Gravitino catalog that builds on these Paimon implementations. Meanwhile, I would also propose a RESTCatalog interface to the Paimon community.
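To make the idea concrete, here is a minimal sketch of a Gravitino-side catalog that delegates to a pluggable backend. It is written in Python only for brevity and is not Paimon's or Gravitino's actual API: `LakeCatalog`, `InMemoryCatalog`, and `GravitinoStyleCatalog` are hypothetical names, and the in-memory backend merely stands in for an implementation such as FileSystemCatalog or HiveCatalog.

```python
from abc import ABC, abstractmethod


class LakeCatalog(ABC):
    """Hypothetical stand-in for a pluggable catalog interface."""

    @abstractmethod
    def create_database(self, name: str) -> None: ...

    @abstractmethod
    def create_table(self, database: str, table: str, schema: dict) -> None: ...

    @abstractmethod
    def list_tables(self, database: str) -> list: ...


class InMemoryCatalog(LakeCatalog):
    """Toy backend (assumption): stands in for a filesystem- or Hive-backed one."""

    def __init__(self) -> None:
        self._databases = {}

    def create_database(self, name: str) -> None:
        if name in self._databases:
            raise ValueError(f"database already exists: {name}")
        self._databases[name] = {}

    def create_table(self, database: str, table: str, schema: dict) -> None:
        if database not in self._databases:
            raise KeyError(f"no such database: {database}")
        tables = self._databases[database]
        if table in tables:
            raise ValueError(f"table already exists: {database}.{table}")
        tables[table] = dict(schema)

    def list_tables(self, database: str) -> list:
        if database not in self._databases:
            raise KeyError(f"no such database: {database}")
        return sorted(self._databases[database])


class GravitinoStyleCatalog:
    """Hypothetical wrapper: every operation is forwarded to the chosen backend."""

    def __init__(self, backend: LakeCatalog) -> None:
        self._backend = backend

    def create_database(self, name: str) -> None:
        self._backend.create_database(name)

    def create_table(self, database: str, table: str, schema: dict) -> None:
        self._backend.create_table(database, table, schema)

    def list_tables(self, database: str) -> list:
        return self._backend.list_tables(database)
```

The point of the wrapper is that the backend can be swapped without changing the calling code, which is what a pluggable Catalog interface allows.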

Task list

SteNicholas avatar Dec 13 '23 03:12 SteNicholas

Sounds like a good idea. We don't have Paimon expert now. Would you like to work on it? @SteNicholas :)

JunpingDu avatar Dec 13 '23 04:12 JunpingDu

@JunpingDu, I would like to invite other Paimon contributors to support the Paimon catalog together.

SteNicholas avatar Dec 13 '23 05:12 SteNicholas

> @JunpingDu, I would like to invite other Paimon contributors to support the Paimon catalog together.

@SteNicholas We are very interested in it and are waiting for the proposal and its breakdown into milestones. We look forward to building the Paimon catalog together, thanks.

YxAc avatar Feb 09 '24 16:02 YxAc

@YxAc Can I ask you to take a little more care with your words? I'm sure no ill intent was meant, but it is often hard to read the tone in messages, and the way that was written could be taken the wrong way. Also, people are volunteers here; sometimes, things may take longer than they first intended.

justinmclean avatar Feb 10 '24 08:02 justinmclean

> @YxAc Can I ask you to take a little more care with your words? I'm sure no ill intent was meant, but it is often hard to read the tone in messages, and the way that was written could be taken the wrong way. Also, people are volunteers here; sometimes, things may take longer than they first intended.

@justinmclean Sure, thanks for the reminder. I will phrase it differently.

Actually, we know each other and have talked about the Paimon catalog offline, so my words above were just a little joke. That can indeed easily lead to misunderstanding, and I will pay attention to it.

Thank you for reminding me.

YxAc avatar Feb 10 '24 11:02 YxAc

Another reminder: as we are an open-source project, it is best if all communication is public; that way, all contributors can participate. Please try to have conversations about this feature in public.

justinmclean avatar Feb 10 '24 11:02 justinmclean

> Another reminder: as we are an open-source project, it is best if all communication is public; that way, all contributors can participate. Please try to have conversations about this feature in public.

Sure

YxAc avatar Feb 10 '24 13:02 YxAc

@SteNicholas Hi, I did some investigation on Paimon. I found that Paimon does not need HMS to store a metadata.json the way Iceberg does. The most important thing is that we need an implementation of Lock. For now, I think we can implement the lock some other way, outside Gravitino, so that we can move this work forward faster.
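As a rough illustration of keeping the lock outside the catalog, the sketch below passes a lock object into the component that commits table changes. It is Python for brevity and does not reflect Paimon's actual Lock interface: `ExternalLock` and `TableCommitter` are hypothetical names, and the in-process `threading.Lock` only stands in for whatever external lock service would really be used.

```python
import threading
from contextlib import contextmanager


class ExternalLock:
    """Hypothetical lock provider; a real one would be backed by an external service."""

    def __init__(self) -> None:
        self._locks = {}
        self._guard = threading.Lock()

    @contextmanager
    def hold(self, key: str):
        # Create one lock per key on first use, then hold it for the block.
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            yield


class TableCommitter:
    """Hypothetical committer: the lock is injected, not owned by the catalog."""

    def __init__(self, lock: ExternalLock) -> None:
        self._lock = lock
        self._snapshots = {}  # table name -> latest snapshot id

    def commit(self, table: str, expected_snapshot: int) -> bool:
        # Under the lock, commit only if nobody else advanced the table first.
        with self._lock.hold(table):
            current = self._snapshots.get(table, 0)
            if current != expected_snapshot:
                return False
            self._snapshots[table] = current + 1
            return True
```

With this shape, a writer that read snapshot 0 can commit once; a second writer still holding snapshot 0 gets `False` back and has to re-read before retrying.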

We can use Gravitino to manage Paimon and store the metadata of databases and tables. We may not need a REST catalog like Iceberg's; we can just use Gravitino, which makes things simpler.

What do you think?

coolderli avatar Mar 01 '24 03:03 coolderli

@coolderli, the implementation of the lock is not designed in Gravitino. A Paimon REST catalog (nice to have) would let users access the catalog over REST, and it does not conflict with this approach.

SteNicholas avatar Mar 01 '24 03:03 SteNicholas

> @coolderli, the implementation of the lock is not designed in Gravitino. A Paimon REST catalog (nice to have) would let users access the catalog over REST, and it does not conflict with this approach.

@SteNicholas Yeah, I know what you mean. But Gravitino already has its own Open API, and we can use it to do the same work. Of course, a Paimon REST catalog is meaningful, and there is indeed no conflict between the two approaches. But using the Gravitino Open API is simpler for now, and we can finish this work faster.

coolderli avatar Mar 01 '24 04:03 coolderli

@YxAc, @coolderli, I have updated the proposal for Paimon catalog support. PTAL.

SteNicholas avatar Mar 06 '24 02:03 SteNicholas

@SteNicholas Hi, is there any update on this? Thanks.

coolderli avatar Mar 26 '24 11:03 coolderli

@caican00 Can you please leave a message here so I can assign the epic issue to you?

jerryshao avatar Aug 01 '24 11:08 jerryshao

> @caican00 Can you please leave a message here so I can assign the epic issue to you?

@jerryshao Sorry for the late reply. I have completed the database and table operations based on Paimon FileSystemCatalog.

caican00 avatar Aug 08 '24 06:08 caican00