
[Feature Request] Delta Sync for metadata sync to HMS/Glue

Open agrawalreetika opened this issue 2 years ago • 12 comments

Hudi has Hudi Sync, which syncs table metadata from transaction logs to HMS/Glue. Is there something similar for Delta tables?

agrawalreetika avatar Nov 10 '22 04:11 agrawalreetika

This will be a good feature to add. However, from my past experience, HMS interactions are often flaky and buggy. Do you know how well the Hudi sync works?

tdas avatar Nov 10 '22 13:11 tdas

@tdas Thanks for your response. I have tried Hudi Sync with Glue and haven't seen any issues so far.

agrawalreetika avatar Nov 10 '22 14:11 agrawalreetika

Hi @tdas, could you please help me understand what kind of flakiness and bugs you faced earlier with HMS sync? Has any work already been done on syncing table metadata from transaction logs to HMS/Glue that I could follow?

agrawalreetika avatar Nov 15 '22 16:11 agrawalreetika

Out of curiosity, would Glue Crawler reading Delta tables work in this scenario, or would you need to go beyond that? Shameless plug of a recent session by @moomindami and myself on this topic btw https://www.youtube.com/watch?v=GrqjZoVokNQ

dennyglee avatar Nov 15 '22 19:11 dennyglee

Sorry for the delay. Thanks, @dennyglee, for sharing the video link. I checked the Glue crawler, which reads the metadata from the transaction logs and updates Glue. But it creates symlink tables, and this does not appear to be configurable when setting up the crawler. I also cannot find any specific property in the metadata that identifies the table as a Delta table.

Please correct me if I am missing something here.

agrawalreetika avatar Nov 22 '22 07:11 agrawalreetika

Oh, sorry, I had jumped too quickly ;-). Could you try AWS Glue 4.0 with Delta Lake 2.1?

dennyglee avatar Dec 03 '22 18:12 dennyglee

@dennyglee I am using Glue Crawler to update and maintain metadata for the Delta table in the Glue catalog. As per the given document, it looks like that is for Glue jobs that read/write data in Delta tables?

agrawalreetika avatar Dec 05 '22 19:12 agrawalreetika

@tdas @dennyglee What should the next steps be to get this feature? I didn't find any option to sync metadata to Glue/HMS.

agrawalreetika avatar Dec 14 '22 19:12 agrawalreetika

@agrawalreetika Thanks for your patience - some quick questions:

  • Does the Glue catalog contain the information from the Glue crawler? Or is this lacking necessary metadata?
  • Would a post-commit web hook to load Glue, HMS, etc. be a potential solution?

dennyglee avatar Dec 19 '22 21:12 dennyglee

Hi @dennyglee, Thanks for your response.

  • Yes, the Glue catalog stores the metadata that Glue Crawler reads from the Delta table path, but the crawler creates a symlink table. There is no option in Glue Crawler to avoid that.
  • I am not sure about a post-commit web hook, but I think it would help if Glue Crawler had a configurable option for the table type (metadata). That would be specific to Glue, though. I was looking for something similar to the Hudi Sync tool, which could be used to sync metadata from transaction logs to HMS/Glue. Please let me know if you have any other questions.

agrawalreetika avatar Dec 23 '22 15:12 agrawalreetika

Hi @dennyglee, just checking in: do you need any other details from my side?

agrawalreetika avatar Jan 10 '23 03:01 agrawalreetika

Hi all, I have started working on this issue.

dhruvarya-db avatar Nov 14 '23 18:11 dhruvarya-db

I am looking forward to the completion of this feature.

fuyun2024 avatar Dec 28 '23 06:12 fuyun2024

This is resolved in #2409

vkorukanti avatar Jan 30 '24 17:01 vkorukanti