[Feature Request] Delta Sync for metadata sync to HMS/Glue
Hudi has Hudi Sync, which syncs table metadata from the transaction logs to HMS/Glue. Is there something similar for Delta tables?
This will be a good feature to add. However, from my past experience, HMS interactions are often flaky and buggy. Do you know how well the Hudi sync works?
@tdas Thanks for your response. I have tried Hudi Sync with Glue and haven't seen any issues so far.
Hi @tdas, Could you please help me understand what kind of flakiness and bugs you faced earlier with HMS sync? Is there any work done already on the sync of table metadata from transaction logs to HMS/Glue, which I could follow?
Out of curiosity, would Glue Crawler reading Delta tables work in this scenario, or would you need to go beyond that? Shameless plug of a recent session by @moomindami and myself on this topic btw https://www.youtube.com/watch?v=GrqjZoVokNQ
Sorry for the delay. Thanks, @danny, for sharing the video link. I checked the Glue crawler, which reads the metadata from the transaction logs and updates it in Glue. However, it creates symlink tables, and this does not appear to be configurable when setting up the crawler. I also could not find any specific properties in the metadata that identify the table as a Delta table.
Please correct me if I am missing something here.
Oh, sorry, I had jumped too quickly ;-). Could you try AWS Glue 4.0 with Delta Lake 2.1?
@dennyglee I am using the Glue crawler to update and maintain metadata for the Delta table in the Glue catalog. As per the given document, it looks like that is for Glue jobs reading/writing data in Delta tables?
@tdas @dennyglee What should the next steps be to get this feature, as I didn't find any option to sync metadata to Glue/HMS?
@agrawalreetika Thanks for your patience - some quick questions:
- Does the Glue catalog contain the information from the Glue crawler? Or is this lacking necessary metadata?
- Would a post-commit web hook to load Glue, HMS, etc. be a potential solution?
Hi @dennyglee, Thanks for your response.
- Yes, the Glue catalog stores the metadata that the Glue crawler reads from the Delta table path, but the crawler creates a symlink table. There is no option in the Glue crawler to avoid that.
- I am not sure about a post-commit web hook, but I think it would help if the Glue crawler had a configurable option for the table type (metadata). That is specific to Glue, though. I was looking for something similar to the Hudi Sync tool, which could be used to sync metadata from the transaction logs to HMS/Glue. Please let me know if you have any other questions.
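To make the request concrete, here is a rough sketch of the first half of such a sync: reading the latest schema and partition columns from the Delta transaction log so that they could be pushed to a catalog. It is only an illustration, not how Hudi Sync or any Delta feature works. It assumes the table is on a local path, that the JSON commit files are still present (Parquet checkpoints are ignored), and the function names `latest_delta_metadata` and `catalog_columns` are hypothetical. The actual call to Glue/HMS is left out.

```python
import json
from pathlib import Path


def latest_delta_metadata(table_path):
    """Return the most recent `metaData` action in the JSON commits, or None.

    Assumption: only `_delta_log/<version>.json` files are read; Parquet
    checkpoints are not handled in this sketch.
    """
    log_dir = Path(table_path) / "_delta_log"
    latest = None
    # Commit file names are zero-padded versions, so a lexical sort is a version sort.
    for commit in sorted(log_dir.glob("*.json")):
        if not commit.stem.isdigit():
            continue  # skip anything that is not a plain commit file
        with commit.open() as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                action = json.loads(line)  # one JSON action per line
                if "metaData" in action:
                    latest = action["metaData"]
    return latest


def catalog_columns(metadata):
    """Flatten the `schemaString` of a metaData action into simple column records."""
    schema = json.loads(metadata["schemaString"])
    partition_cols = set(metadata.get("partitionColumns", []))
    columns = []
    for field in schema["fields"]:
        field_type = field["type"]
        # Primitive types are plain strings; nested types are JSON objects,
        # which this sketch keeps as serialized JSON.
        type_name = field_type if isinstance(field_type, str) else json.dumps(field_type)
        columns.append(
            {
                "name": field["name"],
                "type": type_name,
                "partition": field["name"] in partition_cols,
            }
        )
    return columns
```

A sync tool would then take the output of `catalog_columns(latest_delta_metadata(path))` and create or update the table definition in Glue/HMS, ideally with a table property that marks it as a Delta table.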
Hi @dennyglee, just checking in: do you need any other details from my side?
Hi all, I have started working on this issue.
I am looking forward to the completion of this feature
This is resolved in #2409