Apache Druid - Delta Connector
Feature request
Hello.
Presto has recently added a delta.io connector that reads Delta files directly, which is making me consider using a Presto cluster instead (it would scale decently enough with the current data load I have to work with). However, has anyone faced this issue and solved it in a “nice” way, or are there any plans to add a Delta connector to Druid similar to what Presto has done?
Overview
Here is a reference to what Presto has added: https://prestodb.io/blog/2022/03/15/native-delta-lake-connector-for-presto
Motivation
To use Apache Druid to centralise a lakehouse architecture, with delta.io as the base data layer.
Further details
And a video: https://www.youtube.com/watch?v=JrXGkqpl7xk (fast forward to 21:40). This is what would be nice to have, but within Apache Druid.
Willingness to contribute
The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?
- [ ] Yes. I can contribute this feature independently.
- [X] Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
- [ ] No. I cannot contribute this feature at this time.
Oh, this is great to hear @Thelin90 - would it help if we found some time to chat on the various design considerations? Are any other folks interested in helping out? Thanks!
Bumping this request. Support for ingesting from Delta Lake and/or querying external Delta tables in Druid would be extremely useful. Right now my workaround is to manually parse out file paths from the Delta manifest and submit those in an ingestion spec, which isn't ideal.
I'm happy to contribute to this as well.
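For anyone who needs the manifest workaround mentioned above in the meantime, here is a rough sketch of how it could be scripted. It assumes the manifest is a plain text file with one Parquet file URI per line (as in Delta Lake's symlink-format manifests), that the files live on S3, and that the Druid cluster has the S3 and Parquet extensions loaded. The function names, datasource name, bucket and timestamp column are all hypothetical; the spec is only a starting point, not a complete or verified ingestion config.

```python
import json


def parse_manifest(manifest_text):
    """Return the non-empty lines of a manifest as a list of file URIs.

    Assumption: one data file URI per line, as in a Delta
    symlink-format manifest.
    """
    return [line.strip() for line in manifest_text.splitlines() if line.strip()]


def build_ingestion_spec(uris, datasource, timestamp_column):
    """Build a minimal native batch (index_parallel) spec for the given URIs.

    Assumptions: files are Parquet on S3; granularity and dimension
    settings are placeholders to adjust for your schema.
    """
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": timestamp_column, "format": "auto"},
                # Left empty here; list your dimensions explicitly if needed.
                "dimensionsSpec": {"dimensions": []},
                "granularitySpec": {
                    "segmentGranularity": "day",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {"type": "s3", "uris": uris},
                "inputFormat": {"type": "parquet"},
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


if __name__ == "__main__":
    # Hypothetical manifest contents, for illustration only.
    example_manifest = (
        "s3://example-bucket/events/part-00000.snappy.parquet\n"
        "s3://example-bucket/events/part-00001.snappy.parquet\n"
    )
    spec = build_ingestion_spec(
        parse_manifest(example_manifest), "events", "event_time"
    )
    print(json.dumps(spec, indent=2))
```

The resulting JSON would then be submitted to Druid as a batch ingestion task. The manifest has to be regenerated and the task resubmitted whenever the Delta table changes, which is exactly why a native connector would be nicer.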