Apache Druid - Delta Connector

Open Thelin90 opened this issue 2 years ago • 1 comments

Feature request

Hello.

Presto recently added a delta.io connector that reads Delta files directly, which is making me consider using a Presto cluster instead (it would scale decently enough with the data load I currently have to work with). However, has anyone faced this issue and solved it in a “nice” way, or are there any plans to add a Delta connector similar to what Presto has done?

Overview

Here is a reference to what presto has added: https://prestodb.io/blog/2022/03/15/native-delta-lake-connector-for-presto

Motivation

To utilise Apache Druid for centralising a lakehouse architecture with delta.io as the base data layer.

Further details

And a video: https://www.youtube.com/watch?v=JrXGkqpl7xk (fast-forward to 21:40). This is what would be nice to have, but within Apache Druid.

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

  • [ ] Yes. I can contribute this feature independently.
  • [X] Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
  • [ ] No. I cannot contribute this feature at this time.

Thelin90, Jun 02 '22 13:06

Oh, this is great to hear @Thelin90 - would it help if we found some time to chat on the various design considerations? Are any other folks interested in helping out? Thanks!

dennyglee, Jun 07 '22 00:06

Bumping this request. Support for ingesting from Delta Lake and/or querying external Delta tables in Druid would be extremely useful. Right now my workaround is to manually parse out file paths from the Delta manifest and submit those in an ingestion spec, which isn't ideal.
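
For readers unfamiliar with that workaround, here is a minimal sketch of what it might look like. It assumes the "manifest" is a plain-text file with one data-file URI per line (as in Delta's symlink-format manifest), that the files are Parquet on S3, and that the Druid cluster has the S3 and Parquet extensions loaded. The function names, the datasource name, the timestamp column, and the bucket paths are all hypothetical, and the spec is a bare-bones native batch spec rather than a tuned one.

```python
import json


def manifest_to_uris(manifest_text):
    """Return the non-empty lines of a manifest, de-duplicated, in order.

    Assumption: one data-file URI per line, nothing else in the file.
    """
    seen = set()
    uris = []
    for line in manifest_text.splitlines():
        uri = line.strip()
        if uri and uri not in seen:
            seen.add(uri)
            uris.append(uri)
    return uris


def build_ingestion_spec(uris, datasource, timestamp_column):
    """Build a minimal Druid native batch (index_parallel) spec as a dict.

    Assumptions: files are Parquet, URIs use the s3:// scheme, and the
    granularity settings below are placeholders to adjust per table.
    """
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": timestamp_column, "format": "auto"},
                # Empty list: leave dimension discovery to Druid.
                "dimensionsSpec": {"dimensions": []},
                "granularitySpec": {
                    "segmentGranularity": "day",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {"type": "s3", "uris": list(uris)},
                "inputFormat": {"type": "parquet"},
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


if __name__ == "__main__":
    # Hypothetical manifest contents; in practice, read them from storage.
    manifest = (
        "s3://my-bucket/events/part-0000.parquet\n"
        "s3://my-bucket/events/part-0001.parquet\n"
    )
    spec = build_ingestion_spec(manifest_to_uris(manifest), "events", "event_time")
    print(json.dumps(spec, indent=2))
```

The printed JSON would then be submitted to Druid as an ingestion task. The weakness is that the file list is a snapshot: every new Delta commit means regenerating the manifest and resubmitting, which is why a native connector would be preferable.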

I'm happy to contribute to this as well.

jaylynstoesz, Mar 25 '23 18:03