
Delta lake support

Open nixent opened this issue 2 years ago • 13 comments

Are there any plans to add support for Delta Lake tables?

nixent avatar Oct 14 '23 20:10 nixent

Thanks for the suggestion, I'm not too familiar with Delta lake yet. Looks interesting, will add for consideration.

Links:

  • https://github.com/csimplestring/delta-go
  • https://delta.io/

flarco avatar Oct 14 '23 21:10 flarco

If you need to read from or write to Delta, only a few projects do that -> https://delta.io/integrations. Your only real options are Spark or Java.

alberttwong avatar Feb 09 '24 17:02 alberttwong

Also delta-rs for Rust and Python:

https://github.com/delta-io/delta-rs

danielgafni avatar Apr 08 '24 15:04 danielgafni

My biggest issue with Delta Lake is that its integrations typically only support Unity Catalog and, compared to Iceberg or Hudi, give no instructions for storing tables on S3.

alberttwong avatar Apr 08 '24 15:04 alberttwong

> My biggest issue with Delta Lake is that its integrations typically only support Unity Catalog and, compared to Iceberg or Hudi, give no instructions for storing tables on S3.

Hi, not sure what you mean by unity catalog, but delta lake is just an extension over parquet. As @danielgafni mentioned, there is https://github.com/delta-io/delta-rs which doesn't require you to have spark.
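To make the "extension over Parquet" point concrete: a Delta table is a directory of Parquet data files plus a `_delta_log` directory of JSON commit files, and replaying the `add` and `remove` actions in those commits gives the set of Parquet files that make up the current table version. Below is a minimal stdlib-only sketch of that replay, not how delta-rs or Sling implement it. It assumes a local table and ignores checkpoints, deletion vectors, and protocol checks; `write_commit` and the `part-*.parquet` names are hypothetical helpers for the demo only.

```python
import json
import tempfile
from pathlib import Path


def active_files(table_dir):
    """Replay the JSON commits in _delta_log and return the live data files."""
    log_dir = Path(table_dir) / "_delta_log"
    live = set()
    # Commit files are zero-padded version numbers, so name order is version order.
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)  # one action per line
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return sorted(live)


def write_commit(table_dir, version, actions):
    """Hypothetical demo helper: write one commit file (not a real Delta writer)."""
    log_dir = Path(table_dir) / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    path = log_dir / f"{version:020d}.json"
    path.write_text("\n".join(json.dumps(a) for a in actions) + "\n")


def demo():
    """Build a two-commit log in a temp directory and list the live files."""
    with tempfile.TemporaryDirectory() as table:
        write_commit(table, 0, [{"add": {"path": "part-0.parquet"}},
                                {"add": {"path": "part-1.parquet"}}])
        write_commit(table, 1, [{"remove": {"path": "part-0.parquet"}},
                                {"add": {"path": "part-2.parquet"}}])
        return active_files(table)


if __name__ == "__main__":
    print(demo())
```

A reader that works this way only needs a Parquet library and file access, which is why Spark is not required just to read a table.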

XBeg9 avatar Apr 08 '24 16:04 XBeg9

@alberttwong that's a databricks thing and totally unrelated to deltalake.

ion-elgreco avatar Apr 12 '24 19:04 ion-elgreco

@ion-elgreco the problem is that the top 30 committers to Delta Lake are Databricks employees. https://tableformats.sundeck.io/. For all intents and purposes, it's a single-vendor OSS project with few commits (accepted or otherwise) from anyone else.

alberttwong avatar Apr 13 '24 01:04 alberttwong

@XBeg9 Delta Lake on its own isn't enough to be used by a SQL query engine. Both StarRocks and Trino need Delta Lake files to be registered in a metadata catalog like HMS. Unfortunately, most Delta Lake integrations only support Unity Catalog. It doesn't help that metadata catalogs are the new project/vendor lock-in.

alberttwong avatar Apr 13 '24 01:04 alberttwong

@XBeg9 by the way, kafka-delta-ingest doesn't support creating new Delta Lake tables; see https://github.com/delta-io/kafka-delta-ingest/issues/166.

alberttwong avatar Apr 13 '24 01:04 alberttwong

> @ion-elgreco the problem is that the top 30 committers to Delta Lake are Databricks employees. https://tableformats.sundeck.io/. For all intents and purposes, it's a single-vendor OSS project with few commits (accepted or otherwise) from anyone else.

Spark-delta is. Delta-rs isn't

ion-elgreco avatar Apr 13 '24 07:04 ion-elgreco

> Spark-delta is. Delta-rs isn't

It's more of the delta lake core project itself. Maybe the delta lake integrations have more diversity, like you mentioned.

alberttwong avatar Apr 13 '24 16:04 alberttwong

> > Spark-delta is. Delta-rs isn't
>
> It's more of the delta lake core project itself. Maybe the delta lake integrations have more diversity, like you mentioned.

Thank you for sharing your thoughts. I appreciate your insights, but I'd like to clarify that our main focus here is on Slingdata's capability to read delta tables independently of Spark. I'm particularly interested in understanding this aspect without involving HMS, Unity Catalog, or Databricks integrations at the moment. Could we perhaps steer our discussion back to that specific topic?

XBeg9 avatar Apr 13 '24 17:04 XBeg9

@flarco fyi, another implementation of delta-go

nixent avatar May 11 '24 16:05 nixent

This is done. See https://blog.slingdata.io/efficient-data-lake-management-with-sling-and-delta-lake

flarco avatar Sep 07 '24 11:09 flarco