Add support for GENERATED ALWAYS AS IDENTITY in DeltaTableBuilder

Open norbitek opened this issue 3 years ago • 14 comments

The latest version of Databricks added support for identity columns in Delta tables. It is now possible to specify GENERATED ALWAYS AS IDENTITY in a column definition.

It would be nice to be able to do the same using DeltaTableBuilder, for example:

from delta.tables import DeltaTable
from pyspark.sql.types import DateType

# Proposed syntax: the IDENTITY value for generatedAlwaysAs is the feature being requested.
(
    DeltaTable.create(spark)
    .tableName("default.people10m")
    .addColumn("id", "BIGINT", generatedAlwaysAs="IDENTITY(START WITH 10 INCREMENT BY 10)")
    .addColumn("firstName", "STRING")
    .addColumn("middleName", "STRING")
    .addColumn("lastName", "STRING", comment="surname")
    .addColumn("gender", "STRING")
    .addColumn("birthDate", "TIMESTAMP")
    .addColumn("dateOfBirth", DateType(), generatedAlwaysAs="CAST(birthDate AS DATE)")
    .addColumn("ssn", "STRING")
    .addColumn("salary", "INT")
    .partitionedBy("gender")
    .execute()
)
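
Until the builder accepts something like this, one possible workaround on a runtime whose SQL parser already understands identity columns (Databricks, per the description above) is to assemble the CREATE TABLE statement as a string and pass it to spark.sql. The following is only a minimal sketch of that idea: the helper names identity_clause and build_create_table are hypothetical and not part of Delta Lake, and the column list is shortened from the example above.

```python
from typing import Optional, Sequence, Tuple

# (name, SQL type, optional generation clause); this alias is illustrative only.
Column = Tuple[str, str, Optional[str]]


def identity_clause(start: int = 1, step: int = 1) -> str:
    """Return a GENERATED ALWAYS AS IDENTITY clause with explicit start and step."""
    if step == 0:
        raise ValueError("step must be non-zero")
    return f"GENERATED ALWAYS AS IDENTITY (START WITH {start} INCREMENT BY {step})"


def build_create_table(
    table: str,
    columns: Sequence[Column],
    partitioned_by: Sequence[str] = (),
) -> str:
    """Assemble a CREATE TABLE ... USING DELTA statement as a plain string."""
    col_defs = []
    for name, sql_type, generation in columns:
        parts = [name, sql_type]
        if generation:
            parts.append(generation)
        col_defs.append(" ".join(parts))
    ddl = f"CREATE TABLE {table} (\n  " + ",\n  ".join(col_defs) + "\n) USING DELTA"
    if partitioned_by:
        ddl += "\nPARTITIONED BY (" + ", ".join(partitioned_by) + ")"
    return ddl


ddl = build_create_table(
    "default.people10m",
    [
        ("id", "BIGINT", identity_clause(start=10, step=10)),
        ("firstName", "STRING", None),
        ("gender", "STRING", None),
        ("birthDate", "TIMESTAMP", None),
        ("dateOfBirth", "DATE", "GENERATED ALWAYS AS (CAST(birthDate AS DATE))"),
    ],
    partitioned_by=["gender"],
)
print(ddl)
# Assumption: on a runtime that accepts the identity clause, run spark.sql(ddl).
```

The sketch only builds text, so it runs without a Spark session. Whether the resulting statement is accepted depends entirely on the engine that executes it.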

norbitek • Apr 15 '22 07:04

Hi @norbitek, thanks for opening this issue. This is definitely planned for Delta Lake, but we're currently prioritizing other features on the roadmap (#920), such as OPTIMIZE ZORDER and CDF.

allisonport-db • Apr 15 '22 20:04

@norbitek, it's on the roadmap for 2022 H2 🥳 https://github.com/delta-io/delta/issues/1307

keen85 • Aug 12 '22 08:08

I tried to add a generated column using SQL. So I understand it is not yet supported in PySpark?

wedesoft • Sep 30 '22 15:09

@wedesoft Spark doesn't support it yet. SQL syntax support for generated columns is tracked by #1100.

zsxwing • Sep 30 '22 15:09

Is this still on the roadmap?

jasperp97 • May 15 '23 16:05

Any news on the status of this issue?

thebaz73 • Oct 10 '23 13:10

Any update on the release date?

shahkalpan07 • Nov 05 '23 04:11

This is definitely still on the roadmap! However, at the moment all the focus is on completing Deletion Vectors, which is in high demand. We will only get to this item after that work is complete.

bart-samwel • Nov 10 '23 14:11

Since Delta Lake 3.1.0 (with deletion vectors) is out now, would you consider working on it for 3.2, @bart-samwel? 😇

keen85 • Feb 07 '24 16:02

@keen85

Since Delta Lake 3.1.0 (with deletion vectors) is out now, would you consider working on it for 3.2?

Thank you for the reminder! It is near the top of our list now. I can't make any hard guarantees, but I'm hopeful that we'll get to this pretty soon.

bart-samwel • Feb 08 '24 09:02

@bart-samwel What is the reason that features in the Standalone version are implemented with such a long delay? Does it mean that for every new feature (for example, liquid clustering) we will wait about 2 years?

norbitek • Feb 08 '24 09:02

@norbitek

What is the reason that features in the Standalone version are implemented with such a long delay?

Just to make sure there's no confusion here: Delta Standalone is different from the Spark connector for Delta Lake. Standalone is a library that can be used to implement connectors for non-Spark systems, and it is no longer getting new features, because its design is not well suited to supporting many of them. All of the new efforts are going into Delta Kernel, which is the new library for building connectors. It makes it a lot easier to keep up with new features, and we intend to keep it up to date.

Identity columns is a feature where we have unfortunately dropped the ball even for support in the Spark connector. It's the exception though, not the rule!

Does it mean that for every new feature (for example, liquid clustering) we will wait about 2 years?

Certainly not! Like I said, identity columns is an exception. Liquid clustering was actually released in Delta Lake 3.1, which came out last week! https://github.com/delta-io/delta/releases

bart-samwel • Feb 08 '24 10:02

Hi, my company currently doesn't use Spark SQL anywhere, so I wanted to use the DeltaTableBuilder API instead. Has this been resolved? If not, when can we expect this update?

Many thanks, Yogesh S

SYOGESH045 • May 26 '24 14:05

@SYOGESH045 The next release of Delta is going to be Delta 3.3. The identity column support seems to be in progress - https://github.com/delta-io/delta/pull/3044. So Delta 3.3 should have it. If I have to hazard a guess, Delta 3.3 should be released in 2-3 months.

tdas • May 30 '24 12:05