[BUG] INSERT INTO with generated columns fails (not enough data columns)
When executing sql("INSERT INTO delta_gencols VALUES 1"), Delta Lake 1.2.1 fails with the following AnalysisException:
org.apache.spark.sql.AnalysisException: Cannot write to 'default.delta_gencols', not enough data columns; target table has 2 column(s) but the inserted data has 1 column(s)
at org.apache.spark.sql.delta.DeltaErrors$.notEnoughColumnsInInsert(DeltaErrors.scala:356)
at org.apache.spark.sql.delta.DeltaAnalysis.org$apache$spark$sql$delta$DeltaAnalysis$$needsSchemaAdjustment(DeltaAnalysis.scala:322)
at org.apache.spark.sql.delta.DeltaAnalysis$$anonfun$apply$1.applyOrElse(DeltaAnalysis.scala:67)
at org.apache.spark.sql.delta.DeltaAnalysis$$anonfun$apply$1.applyOrElse(DeltaAnalysis.scala:64)
The table was created as follows:
import io.delta.tables.DeltaTable
import org.apache.spark.sql.types.DataTypes
val tableName = "delta_gencols"
sql(s"DROP TABLE IF EXISTS $tableName")
DeltaTable.create
  .addColumn("id", DataTypes.LongType, nullable = false)
  .addColumn(
    DeltaTable.columnBuilder("value")
      .dataType(DataTypes.BooleanType)
      .generatedAlwaysAs("true")
      .build)
  .tableName(tableName)
  .execute
I found this conversation with @zsxwing in the delta-users group, but it does not seem to apply (unless I'm mistaken):
The INSERT INTO issue and CREATE TABLE syntax for generated columns are not available in OSS Delta Lake right now because Apache Spark SQL parser doesn't support them.
The Spark DataFrame API works just fine:
spark.range(5).writeTo(tableName).append()
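For side-by-side reference, a minimal sketch of the two paths, assuming the delta_gencols table created above in a spark-shell session:
// The DataFrame API omits the generated column and Delta computes it;
// the SQL INSERT fails analysis with "not enough data columns".
spark.range(5).writeTo(tableName).append() // works
sql(s"INSERT INTO $tableName VALUES 1")    // fails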
Environment information
- Delta Lake version: 1.2.1
- Spark version: 3.2.1
- Scala version: 2.12.15
- Java version: OpenJDK 11.0.14 (64-Bit Server VM)
I'm getting this with a GENERATED IDENTITY column as well.
For the query INSERT INTO delta_gencols VALUES 1 (delta_gencols has the id and value columns as shown in the issue description): we don't support it, as it's not SQL standard. The following two forms are the standard syntax:
- INSERT INTO delta_gencols VALUES (1, DEFAULT). Spark is adding DEFAULT keyword support, and we will work with them to make it work with generated columns as well (https://issues.apache.org/jira/browse/SPARK-38334).
- INSERT INTO delta_gencols (id) VALUES (1). Currently Spark blocks such an INSERT query if the user doesn't provide the entire column list. We are working on this and hope to remove the restriction so that we can make it work with generated columns in Delta.
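For illustration only, a sketch of what those two standard forms would look like from the Scala shell; as noted above, neither is accepted by Spark 3.2.1 / Delta 1.2.1 yet:
// Illustration of the two standard forms described above; neither runs on
// Spark 3.2.1 / Delta 1.2.1 today.
sql("INSERT INTO delta_gencols VALUES (1, DEFAULT)") // pending DEFAULT support (SPARK-38334)
sql("INSERT INTO delta_gencols (id) VALUES (1)")     // pending removal of the full-column-list restriction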
I'm getting this problem with just a regular insertInto in PySpark. Is this the same thing, @zsxwing?
@slyons Yep. That's the same issue.
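For anyone reproducing it, a minimal Scala sketch of the insertInto path (same table as in the issue description; PySpark behaves the same way):
// insertInto resolves columns by position against the full target schema,
// so a single-column DataFrame hits the same "not enough data columns" error.
spark.range(5).write.insertInto(tableName)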
insert into target_table (select <all cols except auto-generated one> from source_table) also fails.
This is the issue I mentioned above: "Currently Spark will block such an INSERT query if the user doesn't provide the entire column list. We are working on this and hope to remove the restriction so that we can make it work with generated columns in Delta."
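Until that restriction is lifted, one possible workaround is to append through the DataFrame API, which lets Delta compute the generated column. A sketch, using the placeholder table and column names from the comment above:
// Workaround sketch: select only the non-generated columns and append via
// the DataFrame API instead of SQL INSERT. Names here are placeholders.
spark.table("source_table")
  .select("id")              // all cols except the auto-generated one
  .writeTo("target_table")
  .append()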
@zsxwing is this a Delta or a Spark issue? If it is a Spark issue, is there already a tracking issue? I did not find any: https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20text%20~%20"insert%20not%20enough%20data%20columns"
@keen85 SPARK-41290 is the Spark-side ticket. We will make sure INSERT INTO works with generated columns once the change on the Spark side is finished.