
Iceberg 1.x does not support merge with schema change/evolution

Open zzeekk opened this issue 1 year ago • 0 comments

Describe the bug With Iceberg 1.4, a MERGE statement fails if the table schema was changed beforehand or if the merge itself involves schema evolution. The error message is:

[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `new`.`tpe` cannot be resolved. Did you mean one of the following? [`new`.`tpe`, `new`.`lastname`, `new`.`rating2`, `new`.`firstname`, `existing`.`tpe`].; line 4 pos 4;
'MergeIntoTable ((('new.tpe = 'existing.tpe) AND ('new.lastname = 'existing.lastname)) AND ('new.firstname = 'existing.firstname)), [updateaction(None, assignment('existing.rating2, 'new.rating2))], [insertaction(None, assignment('tpe, 'new.tpe), assignment('lastname, 'new.lastname), assignment('firstname, 'new.firstname), assignment('rating, null), assignment('rating2, 'new.rating2))]
:- SubqueryAlias existing
:  +- SubqueryAlias iceberg1.default.test_merge
:     +- RelationV2[tpe#168, lastname#169, firstname#170, rating#171, rating2#172] iceberg1.default.test_merge iceberg1.default.test_merge
+- SubqueryAlias new
   +- Project [tpe#173, lastname#174, firstname#175, rating2#176]
      +- SubqueryAlias iceberg1.default.test_merge_sdltmp
         +- RelationV2[tpe#173, lastname#174, firstname#175, rating2#176] iceberg1.default.test_merge_sdltmp iceberg1.default.test_merge_sdltmp

The SQL statement executed is:

 MERGE INTO iceberg1.default.test_merge as existing
 USING (SELECT * from iceberg1.default.test_merge_sdltmp) as new
 ON new.tpe = existing.tpe AND new.lastname = existing.lastname AND new.firstname = existing.firstname  
 WHEN MATCHED  THEN UPDATE SET existing.rating2 = new.rating2
 WHEN NOT MATCHED  THEN INSERT (tpe, lastname, firstname, rating, rating2) VALUES (new.tpe, new.lastname, new.firstname, null, new.rating2)
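
To make the shape of this statement easier to discuss, here is a minimal sketch of how such a MERGE could be composed from the target columns, the source columns and the key columns. `MergeSqlSketch` and `buildMergeSql` are hypothetical names made up for illustration; this is not the smart-data-lake implementation. The assumptions are that target columns missing from the source are inserted as null and that only non-key source columns are updated, as in the statement above.

```scala
// Hypothetical helper for illustration only; not taken from smart-data-lake.
object MergeSqlSketch {
  def buildMergeSql(
      target: String,
      source: String,
      targetCols: Seq[String],
      sourceCols: Seq[String],
      keyCols: Seq[String]
  ): String = {
    // Join condition on all key columns.
    val onClause = keyCols.map(c => s"new.$c = existing.$c").mkString(" AND ")
    // Only non-key columns that exist in the source are updated.
    val updateClause = sourceCols
      .filterNot(c => keyCols.contains(c))
      .map(c => s"existing.$c = new.$c")
      .mkString(", ")
    // Target columns that are missing from the source are inserted as null.
    val insertValues = targetCols
      .map(c => if (sourceCols.contains(c)) s"new.$c" else "null")
      .mkString(", ")
    s"""MERGE INTO $target as existing
       |USING (SELECT * from $source) as new
       |ON $onClause
       |WHEN MATCHED THEN UPDATE SET $updateClause
       |WHEN NOT MATCHED THEN INSERT (${targetCols.mkString(", ")}) VALUES ($insertValues)""".stripMargin
  }
}
```

Calling it with the target columns `tpe, lastname, firstname, rating, rating2`, the source columns `tpe, lastname, firstname, rating2` and the keys `tpe, lastname, firstname` should give a statement equivalent to the one above, where `rating` exists only in the target and `rating2` is the column added by the schema change.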

To Reproduce See IcebergTableDataObjectTest test case "SaveMode merge with schema evolution"

Expected behavior Currently the test case intercepts the 'AnalysisException'. The merge should succeed, so that the test case passes without intercepting this exception.

Additional context Some debugging showed that the mergeCondition of org.apache.spark.sql.catalyst.plans.logical.MergeIntoTable is not resolved by Spark when processing the plan. We should create an issue in the Iceberg project with a minimal example.

zzeekk Feb 01 '24 09:02