smart-data-lake
Iceberg 1.x does not support merge with schema change/evolution
Describe the bug
With Iceberg 1.4, a merge statement that follows a schema change, or that itself includes schema evolution, fails with the following error message:
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `new`.`tpe` cannot be resolved. Did you mean one of the following? [`new`.`tpe`, `new`.`lastname`, `new`.`rating2`, `new`.`firstname`, `existing`.`tpe`].; line 4 pos 4;
'MergeIntoTable ((('new.tpe = 'existing.tpe) AND ('new.lastname = 'existing.lastname)) AND ('new.firstname = 'existing.firstname)), [updateaction(None, assignment('existing.rating2, 'new.rating2))], [insertaction(None, assignment('tpe, 'new.tpe), assignment('lastname, 'new.lastname), assignment('firstname, 'new.firstname), assignment('rating, null), assignment('rating2, 'new.rating2))]
:- SubqueryAlias existing
:  +- SubqueryAlias iceberg1.default.test_merge
:     +- RelationV2[tpe#168, lastname#169, firstname#170, rating#171, rating2#172] iceberg1.default.test_merge iceberg1.default.test_merge
+- SubqueryAlias new
   +- Project [tpe#173, lastname#174, firstname#175, rating2#176]
      +- SubqueryAlias iceberg1.default.test_merge_sdltmp
         +- RelationV2[tpe#173, lastname#174, firstname#175, rating2#176] iceberg1.default.test_merge_sdltmp iceberg1.default.test_merge_sdltmp
The SQL statement executed is:
MERGE INTO iceberg1.default.test_merge as existing
USING (SELECT * from iceberg1.default.test_merge_sdltmp) as new
ON new.tpe = existing.tpe AND new.lastname = existing.lastname AND new.firstname = existing.firstname
WHEN MATCHED THEN UPDATE SET existing.rating2 = new.rating2
WHEN NOT MATCHED THEN INSERT (tpe, lastname, firstname, rating, rating2) VALUES (new.tpe, new.lastname, new.firstname, null, new.rating2)
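For anyone preparing a minimal example, the shape of the statement above can be reproduced from the two schemas alone. The following is only a sketch in plain Python and is not taken from the smart-data-lake code base; the function name `build_merge_sql` and its parameters are hypothetical. It assumes that the first three columns are the merge key, that non-key columns present in the new data are updated, and that target columns missing from the new data (here `rating`) are inserted as null.

```python
def build_merge_sql(target, source, target_cols, source_cols, key_cols):
    """Assemble a MERGE statement (hypothetical helper, for illustration only).

    Target columns that are missing from the source are inserted as null.
    """
    # Join condition over the key columns.
    on = " AND ".join(f"new.{c} = existing.{c}" for c in key_cols)
    # Only non-key columns present in the source are updated.
    update_cols = [c for c in source_cols if c not in key_cols]
    set_clause = ", ".join(f"existing.{c} = new.{c}" for c in update_cols)
    # Insert every target column; fill the ones the source lacks with null.
    insert_cols = ", ".join(target_cols)
    insert_vals = ", ".join(
        f"new.{c}" if c in source_cols else "null" for c in target_cols
    )
    return "\n".join([
        f"MERGE INTO {target} as existing",
        f"USING (SELECT * from {source}) as new",
        f"ON {on}",
        f"WHEN MATCHED THEN UPDATE SET {set_clause}",
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})",
    ])


# Schemas taken from the logical plan in the error message.
sql = build_merge_sql(
    target="iceberg1.default.test_merge",
    source="iceberg1.default.test_merge_sdltmp",
    target_cols=["tpe", "lastname", "firstname", "rating", "rating2"],
    source_cols=["tpe", "lastname", "firstname", "rating2"],
    key_cols=["tpe", "lastname", "firstname"],
)
print(sql)
```

With the schemas from the plan, this yields the same five lines as the failing statement, which may help when reducing the test case to a standalone reproduction.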
To Reproduce
See the IcebergTableDataObjectTest test case "SaveMode merge with schema evolution".
Expected behavior
The test case currently intercepts the AnalysisException. It should pass without having to intercept this exception.
Additional context
Debugging showed that the mergeCondition of org.apache.spark.sql.catalyst.plans.logical.MergeIntoTable is not resolved by Spark when it processes the plan. We should open an issue in the Iceberg project with a minimal example.