iceberg
iceberg copied to clipboard
API:Drop column of deleted partitioned field to Unbound partitionSpec
fix #4563 issue
Test case of problem recurrence following:
- Insert data into table T without partitions
- Alter table add partition column A
- Continue inserting data into table T
- Alter table drop partition column A
- Continue inserting data into table T
- Alter table drop column A
formatVersion 1 and formatVersion 2 of table have different exceptions occurred
- version=1 table exception is
Cannot find source column for partition field: 1000: data_trunc_4: void(3)
- version=2 table exception is
Cannot find source column: 3
See the log below for details
The proposed solution is following
-
version=1 After alter drop the partition field, this field partition transform has been changed to 'void'. and after alter drop column field, this field will be deleted from the current schema, so this partition field source type don't find.
the version 1 solution is skip unpatition field check:
if (sourceType == null && field.transform().toString().equalsIgnoreCase("void")) { continue; }
-
version=2 After alter drop column field, this field will be deleted from the current schema, so in unbound partition process the deleted field don't find.
the version 2 solution is skip deleting field unbound partition:
for (UnboundPartitionField field : fields) { Types.NestedField column = schema.findField(field.sourceId); if (column != null) { if (field.partitionId != null) { builder.add(field.sourceId, field.partitionId, field.name, field.transformAsString); } else { builder.add(field.sourceId, field.name, field.transformAsString); } } }
cc: @rdblue @findepi
cc @alexjo2144
@felixYyu can you please add some information about your approach to the PR description?
updated the PR description, Please check whether it is OK? @rdblue
@felixYyu, I was asking for you to describe the problem and the proposed solution that you've implemented here. I don't see much information about the solution you propose. Could you include that please?
@rdblue The description of the solution has been updated, Could you review that again please?
cc @findepi
@alexjo2144 Could you testing with this patch, If there is still a problem, try another way.
@rdblue Could you merge this PR?
Hi @felixYyu thank you for this fix! Could you please rebase your patch and then we could take a look?
is it the same as https://github.com/apache/iceberg/pull/5707
linked issues; https://github.com/apache/iceberg/issues/5676
https://github.com/apache/iceberg/issues/5399
https://github.com/apache/iceberg/issues/4563
I think all these 3 issues are the same and 3 different persons are working on implementation (me, @Fokko, @felixYyu) 🙈
Thanks for linking these together, @ajantha-bhat ! :)
Thanks @felixYyu for the fix, and @Fokko for resolving the conflicts.
@szehon-ho @flyrain @pvary Could you please take a look at this patch? Do you see any problems with this approach?
Yea it looks reasonable. Curious if metadata tables also suffer any problem from this scenario, will need to test when I have time.
I'd be interested to take a look today too.
Tested the patch with metadata tables, seems to fix all the issues. Also nit: should we add the test to 3.3, so it lives longer (I suppose 3.3 will be used to clone further Spark versions, and if we don't add it we might lose it if nobody forward-ports it)?
@felixYyu could you please rebase and address https://github.com/apache/iceberg/pull/4602#discussion_r987088357 so that we can get this PR reviewed & merged?