iceberg API：Drop column of deleted partitioned field to Unbound partitionSpec

fix #4563 issue

Test case of problem recurrence following:

Insert data into table T without partitions
Alter table add partition column A
Continue inserting data into table T
Alter table drop partition column A
Continue inserting data into table T
Alter table drop column A

formatVersion 1 and formatVersion 2 of table have different exceptions occurred

version=1 table exception is Cannot find source column for partition field: 1000: data_trunc_4: void(3)
version=2 table exception is Cannot find source column: 3

See the log below for details

The proposed solution is following

version=1 After alter drop the partition field, this field partition transform has been changed to 'void'. and after alter drop column field, this field will be deleted from the current schema, so this partition field source type don't find.

the version 1 solution is skip unpatition field check:
```
if (sourceType == null && field.transform().toString().equalsIgnoreCase("void")) {
    continue;
}
```

version=2 After alter drop column field, this field will be deleted from the current schema, so in unbound partition process the deleted field don't find.

the version 2 solution is skip deleting field unbound partition:

for (UnboundPartitionField field : fields) {
  Types.NestedField column = schema.findField(field.sourceId);
  if (column != null) {
    if (field.partitionId != null) {
      builder.add(field.sourceId, field.partitionId, field.name, field.transformAsString);
    } else {
      builder.add(field.sourceId, field.name, field.transformAsString);
    }
  }
}

Apr 21 '22 11:04 felixYyu

cc: @rdblue @findepi

Apr 21 '22 11:04 felixYyu

cc @alexjo2144

Apr 21 '22 12:04 findepi

@felixYyu can you please add some information about your approach to the PR description?

Apr 21 '22 15:04 rdblue

updated the PR description, Please check whether it is OK? @rdblue

Apr 22 '22 01:04 felixYyu

@felixYyu, I was asking for you to describe the problem and the proposed solution that you've implemented here. I don't see much information about the solution you propose. Could you include that please?

Apr 24 '22 22:04 rdblue

@rdblue The description of the solution has been updated, Could you review that again please?

Apr 25 '22 03:04 felixYyu

cc @findepi

Apr 30 '22 23:04 felixYyu

@alexjo2144 Could you testing with this patch, If there is still a problem, try another way.

May 05 '22 07:05 felixYyu

@rdblue Could you merge this PR?

Jul 15 '22 06:07 felixYyu

Hi @felixYyu thank you for this fix! Could you please rebase your patch and then we could take a look?

Sep 22 '22 14:09 marton-bod

is it the same as https://github.com/apache/iceberg/pull/5707

linked issues; https://github.com/apache/iceberg/issues/5676

https://github.com/apache/iceberg/issues/5399

https://github.com/apache/iceberg/issues/4563

I think all these 3 issues are the same and 3 different persons are working on implementation (me, @Fokko, @felixYyu) 🙈

Sep 22 '22 15:09 ajantha-bhat

Thanks for linking these together, @ajantha-bhat ! :)

Sep 22 '22 15:09 marton-bod

Thanks @felixYyu for the fix, and @Fokko for resolving the conflicts.

@szehon-ho @flyrain @pvary Could you please take a look at this patch? Do you see any problems with this approach?

Sep 26 '22 14:09 marton-bod

Yea it looks reasonable. Curious if metadata tables also suffer any problem from this scenario, will need to test when I have time.

Sep 26 '22 23:09 szehon-ho

I'd be interested to take a look today too.

Sep 27 '22 15:09 aokolnychyi

Tested the patch with metadata tables, seems to fix all the issues. Also nit: should we add the test to 3.3, so it lives longer (I suppose 3.3 will be used to clone further Spark versions, and if we don't add it we might lose it if nobody forward-ports it)?

Sep 27 '22 22:09 szehon-ho

@felixYyu could you please rebase and address https://github.com/apache/iceberg/pull/4602#discussion_r987088357 so that we can get this PR reviewed & merged?

Jan 19 '23 11:01 nastra

iceberg iceberg copied to clipboard

API：Drop column of deleted partitioned field to Unbound partitionSpec

Test case of problem recurrence following:

formatVersion 1 and formatVersion 2 of table have different exceptions occurred

The proposed solution is following

iceberg
iceberg copied to clipboard