iceberg icon indicating copy to clipboard operation
iceberg copied to clipboard

API:Drop column of deleted partitioned field to Unbound partitionSpec

Open felixYyu opened this issue 2 years ago • 12 comments

fix #4563 issue

Test case of problem recurrence following:

  1. Insert data into table T without partitions
  2. Alter table add partition column A
  3. Continue inserting data into table T
  4. Alter table drop partition column A
  5. Continue inserting data into table T
  6. Alter table drop column A

formatVersion 1 and formatVersion 2 of table have different exceptions occurred

  • version=1 table exception is Cannot find source column for partition field: 1000: data_trunc_4: void(3)
  • version=2 table exception is Cannot find source column: 3

See the log below for details

The proposed solution is following

  • version=1 After alter drop the partition field, this field partition transform has been changed to 'void'. and after alter drop column field, this field will be deleted from the current schema, so this partition field source type don't find.

    the version 1 solution is skip unpatition field check:

    if (sourceType == null && field.transform().toString().equalsIgnoreCase("void")) {
        continue;
    }
    
  • version=2 After alter drop column field, this field will be deleted from the current schema, so in unbound partition process the deleted field don't find.

    the version 2 solution is skip deleting field unbound partition:

    for (UnboundPartitionField field : fields) {
      Types.NestedField column = schema.findField(field.sourceId);
      if (column != null) {
        if (field.partitionId != null) {
          builder.add(field.sourceId, field.partitionId, field.name, field.transformAsString);
        } else {
          builder.add(field.sourceId, field.name, field.transformAsString);
        }
      }
    }
    

felixYyu avatar Apr 21 '22 11:04 felixYyu

cc: @rdblue @findepi

felixYyu avatar Apr 21 '22 11:04 felixYyu

cc @alexjo2144

findepi avatar Apr 21 '22 12:04 findepi

@felixYyu can you please add some information about your approach to the PR description?

rdblue avatar Apr 21 '22 15:04 rdblue

updated the PR description, Please check whether it is OK? @rdblue

felixYyu avatar Apr 22 '22 01:04 felixYyu

@felixYyu, I was asking for you to describe the problem and the proposed solution that you've implemented here. I don't see much information about the solution you propose. Could you include that please?

rdblue avatar Apr 24 '22 22:04 rdblue

@rdblue The description of the solution has been updated, Could you review that again please?

felixYyu avatar Apr 25 '22 03:04 felixYyu

cc @findepi

felixYyu avatar Apr 30 '22 23:04 felixYyu

@alexjo2144 Could you testing with this patch, If there is still a problem, try another way.

felixYyu avatar May 05 '22 07:05 felixYyu

@rdblue Could you merge this PR?

felixYyu avatar Jul 15 '22 06:07 felixYyu

Hi @felixYyu thank you for this fix! Could you please rebase your patch and then we could take a look?

marton-bod avatar Sep 22 '22 14:09 marton-bod

is it the same as https://github.com/apache/iceberg/pull/5707

linked issues; https://github.com/apache/iceberg/issues/5676

https://github.com/apache/iceberg/issues/5399

https://github.com/apache/iceberg/issues/4563

I think all these 3 issues are the same and 3 different persons are working on implementation (me, @Fokko, @felixYyu) 🙈

ajantha-bhat avatar Sep 22 '22 15:09 ajantha-bhat

Thanks for linking these together, @ajantha-bhat ! :)

marton-bod avatar Sep 22 '22 15:09 marton-bod

Thanks @felixYyu for the fix, and @Fokko for resolving the conflicts.

@szehon-ho @flyrain @pvary Could you please take a look at this patch? Do you see any problems with this approach?

marton-bod avatar Sep 26 '22 14:09 marton-bod

Yea it looks reasonable. Curious if metadata tables also suffer any problem from this scenario, will need to test when I have time.

szehon-ho avatar Sep 26 '22 23:09 szehon-ho

I'd be interested to take a look today too.

aokolnychyi avatar Sep 27 '22 15:09 aokolnychyi

Tested the patch with metadata tables, seems to fix all the issues. Also nit: should we add the test to 3.3, so it lives longer (I suppose 3.3 will be used to clone further Spark versions, and if we don't add it we might lose it if nobody forward-ports it)?

szehon-ho avatar Sep 27 '22 22:09 szehon-ho

@felixYyu could you please rebase and address https://github.com/apache/iceberg/pull/4602#discussion_r987088357 so that we can get this PR reviewed & merged?

nastra avatar Jan 19 '23 11:01 nastra