Error happened after deleting a partitioned column

Open lvyanquan opened this issue 2 years ago • 5 comments

error message:

 Caused by: java.lang.NullPointerException: Cannot find source column: 3
	at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:953) ~[iceberg-bundled-guava-0.13.2.jar:na]
	at org.apache.iceberg.PartitionSpec$Builder.add(PartitionSpec.java:503) ~[iceberg-api-0.13.2.jar:na]
	at org.apache.iceberg.PartitionSpecParser.buildFromJsonFields(PartitionSpecParser.java:155) ~[iceberg-core-0.13.2.jar:na]
	at org.apache.iceberg.PartitionSpecParser.fromJson(PartitionSpecParser.java:78) ~[iceberg-core-0.13.2.jar:na]
	at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:357) ~[iceberg-core-0.13.2.jar:na]
	at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:288) ~[iceberg-core-0.13.2.jar:na]

The metadata JSON file contains schemas, partition-specs, and sort-orders, but nothing links a partition spec to the schema it was built against. As a result, deleting a partition column raises an error when the historical partition specs are rebuilt, because their source-id can no longer be found in the current schema. I think a schema-id should be added to each entry in partition-specs.

Part of the metadata file:

    "last-column-id":3,
    "current-schema-id":1,
    "schemas":[
        {
            "type":"struct",
            "schema-id":0,
            "fields":[
                {
                    "id":1,
                    "name":"name1",
                    "required":false,
                    "type":"string"
                },
                {
                    "id":2,
                    "name":"name2",
                    "required":false,
                    "type":"string"
                },
                {
                    "id":3,
                    "name":"name3",
                    "required":false,
                    "type":"string"
                }
            ]
        },
        {
            "type":"struct",
            "schema-id":1,
            "fields":[
                {
                    "id":1,
                    "name":"name1",
                    "required":false,
                    "type":"string"
                },
                {
                    "id":2,
                    "name":"name2",
                    "required":false,
                    "type":"string"
                }
            ]
        }
    ],
    "default-spec-id":1,
    "partition-specs":[
        {
            "spec-id":0,
            "fields":[
                {
                    "name":"name3",
                    "transform":"identity",
                    "source-id":3,
                    "field-id":1000
                }
            ]
        },
        {
            "spec-id":1,
            "fields":[

            ]
        }
    ],
    "last-partition-id":1000

lvyanquan, Jul 31 '22 14:07
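
To make the failure concrete, here is a minimal sketch in plain Python (not Iceberg code) that resolves every partition spec's source-id against the current schema only, which is what the stack trace suggests the parser does. The helper names `current_schema_ids` and `unresolved_sources` are hypothetical, and the metadata is a trimmed copy of the excerpt above with field types omitted.

```python
import json

# Trimmed copy of the metadata excerpt above (field types omitted for brevity).
METADATA = json.loads("""
{
  "current-schema-id": 1,
  "schemas": [
    {"schema-id": 0, "fields": [{"id": 1, "name": "name1"},
                                {"id": 2, "name": "name2"},
                                {"id": 3, "name": "name3"}]},
    {"schema-id": 1, "fields": [{"id": 1, "name": "name1"},
                                {"id": 2, "name": "name2"}]}
  ],
  "default-spec-id": 1,
  "partition-specs": [
    {"spec-id": 0, "fields": [{"name": "name3", "transform": "identity",
                               "source-id": 3, "field-id": 1000}]},
    {"spec-id": 1, "fields": []}
  ]
}
""")


def current_schema_ids(metadata):
    """Return the set of column ids present in the current schema."""
    current = metadata["current-schema-id"]
    schema = next(s for s in metadata["schemas"] if s["schema-id"] == current)
    return {field["id"] for field in schema["fields"]}


def unresolved_sources(metadata):
    """Map spec-id -> source-ids that the current schema no longer contains."""
    known = current_schema_ids(metadata)
    missing = {}
    for spec in metadata["partition-specs"]:
        ids = [f["source-id"] for f in spec["fields"] if f["source-id"] not in known]
        if ids:
            missing[spec["spec-id"]] = ids
    return missing


print(unresolved_sources(METADATA))  # {0: [3]}
```

Only the historical spec 0 fails to resolve. The default spec 1 has no fields, so the table would be usable if the old spec were not bound against the current schema.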

@lvyanquan:

Do you have a test case or sample SQL to reproduce this? I want to know when we build the historical partition specs.

ajantha-bhat, Aug 01 '22 15:08

We can reproduce this error with the following SQL (Spark 3.2, Iceberg 0.13 or 0.14), where prod is the name of the catalog:

CREATE TABLE prod.db.sample (id bigint, data string, category string) USING iceberg PARTITIONED BY (category) TBLPROPERTIES('format-version' = '2');

ALTER TABLE prod.db.sample DROP PARTITION FIELD category;

ALTER TABLE prod.db.sample DROP COLUMN category;

I also hit the NullPointerException when I deleted this column through the Java API and then used the table.

lvyanquan, Aug 02 '22 02:08

Update: the latest code throws a different exception, but the problem still exists:

Cannot find source column for partition field: 1000: category: identity(3)
org.apache.iceberg.exceptions.ValidationException: Cannot find source column for partition field: 1000: category: identity(3)
	at org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:49)
	at org.apache.iceberg.PartitionSpec.checkCompatibility(PartitionSpec.java:558)
	at org.apache.iceberg.PartitionSpec$Builder.build(PartitionSpec.java:546)
	at org.apache.iceberg.UnboundPartitionSpec.bind(UnboundPartitionSpec.java:45)
	at org.apache.iceberg.PartitionSpecParser.fromJson(PartitionSpecParser.java:85)
	at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:390)
	at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:311)
	at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:274)
	at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:267)
	at org.apache.iceberg.hadoop.HadoopTableOperations.updateVersionAndMetadata(HadoopTableOperations.java:98)
	at org.apache.iceberg.hadoop.HadoopTableOperations.refresh(HadoopTableOperations.java:121)
	at org.apache.iceberg.hadoop.HadoopTableOperations.current(HadoopTableOperations.java:84)
	at org.apache.iceberg.BaseTable.properties(BaseTable.java:119)
	at org.apache.iceberg.spark.source.SparkTable.<init>(SparkTable.java:128)
	at org.apache.iceberg.spark.source.SparkTable.<init>(SparkTable.java:118)
	at org.apache.iceberg.spark.SparkCatalog.alterTable(SparkCatalog.java:290)

ajantha-bhat, Sep 21 '22 05:09

Just FYI, we're tracking the same issue in https://github.com/apache/iceberg/issues/5676

nastra, Sep 21 '22 10:09

Hey all, I have a PR ready: https://github.com/apache/iceberg/pull/5707. It no longer looks up the historical columns.

Fokko, Sep 21 '22 12:09
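
One possible reading of "no longer looks up the historical columns" is sketched below in plain Python. This is an assumption for illustration, not the code in the PR: the hypothetical `load_specs` helper checks source-ids only for the default spec and keeps older specs as unvalidated records, so a dropped column in a historical spec does not block loading the table.

```python
# Minimal stand-in for the table metadata in this issue.
METADATA = {
    "current-schema-id": 1,
    "schemas": [
        {"schema-id": 0, "fields": [{"id": 1}, {"id": 2}, {"id": 3}]},
        {"schema-id": 1, "fields": [{"id": 1}, {"id": 2}]},
    ],
    "default-spec-id": 1,
    "partition-specs": [
        {"spec-id": 0, "fields": [{"name": "name3", "transform": "identity",
                                   "source-id": 3, "field-id": 1000}]},
        {"spec-id": 1, "fields": []},
    ],
}


def load_specs(metadata):
    """Validate only the default spec; keep older specs without a column lookup."""
    current = metadata["current-schema-id"]
    schema = next(s for s in metadata["schemas"] if s["schema-id"] == current)
    known = {field["id"] for field in schema["fields"]}
    default_id = metadata["default-spec-id"]

    specs = {}
    for spec in metadata["partition-specs"]:
        if spec["spec-id"] == default_id:
            # The spec used for new writes must resolve against the current schema.
            for field in spec["fields"]:
                if field["source-id"] not in known:
                    raise ValueError(
                        f"Cannot find source column: {field['source-id']}"
                    )
        # Historical specs are stored as-is (assumption: no lookup needed to load).
        specs[spec["spec-id"]] = spec
    return specs


print(sorted(load_specs(METADATA)))  # [0, 1]
```

With this approach the metadata from the issue loads, while a default spec that points at a missing column would still be rejected.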