iceberg
Error after deleting a partitioned column
error message:
Caused by: java.lang.NullPointerException: Cannot find source column: 3
at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:953) ~[iceberg-bundled-guava-0.13.2.jar:na]
at org.apache.iceberg.PartitionSpec$Builder.add(PartitionSpec.java:503) ~[iceberg-api-0.13.2.jar:na]
at org.apache.iceberg.PartitionSpecParser.buildFromJsonFields(PartitionSpecParser.java:155) ~[iceberg-core-0.13.2.jar:na]
at org.apache.iceberg.PartitionSpecParser.fromJson(PartitionSpecParser.java:78) ~[iceberg-core-0.13.2.jar:na]
at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:357) ~[iceberg-core-0.13.2.jar:na]
at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:288) ~[iceberg-core-0.13.2.jar:na]
The metadata file JSON contains schemas, partition-specs, and sort-orders, but there is no link between a partition spec and the schema it was created against. As a result, deleting a partitioned column raises an error while rebuilding historical partition specs, because the spec's source-id can no longer be found in the current schema. I think a schema-id should be added to the partition-specs JSON. Part of the metadata file:
"last-column-id":3,
"current-schema-id":1,
"schemas":[
{
"type":"struct",
"schema-id":0,
"fields":[
{
"id":1,
"name":"name1",
"required":false,
"type":"string"
},
{
"id":2,
"name":"name2",
"required":false,
"type":"string"
},
{
"id":3,
"name":"name3",
"required":false,
"type":"string"
}
]
},
{
"type":"struct",
"schema-id":1,
"fields":[
{
"id":1,
"name":"name1",
"required":false,
"type":"string"
},
{
"id":2,
"name":"name2",
"required":false,
"type":"string"
}
]
}
],
"default-spec-id":1,
"partition-specs":[
{
"spec-id":0,
"fields":[
{
"name":"name3",
"transform":"identity",
"source-id":3,
"field-id":1000
}
]
},
{
"spec-id":1,
"fields":[
]
}
],
"last-partition-id":1000
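The failure can be sketched in isolation. The names below (SpecBindingSketch, bindSourceColumn) are hypothetical; the real check lives in PartitionSpec$Builder.add, which resolves each spec field's source-id against the current schema via Preconditions.checkNotNull:

```java
import java.util.Map;

public class SpecBindingSketch {
    // Minimal stand-in for binding a historical partition spec against the
    // *current* schema, as the metadata parser does today.
    static String bindSourceColumn(Map<Integer, String> currentSchemaFields, int sourceId) {
        String column = currentSchemaFields.get(sourceId);
        if (column == null) {
            // Mirrors the Preconditions.checkNotNull failure in the stack trace above
            throw new NullPointerException("Cannot find source column: " + sourceId);
        }
        return column;
    }

    public static void main(String[] args) {
        // Current schema (schema-id 1) after DROP COLUMN: only field ids 1 and 2 remain
        Map<Integer, String> currentSchema = Map.of(1, "name1", 2, "name2");
        // Historical spec-id 0 still references source-id 3 ("name3")
        try {
            bindSourceColumn(currentSchema, 3);
        } catch (NullPointerException e) {
            System.out.println(e.getMessage()); // Cannot find source column: 3
        }
    }
}
```

Binding spec-id 0 against schema-id 0 (which still holds field id 3) would succeed, which is why storing a schema-id alongside each partition spec would avoid the lookup failure.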
@lvyanquan:
Do you have a testcase or sample SQL to reproduce this?
As I want to know when we build the historical partition specs.
We can reproduce this error with the following SQL (Spark 3.2, Iceberg 0.13 or 0.14; prod is the catalog name):
CREATE TABLE prod.db.sample (id bigint, data string, category string) USING iceberg PARTITIONED BY (category) TBLPROPERTIES('format-version' = '2');
ALTER TABLE prod.db.sample DROP PARTITION FIELD category;
ALTER TABLE prod.db.sample DROP COLUMN category;
I also hit the same NullPointerException after deleting the column through the Java API, as soon as the table was used again.
Update: the latest code just throws a different exception, but the problem still exists:
Cannot find source column for partition field: 1000: category: identity(3)
org.apache.iceberg.exceptions.ValidationException: Cannot find source column for partition field: 1000: category: identity(3)
at org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:49)
at org.apache.iceberg.PartitionSpec.checkCompatibility(PartitionSpec.java:558)
at org.apache.iceberg.PartitionSpec$Builder.build(PartitionSpec.java:546)
at org.apache.iceberg.UnboundPartitionSpec.bind(UnboundPartitionSpec.java:45)
at org.apache.iceberg.PartitionSpecParser.fromJson(PartitionSpecParser.java:85)
at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:390)
at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:311)
at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:274)
at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:267)
at org.apache.iceberg.hadoop.HadoopTableOperations.updateVersionAndMetadata(HadoopTableOperations.java:98)
at org.apache.iceberg.hadoop.HadoopTableOperations.refresh(HadoopTableOperations.java:121)
at org.apache.iceberg.hadoop.HadoopTableOperations.current(HadoopTableOperations.java:84)
at org.apache.iceberg.BaseTable.properties(BaseTable.java:119)
at org.apache.iceberg.spark.source.SparkTable.<init>(SparkTable.java:128)
at org.apache.iceberg.spark.source.SparkTable.<init>(SparkTable.java:118)
at org.apache.iceberg.spark.SparkCatalog.alterTable(SparkCatalog.java:290)
Just FYI, we're tracking the same issue in https://github.com/apache/iceberg/issues/5676
Hey all, I have a PR ready: https://github.com/apache/iceberg/pull/5707. It no longer looks up the historical columns.
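Assuming the fix amounts to no longer resolving historical spec fields against the current schema, the lenient behavior might look roughly like this (all names hypothetical; this is a sketch of the idea, not the PR's actual code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class LenientSpecBinding {
    // A partition-spec field as stored in the metadata JSON
    record SpecField(int sourceId, String name, String transform) {}

    // Bind a historical spec without requiring every source-id to resolve in the
    // current schema; an unresolved field falls back to the name stored in metadata.
    static List<String> bindLeniently(Map<Integer, String> currentSchema, List<SpecField> fields) {
        List<String> bound = new ArrayList<>();
        for (SpecField f : fields) {
            String source = currentSchema.getOrDefault(f.sourceId(), f.name());
            bound.add(f.transform() + "(" + source + ")");
        }
        return bound;
    }

    public static void main(String[] args) {
        // Current schema after the column was dropped
        Map<Integer, String> currentSchema = Map.of(1, "name1", 2, "name2");
        // Historical spec still referencing the dropped source-id 3
        List<SpecField> historicalSpec = List.of(new SpecField(3, "name3", "identity"));
        // No exception: the dropped column's stored name is used instead
        System.out.println(bindLeniently(currentSchema, historicalSpec));
    }
}
```

Only the default (current) spec still needs strict validation, since new writes use it; historical specs are only read back for old snapshots.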