Iceberg bucket partitioning issue

Open jessiedanwang opened this issue 1 month ago • 0 comments

Apache Iceberg version

1.5.0

Query engine

Spark

Please describe the bug 🐞

Hi there, I tried to register the bucket UDF as shown below:

    val bucketF = "bucketn"
    IcebergSpark.registerBucketUDF(spark, bucketF, DataTypes.LongType, numBuckets)
    if (spark.sql(s"show USER functions $bucketF").count == 0) {
      log.error("unable to register bucket function")
    }

The "unable to register bucket function" message does not appear in the log, so it looks like the bucketn UDF is registered.

However, the following line of code throws an error:

    df.sortWithinPartitions(expr(s"bucketn($bucketCol)"))

    24/06/24 14:22:00 ERROR GlueMetastoreClientDelegate: com.amazonaws.services.glue.model.EntityNotFoundException: Cannot find function. (Service: AWSGlue; Status Code: 400; Error Code: EntityNotFoundException; Request ID: 6eace0be-05e8-474c-aadf-595255c8a64e; Proxy: null)
    24/06/24 14:22:00 ERROR MicroBatchExecution: Query iceberg/cdc/xxx_qa/xxx [id = b9e11736-cc3c-4260-aa2b-17dd6735f8bf, runId = e66a33c7-ee44-4ed1-8a14-6a08a6412e77] terminated with error
    org.apache.spark.sql.AnalysisException: [UNRESOLVED_ROUTINE] Cannot resolve function bucketn on search path [system.builtin, system.session, spark_catalog.default].; line 1 pos 0

It looks like the error has to do with how functions are looked up in AWS Glue. Does anything else need to be done besides calling registerBucketUDF when using the AWS Glue catalog? Could you please advise what needs to be done here? Thanks a lot.
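
One thing that may be worth checking, given that the failure is reported by MicroBatchExecution: a streaming query runs on a cloned SparkSession, so a UDF registered on the original `spark` session after the query has started may not be visible to the session that analyzes the micro-batch. The sketch below assumes the sort happens inside a `foreachBatch` handler, which the snippets above do not confirm, and registers the UDF on the batch DataFrame's own session before using it. `BucketUdfSketch`, `sortByBucket`, and `batchDf` are hypothetical names used only for illustration; this may or may not be the cause here.

```scala
import org.apache.iceberg.spark.IcebergSpark
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr
import org.apache.spark.sql.types.DataTypes

// Hypothetical helper, not part of Iceberg or Spark.
object BucketUdfSketch {
  val bucketF = "bucketn"

  // Registers the bucket UDF on the session that owns batchDf, then sorts
  // each partition by the bucket value of bucketCol (assumed to be a long column).
  def sortByBucket(batchDf: DataFrame, bucketCol: String, numBuckets: Int): DataFrame = {
    // Assumption: inside foreachBatch this may be a different session
    // than the one the UDF was originally registered on.
    val session: SparkSession = batchDf.sparkSession
    IcebergSpark.registerBucketUDF(session, bucketF, DataTypes.LongType, numBuckets)
    batchDf.sortWithinPartitions(expr(s"$bucketF($bucketCol)"))
  }
}
```

If the job is structured this way, calling `BucketUdfSketch.sortByBucket(batchDf, bucketCol, numBuckets)` from the `foreachBatch` function would replace the direct `df.sortWithinPartitions(...)` call. Registering a session UDF under the same name again simply replaces the earlier registration.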

jessiedanwang, Jun 24 '24 14:06