Iceberg bucket partitioning issue
Apache Iceberg version
1.5.0
Query engine
Spark
Please describe the bug 🐞
Hi there, I tried to register the bucket UDF as shown below:

    val bucketF = "bucketn"
    IcebergSpark.registerBucketUDF(spark, bucketF, DataTypes.LongType, numBuckets)
    if (spark.sql(s"show USER functions $bucketF").count == 0) {
      // This message never appears in the log, so the bucketn UDF appears to be registered.
      log.error("unable to register bucket function")
    }
However, the following line of code throws an error: df.sortWithinPartitions(expr(s"bucketn($bucketCol)"))
24/06/24 14:22:00 ERROR GlueMetastoreClientDelegate: com.amazonaws.services.glue.model.EntityNotFoundException: Cannot find function. (Service: AWSGlue; Status Code: 400; Error Code: EntityNotFoundException; Request ID: 6eace0be-05e8-474c-aadf-595255c8a64e; Proxy: null)
24/06/24 14:22:00 ERROR MicroBatchExecution: Query iceberg/cdc/xxx_qa/xxx [id = b9e11736-cc3c-4260-aa2b-17dd6735f8bf, runId = e66a33c7-ee44-4ed1-8a14-6a08a6412e77] terminated with error
org.apache.spark.sql.AnalysisException: [UNRESOLVED_ROUTINE] Cannot resolve function bucketn on search path [system.builtin, system.session, spark_catalog.default].; line 1 pos 0
It looks like the error is related to the AWS Glue function lookup. When using the AWS Glue catalog, does anything else need to be done besides calling registerBucketUDF? Could you please advise what needs to be done here? Thanks a lot.
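
For reference, the setup described above can be condensed into a minimal sketch like the one below. It is not the original job: the object name `BucketUdfRepro`, the helper `sortByBucket`, the bucket count `16`, the column name `id`, and the local batch DataFrame are all hypothetical stand-ins, and the Iceberg Spark runtime is assumed to be on the classpath. Because the failure is logged by MicroBatchExecution, the real DataFrame is presumably a streaming micro-batch. Whether its session is the same one the UDF was registered on is an open question, so the sketch registers the UDF on `df.sparkSession` as one way to rule that out.

```scala
import org.apache.iceberg.spark.IcebergSpark
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr
import org.apache.spark.sql.types.DataTypes

object BucketUdfRepro {
  // Hypothetical values; the real ones come from the table's partition spec.
  val numBuckets = 16
  val bucketCol = "id"
  val bucketF = "bucketn"

  // Registers the Iceberg bucket UDF and sorts rows within each partition by bucket.
  def sortByBucket(df: DataFrame): DataFrame = {
    // Assumption: register on the session that owns this DataFrame,
    // in case it differs from the session used at application startup.
    val session = df.sparkSession
    IcebergSpark.registerBucketUDF(session, bucketF, DataTypes.LongType, numBuckets)
    df.sortWithinPartitions(expr(s"$bucketF($bucketCol)"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("bucket-udf-repro")
      .master("local[*]")
      .getOrCreate()

    // spark.range produces a single LongType column, matching DataTypes.LongType above.
    val df = spark.range(0, 100).toDF(bucketCol)
    sortByBucket(df).show(5)

    spark.stop()
  }
}
```

If this batch version resolves `bucketn` while the streaming job does not, that would point at session scoping of the temporary function rather than at the Glue catalog itself. That is an inference to test, not a confirmed diagnosis.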