
Issue with CALL parsing

Open pulkit-cldcvr opened this issue 2 years ago • 8 comments

Query engine

Spark

Question

I am trying to use the Iceberg Glue catalog to integrate with Spark. I am able to query the table data, but I am not able to run procedures.

Exception: pyspark.sql.utils.ParseException: Syntax error at or near 'CALL'

(screenshot of the exception)

Spark config: (screenshot)

pulkit-cldcvr avatar Aug 17 '23 05:08 pulkit-cldcvr
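
For context, the configuration the later replies converge on looks roughly like this in spark-defaults form. This is a sketch only: the catalog name glue_catalog and the warehouse path are illustrative assumptions, not values from the report.

```properties
# Sketch only -- catalog name (glue_catalog) and warehouse path are assumptions.
# The extensions property is what enables CALL / ALTER TABLE ... PARTITION FIELD parsing.
spark.sql.extensions                          org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.glue_catalog                org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.glue_catalog.catalog-impl   org.apache.iceberg.aws.glue.GlueCatalog
spark.sql.catalog.glue_catalog.io-impl        org.apache.iceberg.aws.s3.S3FileIO
spark.sql.catalog.glue_catalog.warehouse      s3://my-bucket/warehouse
```

These properties must be in effect when the session is created; applying them to an already-running session does not change SQL parsing.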

I have the exact same issue in my unit tests under Spark 3.4 / Iceberg 1.3. Everything works except CALL statements and ALTER TABLE ... ADD|DROP PARTITION FIELD ..., even though spark.sql.extensions is correctly set as described by @pulkit-cldcvr.

baptistegh avatar Aug 21 '23 14:08 baptistegh

@pulkit-cldcvr does the issue happen only in pyspark, or in spark-shell as well?

manuzhang avatar Aug 23 '23 05:08 manuzhang

@manuzhang I am experiencing this in pyspark Jupyter notebook using Spark 3.4.1 on EMR Studio workspace.

YuvalItzchakov avatar Oct 03 '23 15:10 YuvalItzchakov

+1

sundhar010 avatar Nov 15 '23 14:11 sundhar010

Quick guess at what might be going wrong: my assumption would be that the session being used is not actually loaded with the extensions. I've seen this happen in a few different situations:

  1. (Most common in general) The Spark session was already created by the time getOrCreate was called, so the extensions are ignored.
  2. (Most common in notebooks) The Spark session is improperly cloned between threads used by the kernel. I've seen this most often with kernels that use functional libraries (like cats) or something similar to manage execution. I'm not sure how it happens (I've only seen it sporadically), but sometimes certain cells use SparkSession.getActiveSession to execute their SQL, and they end up picking up a session that was somehow cloned without the config set. When queried directly the config will appear set, but when you access the "active session" during some executions it vanishes.

RussellSpitzer avatar Nov 16 '23 16:11 RussellSpitzer
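
Cause 1 above can be illustrated without a running Spark cluster. The sketch below is a deliberately simplified stand-in for SparkSession.builder semantics (it is not pyspark itself): getOrCreate() returns the already-created session and silently drops any config set on the builder afterwards.

```python
# Simplified stand-in for SparkSession.builder behavior -- NOT pyspark itself.
# It models why extensions set after the first session creation have no effect.

class FakeSession:
    def __init__(self, conf):
        self.conf = dict(conf)

class FakeBuilder:
    _active = None  # module-level "already created" session, like Spark's

    def __init__(self):
        self._conf = {}

    def config(self, key, value):
        self._conf[key] = value
        return self

    def getOrCreate(self):
        # If a session already exists, return it; the new _conf is dropped.
        if FakeBuilder._active is None:
            FakeBuilder._active = FakeSession(self._conf)
        return FakeBuilder._active

# A session is created early (e.g. by the notebook kernel), without extensions:
first = FakeBuilder().getOrCreate()

# Later, user code "adds" the extensions, but gets the old session back:
second = (FakeBuilder()
          .config("spark.sql.extensions",
                  "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
          .getOrCreate())

print(second is first)                        # True
print("spark.sql.extensions" in second.conf)  # False
```

If this is the situation, stopping the existing session (spark.stop()) and rebuilding it with the extensions set, before any SQL runs, is the usual way out.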

+1

m-l-kaba avatar Dec 18 '23 14:12 m-l-kaba

I am facing the same issue in pyspark when creating external tables in Hive using the ICEBERG format.

[PARSE_SYNTAX_ERROR] Syntax error at or near 'ICEBERG'.(line 1, pos 42)

== SQL ==
CREATE EXTERNAL TABLE x (i int) STORED BY ICEBERG;
------------------------------------------^^^

arvindeybram avatar Mar 05 '24 07:03 arvindeybram

In my case, adding spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions solved the problem.

qianzhen0 avatar Jun 24 '24 09:06 qianzhen0
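
For completeness, the same setting can be supplied at launch time instead of in code. A sketch of the spark-submit flags; the version numbers in the --packages coordinate are assumptions and must match your Spark and Iceberg versions:

```shell
# Sketch: the runtime jar coordinate below is an assumption --
# match it to your Spark (here 3.4, Scala 2.12) and Iceberg (here 1.3.1) versions.
spark-submit \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.3.1 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  my_job.py
```

Because the flag is applied at JVM startup, it avoids the already-created-session problem entirely.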

In my case, adding spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions solved the problem.

Hi, I face the same problem too, but I am running pyspark in Colab. How can I set this config there?

kennyluke1023 avatar Jul 21 '24 03:07 kennyluke1023

Facing the same issue as below while trying to use expire_snapshots on Glue version 4.0. I added spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions as well. Is there any workaround?

spark.sql("""CALL catalog_name.system.expire_snapshots('db_name.table_name')""")

pyspark.sql.utils.ParseException: Syntax error at or near 'CALL'

Ravi-una avatar Sep 12 '24 07:09 Ravi-una

Setting "spark.sql.extensions" to "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions" helps.

Shekharrajak avatar Sep 12 '24 11:09 Shekharrajak

Setting "spark.sql.extensions" to "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions" helps.

Yes, I have added all of the config below, but still no luck:

conf.set("spark.sql.catalog.job_catalog", "org.apache.iceberg.spark.SparkCatalog")
conf.set("spark.sql.catalog.job_catalog.warehouse", args['iceberg_job_catalog_warehouse'])
conf.set("spark.sql.catalog.job_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
conf.set("spark.sql.catalog.job_catalog.type", "glue")
conf.set("spark.sql.catalog.job_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
conf.set("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
conf.set("spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

Ravi-una avatar Sep 12 '24 11:09 Ravi-una

For an EMR Studio notebook, try this guide from AWS: https://docs.aws.amazon.com/prescriptive-guidance/latest/apache-iceberg-on-aws/iceberg-emr.html. We tried it and it works.

jeremytee97 avatar Oct 09 '24 04:10 jeremytee97

I'm running this in a Jupyter notebook in AWS Glue, and at the beginning of the script, before the Spark session is created, I add this:

%%configure
{
  "--conf": "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
  "--datalake-formats": "iceberg"
}

And it works!!

fidelove avatar Jan 30 '25 11:01 fidelove

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] avatar Jul 30 '25 00:07 github-actions[bot]

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

github-actions[bot] avatar Aug 14 '25 00:08 github-actions[bot]