
Issue with CALL parsing

Open pulkit-cldcvr opened this issue 2 years ago • 8 comments

Query engine

Spark

Question

I am trying to use the Iceberg Glue catalog to integrate with Spark. I am able to query the table data, but I am not able to run procedures.

Exception: pyspark.sql.utils.ParseException: Syntax error at or near 'CALL'

(screenshot of the exception)

Spark config: (screenshot)

pulkit-cldcvr avatar Aug 17 '23 05:08 pulkit-cldcvr
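
For context, the configuration the later replies converge on looks roughly like this in spark-defaults form. This is a sketch only: the catalog name glue_catalog and the warehouse path are illustrative assumptions, not values from the report.

```properties
# Sketch only -- catalog name (glue_catalog) and warehouse path are assumptions.
# The extensions property is what enables CALL / ALTER TABLE ... PARTITION FIELD parsing.
spark.sql.extensions                          org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.glue_catalog                org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.glue_catalog.catalog-impl   org.apache.iceberg.aws.glue.GlueCatalog
spark.sql.catalog.glue_catalog.io-impl        org.apache.iceberg.aws.s3.S3FileIO
spark.sql.catalog.glue_catalog.warehouse      s3://my-bucket/warehouse
```

These properties must be in effect when the session is created; applying them to an already-running session does not change SQL parsing.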

I have the exact same issue in my unit tests under Spark 3.4 / Iceberg 1.3. Everything works except CALL statements and ALTER TABLE ... ADD|DROP PARTITION FIELD ..., even though spark.sql.extensions is correctly set as described by @pulkit-cldcvr.

baptistegh avatar Aug 21 '23 14:08 baptistegh

@pulkit-cldcvr does the issue happen only in pyspark, or in spark-shell as well?

manuzhang avatar Aug 23 '23 05:08 manuzhang

@manuzhang I am experiencing this in pyspark Jupyter notebook using Spark 3.4.1 on EMR Studio workspace.

YuvalItzchakov avatar Oct 03 '23 15:10 YuvalItzchakov

+1

sundhar010 avatar Nov 15 '23 14:11 sundhar010

Quick guess at what might be going wrong: my assumption would be that the session being used is not actually loaded with the extensions. I've seen this happen in a few different situations:

  1. (Most common in general) The Spark session was already created by the time getOrCreate was called, so the extensions are ignored.
  2. (Most common in notebooks) The Spark session is improperly cloned between threads used by the kernel. I've seen this most often with kernels that use functional libraries (like cats) or something similar to manage execution. I'm not sure how it happens (I've only seen it sporadically), but sometimes certain cells use SparkSession.getActiveSession to execute their SQL, and they end up picking up a session that was somehow cloned without the config set. When queried directly the config will appear set, but when you access the "active session" during some executions it vanishes.

RussellSpitzer avatar Nov 16 '23 16:11 RussellSpitzer
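
Cause 1 above can be illustrated without a running Spark cluster. The sketch below is a deliberately simplified stand-in for SparkSession.builder semantics (it is not pyspark itself): getOrCreate() returns the already-created session and silently drops any config set on the builder afterwards.

```python
# Simplified stand-in for SparkSession.builder behavior -- NOT pyspark itself.
# It models why extensions set after the first session creation have no effect.

class FakeSession:
    def __init__(self, conf):
        self.conf = dict(conf)

class FakeBuilder:
    _active = None  # module-level "already created" session, like Spark's

    def __init__(self):
        self._conf = {}

    def config(self, key, value):
        self._conf[key] = value
        return self

    def getOrCreate(self):
        # If a session already exists, return it; the new _conf is dropped.
        if FakeBuilder._active is None:
            FakeBuilder._active = FakeSession(self._conf)
        return FakeBuilder._active

# A session is created early (e.g. by the notebook kernel), without extensions:
first = FakeBuilder().getOrCreate()

# Later, user code "adds" the extensions, but gets the old session back:
second = (FakeBuilder()
          .config("spark.sql.extensions",
                  "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
          .getOrCreate())

print(second is first)                        # True
print("spark.sql.extensions" in second.conf)  # False
```

If this is the situation, stopping the existing session (spark.stop()) and rebuilding it with the extensions set, before any SQL runs, is the usual way out.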

+1

m-l-kaba avatar Dec 18 '23 14:12 m-l-kaba

I am facing the same issue in pyspark when creating external tables in Hive using the ICEBERG format.

[PARSE_SYNTAX_ERROR] Syntax error at or near 'ICEBERG'.(line 1, pos 42)

== SQL ==
CREATE EXTERNAL TABLE x (i int) STORED BY ICEBERG;
------------------------------------------^^^

arvindeybram avatar Mar 05 '24 07:03 arvindeybram

In my case, adding spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions solved the problem.

qianzhen0 avatar Jun 24 '24 09:06 qianzhen0
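
For completeness, the same setting can be supplied at launch time instead of in code. A sketch of the spark-submit flags; the version numbers in the --packages coordinate are assumptions and must match your Spark and Iceberg versions:

```shell
# Sketch: the runtime jar coordinate below is an assumption --
# match it to your Spark (here 3.4, Scala 2.12) and Iceberg (here 1.3.1) versions.
spark-submit \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.3.1 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  my_job.py
```

Because the flag is applied at JVM startup, it avoids the already-created-session problem entirely.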

In my case, adding spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions solved the problem.

Hi, I face the same problem too, but I am running pyspark in Colab. How can I set this config there?

kennyluke1023 avatar Jul 21 '24 03:07 kennyluke1023

Facing the same issue as below while trying to use expire_snapshots on Glue version 4.0. I added spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions as well. Is there any workaround?

spark.sql("""CALL catalog_name.system.expire_snapshots('db_name.table_name')""")

pyspark.sql.utils.ParseException: Syntax error at or near 'CALL'

Ravi-una avatar Sep 12 '24 07:09 Ravi-una

Setting "spark.sql.extensions" to "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions" helps.

Shekharrajak avatar Sep 12 '24 11:09 Shekharrajak

Setting "spark.sql.extensions" to "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions" helps.

Yes, I have added all of the config below, but still no luck:

conf.set("spark.sql.catalog.job_catalog", "org.apache.iceberg.spark.SparkCatalog")
conf.set("spark.sql.catalog.job_catalog.warehouse", args['iceberg_job_catalog_warehouse'])
conf.set("spark.sql.catalog.job_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
conf.set("spark.sql.catalog.job_catalog.type", "glue")
conf.set("spark.sql.catalog.job_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
conf.set("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
conf.set("spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

Ravi-una avatar Sep 12 '24 11:09 Ravi-una

For an EMR Studio notebook, try this guide from AWS: https://docs.aws.amazon.com/prescriptive-guidance/latest/apache-iceberg-on-aws/iceberg-emr.html. We tried it and it works.

jeremytee97 avatar Oct 09 '24 04:10 jeremytee97

I'm running this in a Jupyter notebook in AWS Glue, and at the beginning of the script, before the Spark session is created, I add this:

%%configure
{
  "--conf": "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
  "--datalake-formats": "iceberg"
}

And it works!!

fidelove avatar Jan 30 '25 11:01 fidelove

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] avatar Jul 30 '25 00:07 github-actions[bot]

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

github-actions[bot] avatar Aug 14 '25 00:08 github-actions[bot]