Issue with CALL parsing
Query engine
Spark
Question
I am trying to use the Iceberg Glue catalog with Spark. I am able to query the table data, but I am not able to run procedures.
Exception:- pyspark.sql.utils.ParseException: Syntax error at or near 'CALL'
Spark Config:-
I have the exact same issue in my unit tests under Spark 3.4 / Iceberg 1.3. Everything works well except CALL statements and ALTER TABLE ... ADD|DROP PARTITION FIELD ...
However, spark.sql.extensions is correctly set, as described by @pulkit-cldcvr.
@pulkit-cldcvr does the issue happen only in pyspark, or in spark-shell as well?
@manuzhang I am experiencing this in a pyspark Jupyter notebook using Spark 3.4.1 in an EMR Studio workspace.
+1
A quick guess at what might be going wrong: my assumption is that the session being used was not actually loaded with the extensions. I've seen this happen in a few different ways:
- (Most common in general) The Spark session had already been created by the time "getOrCreate" was called, so the extensions are ignored.
- (Most common in notebooks) The Spark session is improperly cloned between threads used by the kernel. I've seen this most often with kernels that use functional libraries (like cats) or something similar to manage execution. I'm not sure how this happens (I've only seen it sporadically), but sometimes certain cells use SparkSession.getActiveSession to execute their SQL, and in doing so they pick up a session that was somehow cloned without the config set. When queried directly, the config appears set, but when you access the "active session" during some executions it vanishes.
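To tell these two cases apart, it may help to compare what the session you hold reports with what the "active" session reports. This is only a rough sketch: has_iceberg_extensions and report are hypothetical helper names, not part of Spark or Iceberg, and a value showing up in the config does not prove the extensions were injected, since spark.sql.extensions is only applied when the session is first created.

```python
ICEBERG_EXT = "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"


def has_iceberg_extensions(conf_value):
    """Return True if a comma-separated spark.sql.extensions value lists the Iceberg extensions."""
    if not conf_value:
        return False
    return ICEBERG_EXT in [part.strip() for part in conf_value.split(",")]


def report(spark):
    """Hypothetical helper: compare the session in hand with the active session."""
    # Imported lazily so has_iceberg_extensions can be used without pyspark installed.
    from pyspark.sql import SparkSession

    active = SparkSession.getActiveSession()
    return {
        "this_session": has_iceberg_extensions(
            spark.conf.get("spark.sql.extensions", "")
        ),
        "active_session": active is not None
        and has_iceberg_extensions(active.conf.get("spark.sql.extensions", "")),
    }
```

Running report(spark) in the cell that fails, right before the CALL, should show whether the two sessions disagree. If both report True and CALL still fails to parse, the session was probably created before the setting was applied.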
+1
I am facing the same issue in pyspark when creating external tables in Hive using the ICEBERG format.
[PARSE_SYNTAX_ERROR] Syntax error at or near 'ICEBERG'.(line 1, pos 42)
== SQL ==
CREATE EXTERNAL TABLE x (i int) STORED BY ICEBERG;
------------------------------------------^^^
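One thing worth checking here: STORED BY ICEBERG is Hive DDL, and as far as I know the Iceberg Spark extensions do not add it to Spark's parser. In Spark SQL, Iceberg tables are normally created with USING iceberg. A minimal sketch, where create_iceberg_table_sql is a hypothetical helper and my_catalog.db.x is a placeholder table name:

```python
def create_iceberg_table_sql(table, columns):
    """Build a Spark SQL CREATE TABLE statement for an Iceberg table.

    `columns` is a list of (name, type) pairs; `table` is a placeholder identifier.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return f"CREATE TABLE {table} ({cols}) USING iceberg"


ddl = create_iceberg_table_sql("my_catalog.db.x", [("i", "int")])
# With a session that has an Iceberg catalog configured:
# spark.sql(ddl)
```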
In my case, adding spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions solved the problem.
Hi, I am facing the same problem, but I am running pyspark in Colab. How can I run this command?
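In a notebook such as Colab, the setting is not a command to run but an option passed to the session builder before any session exists. A rough sketch under some assumptions: iceberg_options and build_session are hypothetical helpers, the catalog name "local", the warehouse path, and the Hadoop catalog type are placeholders, and the runtime coordinate assumes Spark 3.4 with Scala 2.12, so adjust it to your versions.

```python
ICEBERG_EXT = "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"


def iceberg_options(
    catalog="local",
    warehouse="/tmp/iceberg-warehouse",
    runtime="org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.3.1",
):
    """Collect the Spark options for an Iceberg catalog (all defaults are assumptions)."""
    return {
        "spark.jars.packages": runtime,
        "spark.sql.extensions": ICEBERG_EXT,
        f"spark.sql.catalog.{catalog}": "org.apache.iceberg.spark.SparkCatalog",
        f"spark.sql.catalog.{catalog}.type": "hadoop",
        f"spark.sql.catalog.{catalog}.warehouse": warehouse,
    }


def build_session(options):
    """Stop any existing session, then create a new one with the given options."""
    from pyspark.sql import SparkSession  # lazy import: only needed when building

    existing = SparkSession.getActiveSession()
    if existing is not None:
        existing.stop()  # otherwise getOrCreate would reuse it and skip the extensions

    builder = SparkSession.builder
    for key, value in options.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Usage would be spark = build_session(iceberg_options()). If a Spark JVM was already started in the notebook, spark.jars.packages may not be picked up, so restarting the runtime before running this is the safer route.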
I am facing the same issue, shown below, when trying to use expire_snapshots in Glue version 4.0. I have added spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions as well. Is there any workaround?
spark.sql("""CALL catalog_name.system.expire_snapshots('db_name.table_name')""")
pyspark.sql.utils.ParseException: Syntax error at or near 'CALL'
"spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
helps.
Yes, I have added all of the config below, but still no luck:
conf.set("spark.sql.catalog.job_catalog", "org.apache.iceberg.spark.SparkCatalog")
conf.set("spark.sql.catalog.job_catalog.warehouse", args['iceberg_job_catalog_warehouse'])
conf.set("spark.sql.catalog.job_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
conf.set("spark.sql.catalog.job_catalog.type", "glue")
conf.set("spark.sql.catalog.job_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
conf.set("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
conf.set("spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
For an EMR Studio notebook, you can try following this guide from AWS: https://docs.aws.amazon.com/prescriptive-guidance/latest/apache-iceberg-on-aws/iceberg-emr.html. We tried it and it works.
I'm running this in a Jupyter notebook in AWS Glue, and at the beginning of the script, before the Spark session is created, I add this:
%%configure
{
  "--conf": "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
  "--datalake-formats": "iceberg"
}
And it works!!
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'