[BUG][Spark][Hive] Create a Delta Lake Table in Hive Metastore with saveAsTable() Method
Which Delta project/connector is this regarding?
- [X] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)
Describe the problem
Steps to reproduce
- Open a PySpark session:
pyspark --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
- Sample code:
from delta import *
data = spark.range(0, 5)
data.write.format("delta").saveAsTable("deltatable")
Observed results
The table gets created with the wrong Input/Output Format classes (SequenceFile), so when I try to query the table from Hive I get an error.
When I use the save() method as below and then create an external table in Hive, it works fine.
data.write.format("delta").save("/tmp/deltatest/deltatable")
Environment information
- Delta Lake version: 2.3
- Spark version: 3.3.1
- Scala version: 2.12
cc. @tdas
@yamensaban You can't use the entry created by Spark in HMS to query the table from Hive. Hive needs another entry in HMS and a separate jar in order to read the Delta table. Here are the instructions.
@vkorukanti Yes, I know. I have already added the connector's required properties to hive-site.xml under both Spark and HiveServer2.
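(The exact properties aren't quoted in the thread; for reference, the Hive connector's README documents settings along these lines:)

<!-- hive-site.xml: route table reads through the Delta connector -->
<property>
  <name>hive.input.format</name>
  <value>io.delta.hive.HiveInputFormat</value>
</property>
<property>
  <name>hive.tez.input.format</name>
  <value>io.delta.hive.HiveInputFormat</value>
</property>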
I have also added the delta-hive-assembly_2.12-0.6.0.jar file under the hive/lib and spark/jars paths. External tables created after the data is written by Spark using the save() method work fine, but with the saveAsTable() method the table gets created with the wrong input format and cannot be queried.
When I try to query the table from Spark SQL, it works fine.
@yamensaban It's a known limitation of the connector. You have to create a separate external table definition from Hive that points to the same location to read it.
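A rough sketch of that workaround (table name follows the snippets above; DESCRIBE DETAIL is Delta's standard way to surface a table's storage location):

# From the Spark session: look up where saveAsTable() wrote the Delta files
location = (
    spark.sql("DESCRIBE DETAIL deltatable")
         .select("location")
         .first()[0]
)
print(location)  # e.g. file:/user/hive/warehouse/deltatable

Then, from Hive, create an external table over that location with STORED BY 'io.delta.hive.DeltaStorageHandler', as in the DDL shown earlier in the thread.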