
[BUG][Spark][Hive] Create a Delta Lake Table in Hive Metastore with saveAsTable() Method

Open yamensaban opened this issue 2 years ago • 4 comments


Which Delta project/connector is this regarding?

  • [X] Spark
  • [ ] Standalone
  • [ ] Flink
  • [ ] Kernel
  • [ ] Other (fill in here)

Describe the problem

Steps to reproduce

  1. Open a PySpark session:

     pyspark --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"

  2. Sample code

from delta import * 

data = spark.range(0, 5) 
data.write.format("delta").saveAsTable("deltatable")

Observed results

The table gets created with the wrong input/output format classes (Sequence), so when I try to query the table from Hive I get an error. When I instead use the save() method as below, and then create an external table in Hive, it works fine:

data.write.format("delta").save("/tmp/deltatest/deltatable")

Environment information

  • Delta Lake version: 2.3
  • Spark version: 3.3.1
  • Scala version: 2.12

yamensaban avatar Aug 21 '23 06:08 yamensaban

cc. @tdas

vkorukanti avatar Aug 21 '23 20:08 vkorukanti

@yamensaban You can't use the entry created by Spark in HMS to query the table from Hive. Hive needs another entry in HMS and a separate jar in order to read the Delta table. Here are the instructions.
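For reference, the Hive-side setup those instructions describe is roughly the following (a sketch from memory of the Delta Hive connector docs; verify the exact property names and values against the linked instructions):

```xml
<!-- hive-site.xml additions for the Delta Hive connector (sketch; the
     delta-hive assembly jar must also be on Hive's classpath). -->
<property>
  <name>hive.input.format</name>
  <value>io.delta.hive.HiveInputFormat</value>
</property>
<!-- Only when Hive runs on Tez: -->
<property>
  <name>hive.tez.input.format</name>
  <value>io.delta.hive.HiveInputFormat</value>
</property>
```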

vkorukanti avatar Aug 22 '23 18:08 vkorukanti

@vkorukanti Yes, I know. I have already added the required properties to hive-site.xml for both Spark and HiveServer2 (screenshot attached).

I have also added the delta-hive-assembly_2.12-0.6.0.jar file under the hive/lib and spark/jars paths. External tables created after the data is written by Spark using the save() method work fine, but with the saveAsTable() method the table gets created with the wrong input format and cannot be queried (screenshot attached).

When I query the table from Spark SQL, it works fine (screenshot attached).

yamensaban avatar Aug 24 '23 07:08 yamensaban

@yamensaban It's a known limitation of the connector. You have to create a separate external table definition from Hive that points to the same location to read it.
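Concretely, the workaround above can be sketched as follows. The table name, schema, and location are illustrative assumptions; the storage handler class is the one provided by the delta-hive connector:

```python
# Sketch of the workaround: leave Spark's HMS entry alone and register a
# second, Hive-readable external table over the same Delta files.
# Table name, schema, and location below are illustrative assumptions.

table_location = "/tmp/deltatest/deltatable"  # wherever the Delta files live

# DDL to run from Hive (requires the delta-hive connector jar on Hive's
# classpath, which provides io.delta.hive.DeltaStorageHandler):
hive_ddl = f"""
CREATE EXTERNAL TABLE deltatable_hive (id BIGINT)
STORED BY 'io.delta.hive.DeltaStorageHandler'
LOCATION '{table_location}'
"""
print(hive_ddl.strip())
```

Hive then reads the data through the storage handler, while Spark keeps using its own catalog entry for the same location.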

orthoxerox avatar Dec 23 '23 17:12 orthoxerox