qbeast-spark

Unable to overwrite a delta table

Open Jiaweihu08 opened this issue 9 months ago • 3 comments

What went wrong?

Qbeast is not able to overwrite an existing delta table.

How to reproduce?

// Create a delta table
df.write.format("delta").save(tablePath)

// Overwrite it with qbeast
df.write.mode("overwrite").format("qbeast").option("columnsToIndex", "user_id").save(tablePath)

// The overwrite above fails with:
org.apache.spark.sql.AnalysisException: No space revision available with -1
  at org.apache.spark.sql.AnalysisExceptionFactory$.create(AnalysisExceptionFactory.scala:55)
  at io.qbeast.spark.delta.DeltaQbeastSnapshot.$anonfun$getRevision$1(DeltaQbeastSnapshot.scala:86)
  at scala.collection.immutable.Map$EmptyMap$.getOrElse(Map.scala:110)
  at io.qbeast.spark.delta.DeltaQbeastSnapshot.getRevision(DeltaQbeastSnapshot.scala:86)
  at io.qbeast.spark.delta.DeltaQbeastSnapshot.loadLatestRevision(DeltaQbeastSnapshot.scala:152)
  at io.qbeast.spark.internal.sources.QbeastDataSource.getTable(QbeastDataSource.scala:74)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:92)
  at org.apache.spark.sql.DataFrameWriter.getTable$1(DataFrameWriter.scala:281)
  at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:297)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:240)
  ... 47 elided

Branch and commit id: main, d37d238

Spark version: 3.5.0

Hadoop version: 3.3.4

How are you running Spark? local

Jiaweihu08, May 03 '24 09:05

Mmm... Is it possible to overwrite tables in other formats? Let's say overwrite a JSON table with Parquet. Or Delta with Iceberg?

osopardo1, May 06 '24 07:05

> Mmm... Is it possible to overwrite tables in other formats? Let's say overwrite a JSON table with Parquet. Or Delta with Iceberg?

Different file formats can overwrite each other without problems, and delta overwrites qbeast mercilessly.

The bug shown here occurs because qbeast detects an existing table at the path but finds no qbeast metadata in it.
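
To make the failure mode concrete, here is a minimal, self-contained sketch. It is a toy model of the lookup suggested by the stack trace, not the actual qbeast-spark code: `Revision`, `ToySnapshot`, `latestRevisionOption`, and the use of -1 as the "no revision yet" id are all assumptions made for illustration. It shows how a strict lookup on an empty revision map throws, and how an optional lookup would let an overwrite start from a fresh revision instead.

```scala
// Toy model only: these types are hypothetical and do not mirror qbeast-spark internals.
final case class Revision(id: Long, columnsToIndex: Seq[String])

final class ToySnapshot(revisions: Map[Long, Revision]) {

  // Assumed sentinel: -1 when the table carries no qbeast revisions at all.
  private val lastRevisionId: Long =
    if (revisions.isEmpty) -1L else revisions.keys.max

  // Strict lookup: throws when the table exists but has no qbeast metadata.
  def loadLatestRevision: Revision =
    revisions.getOrElse(
      lastRevisionId,
      throw new IllegalStateException(
        s"No space revision available with $lastRevisionId"))

  // Lenient lookup: the caller decides how to handle a non-qbeast table.
  def latestRevisionOption: Option[Revision] = revisions.get(lastRevisionId)
}

object OverwriteSketch {
  def main(args: Array[String]): Unit = {
    // A plain delta table: it exists, but holds no qbeast revisions.
    val plainDelta = new ToySnapshot(Map.empty)

    // The strict path fails, as in the reported trace.
    val strict = scala.util.Try(plainDelta.loadLatestRevision)
    println(s"strict lookup failed: ${strict.isFailure}")

    // On overwrite, a missing revision could be treated as "start fresh".
    val revision = plainDelta.latestRevisionOption
      .getOrElse(Revision(1L, Seq("user_id")))
    println(s"revision used for overwrite: $revision")
  }
}
```

Whether the real fix belongs in the snapshot or in the data source's table resolution is not something this sketch can settle; it only illustrates the "table found, metadata missing" case described above.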

Jiaweihu08, May 06 '24 09:05

Thanks for the clarification 👍

osopardo1, May 06 '24 10:05