qbeast-spark
Unable to overwrite a delta table
What went wrong?
Qbeast is not able to overwrite an existing delta table.
How to reproduce?
// Create a delta table
df.write.format("delta").save(tablePath)
// Overwrite it with qbeast
df.write.mode("overwrite").format("qbeast").option("columnsToIndex", "user_id").save(tablePath)
// The above would fail:
org.apache.spark.sql.AnalysisException: No space revision available with -1
at org.apache.spark.sql.AnalysisExceptionFactory$.create(AnalysisExceptionFactory.scala:55)
at io.qbeast.spark.delta.DeltaQbeastSnapshot.$anonfun$getRevision$1(DeltaQbeastSnapshot.scala:86)
at scala.collection.immutable.Map$EmptyMap$.getOrElse(Map.scala:110)
at io.qbeast.spark.delta.DeltaQbeastSnapshot.getRevision(DeltaQbeastSnapshot.scala:86)
at io.qbeast.spark.delta.DeltaQbeastSnapshot.loadLatestRevision(DeltaQbeastSnapshot.scala:152)
at io.qbeast.spark.internal.sources.QbeastDataSource.getTable(QbeastDataSource.scala:74)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:92)
at org.apache.spark.sql.DataFrameWriter.getTable$1(DataFrameWriter.scala:281)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:297)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:240)
... 47 elided
Branch and commit id: main, d37d238
Spark version: 3.5.0
Hadoop version: 3.3.4
How are you running Spark? local
Mmm... Is it possible to overwrite tables in other formats? Let's say overwrite a JSON table with Parquet. Or Delta with Iceberg?
Different file formats can overwrite each other without problem, and delta overwrites qbeast mercilessly.
The bug shown here happens because qbeast detects an existing table at that path but finds no qbeast metadata in it, so the revision lookup fails.
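To make that failure mode concrete, here is a minimal sketch in plain Scala. It is not the qbeast-spark implementation: `Revision`, `RevisionLookupSketch`, `getRevision` and `latestRevisionOption` are hypothetical names for illustration only. The `-1` id and the error text are copied from the stack trace above, and the sketch does not show how qbeast-spark arrives at that id.

```scala
// Hypothetical model of the revision lookup, for illustration only.
// None of these names are taken from the qbeast-spark code base.
final case class Revision(id: Long, columnsToIndex: Seq[String])

object RevisionLookupSketch {

  // Strict lookup: throws when the requested id is missing. With an empty
  // map (a table without qbeast metadata), every lookup fails this way.
  def getRevision(revisions: Map[Long, Revision], id: Long): Revision =
    revisions.getOrElse(
      id,
      throw new NoSuchElementException(s"No space revision available with $id"))

  // Lenient lookup: returns None when no revision exists, so a caller
  // could treat the path as "not a qbeast table yet" instead of failing.
  def latestRevisionOption(revisions: Map[Long, Revision]): Option[Revision] =
    if (revisions.isEmpty) None
    else Some(revisions.maxBy(_._1)._2)
}
```

With an empty map, `getRevision(Map.empty, -1L)` throws, while `latestRevisionOption(Map.empty)` returns `None`. Whether an overwrite should then proceed as a fresh qbeast write is a design decision for the maintainers, not something this sketch settles.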
Thanks for the clarification 👍