
Unexpected exception when reading non-qbeast-formatted data

Open eavilaes opened this issue 3 years ago • 1 comments

What went wrong?

The exception raised in the code linked below should be thrown when you load data that is not in qbeast format or when the path does not exist. This works when the path does not exist; however, a different exception is thrown when the path exists but contains non-qbeast-formatted data:

https://github.com/Qbeast-io/qbeast-spark/blob/d9bd04aa17f60b2ca3e2f7143193e931ab40d389/src/main/scala/io/qbeast/spark/internal/sources/QbeastDataSource.scala#L89-L94

My conclusion is that the format of the table is not checked, since the same exception is thrown when trying to load a table indexed with an old version of the qbeast-spark format.
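A minimal sketch of the kind of check that seems to be missing, not the project's actual implementation. It assumes that a qbeast table stores its revisions as table configuration entries whose keys start with `qbeast.revision.`; that prefix, the `QbeastFormatCheck` object and the `NotQbeastFormattedException` class are all hypothetical names used only for illustration. A plain `RuntimeException` stands in for Spark's `AnalysisException` so the snippet runs without Spark on the classpath.

```scala
// Hypothetical sketch: none of these names are part of the qbeast-spark API.
object QbeastFormatCheck {

  // Assumed prefix of the configuration keys under which revisions are stored.
  val RevisionKeyPrefix = "qbeast.revision."

  // Stand-in for the AnalysisException reported for a non-existing path.
  final class NotQbeastFormattedException(path: String)
      extends RuntimeException(s"'$path' is not a Qbeast formatted data directory.")

  /** True when the table configuration holds at least one revision entry. */
  def hasRevisions(configuration: Map[String, String]): Boolean =
    configuration.keysIterator.exists(_.startsWith(RevisionKeyPrefix))

  /** Fails with the format error before any revision lookup is attempted. */
  def requireQbeastFormat(path: String, configuration: Map[String, String]): Unit =
    if (!hasRevisions(configuration)) throw new NotQbeastFormattedException(path)
}
```

Calling `requireQbeastFormat` before the latest revision is loaded would make a plain delta table fail with the same message as a non-existing path, instead of the revision lookup error.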

How to reproduce?

  1. Code that triggered the bug, or steps to reproduce:
  • Load a non-existing path:
val df = spark.read.format("qbeast").load("nonExistingPath")
org.apache.spark.sql.AnalysisException: 'nonExistingPath' is not a Qbeast formatted data directory.
  • Load a delta-formatted table, or a qbeast table written with the old version of the format, and the exception refers to the revision instead:
val df = spark.read.format("qbeast").load("deltaTablePath")
org.apache.spark.sql.AnalysisException: No space revision available with -1
  2. Branch and commit id: main, d9bd04a

  3. Spark version: 3.1.1

  4. Hadoop version: 3.2.0

  5. Are you running Spark inside a container? Are you launching the app on a remote K8s cluster? Or are you just running the tests in a local computer? N/A

  6. Stack trace:

val df = spark.read.format("qbeast").load("deltaTablePath")
org.apache.spark.sql.AnalysisException: No space revision available with -1
  at org.apache.spark.sql.AnalysisExceptionFactory$.create(AnalysisExceptionFactory.scala:36)
  at io.qbeast.spark.delta.DeltaQbeastSnapshot.$anonfun$getRevision$1(DeltaQbeastSnapshot.scala:81)
  at scala.collection.immutable.Map$EmptyMap$.getOrElse(Map.scala:104)
  at io.qbeast.spark.delta.DeltaQbeastSnapshot.getRevision(DeltaQbeastSnapshot.scala:81)
  at io.qbeast.spark.delta.DeltaQbeastSnapshot.loadLatestRevision(DeltaQbeastSnapshot.scala:140)
  at io.qbeast.spark.internal.sources.QbeastBaseRelation$.forDeltaTable(QbeastBaseRelation.scala:43)
  at io.qbeast.spark.table.IndexedTableImpl.createQbeastBaseRelation(IndexedTable.scala:194)
  at io.qbeast.spark.table.IndexedTableImpl.load(IndexedTable.scala:171)
  at io.qbeast.spark.internal.sources.QbeastDataSource.createRelation(QbeastDataSource.scala:90)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:326)
  at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:306)
  at scala.Option.map(Option.scala:230)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:266)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:240)
  ... 47 elided

eavilaes, Dec 09 '21 15:12

We can fix this with #44

osopardo1, Dec 09 '21 15:12

I will close this issue because it is more closely related to #121 and #102.

osopardo1, Jan 20 '23 15:01