qbeast-spark
Unexpected exception when reading non-qbeast-formatted data
What went wrong? The following exception should be thrown when loading data that is not in qbeast format, or when the path does not exist. It works as expected when the path does not exist; however, when the path exists but contains non-qbeast-formatted data, a different exception is thrown: https://github.com/Qbeast-io/qbeast-spark/blob/d9bd04aa17f60b2ca3e2f7143193e931ab40d389/src/main/scala/io/qbeast/spark/internal/sources/QbeastDataSource.scala#L89-L94
My conclusion is that the format of the table is not checked, since the same thing happens when loading a table indexed with an old version of the qbeast-spark format.
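One way to close the gap could be to validate the path in a single place before resolving the relation, so that both failure modes produce the same `AnalysisException` message. The following is only a sketch in plain Scala: `QbeastPathCheck`, `validate`, and the injected `isQbeastFormatted` predicate are assumptions for illustration, not the actual qbeast-spark API.

```scala
import java.nio.file.{Files, Paths}

// Hypothetical validation helper; in qbeast-spark the real check would
// belong near QbeastDataSource.createRelation, before the table is loaded.
object QbeastPathCheck {

  // `isQbeastFormatted` stands in for whatever inspection of the table
  // metadata decides that a directory holds a current-format qbeast table;
  // it is injected here only to keep the sketch self-contained.
  def validate(path: String,
               isQbeastFormatted: String => Boolean): Either[String, Unit] = {
    val dir = Paths.get(path)
    if (!Files.exists(dir)) {
      // Path does not exist: already handled correctly today.
      Left(s"'$path' is not a Qbeast formatted data directory.")
    } else if (!isQbeastFormatted(path)) {
      // Existing but non-qbeast data (e.g. a plain delta table, or a table
      // written with an old qbeast format) should get the same message,
      // instead of the misleading "No space revision available with -1".
      Left(s"'$path' is not a Qbeast formatted data directory.")
    } else {
      Right(())
    }
  }
}
```

With a check like this, the revision lookup in `DeltaQbeastSnapshot.getRevision` would only ever run on paths already known to be qbeast-formatted.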
How to reproduce?
- Code that triggered the bug, or steps to reproduce:
- Load a non-existing path:

```scala
val df = spark.read.format("qbeast").load("nonExistingPath")
// org.apache.spark.sql.AnalysisException: 'nonExistingPath' is not a Qbeast formatted data directory.
```

- But load a `delta`-formatted table, or a `qbeast` table written with the old version of the format, and the exception refers to the revision instead:

```scala
val df = spark.read.format("qbeast").load("deltaTablePath")
// org.apache.spark.sql.AnalysisException: No space revision available with -1
```
- Branch and commit id: main, d9bd04a
- Spark version: 3.1.1
- Hadoop version: 3.2.0
- Are you running Spark inside a container? Are you launching the app on a remote K8s cluster? Or are you just running the tests on a local computer? N/A
- Stack trace:
```
val df = spark.read.format("qbeast").load("deltaTablePath")
org.apache.spark.sql.AnalysisException: No space revision available with -1
  at org.apache.spark.sql.AnalysisExceptionFactory$.create(AnalysisExceptionFactory.scala:36)
  at io.qbeast.spark.delta.DeltaQbeastSnapshot.$anonfun$getRevision$1(DeltaQbeastSnapshot.scala:81)
  at scala.collection.immutable.Map$EmptyMap$.getOrElse(Map.scala:104)
  at io.qbeast.spark.delta.DeltaQbeastSnapshot.getRevision(DeltaQbeastSnapshot.scala:81)
  at io.qbeast.spark.delta.DeltaQbeastSnapshot.loadLatestRevision(DeltaQbeastSnapshot.scala:140)
  at io.qbeast.spark.internal.sources.QbeastBaseRelation$.forDeltaTable(QbeastBaseRelation.scala:43)
  at io.qbeast.spark.table.IndexedTableImpl.createQbeastBaseRelation(IndexedTable.scala:194)
  at io.qbeast.spark.table.IndexedTableImpl.load(IndexedTable.scala:171)
  at io.qbeast.spark.internal.sources.QbeastDataSource.createRelation(QbeastDataSource.scala:90)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:326)
  at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:306)
  at scala.Option.map(Option.scala:230)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:266)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:240)
  ... 47 elided
```
We can fix this with #44.

I will close this issue because it is more related to #121 and #102.