spark-bigquery-connector

Provide better error message in ReadSessionCreator.create() if table doesn't exist

Open jphalip opened this issue 3 years ago • 3 comments

If you call the ReadSessionCreator.create() method and the table doesn't exist, you get a NullPointerException. This is because the ReadSessionCreator class in the bigquery-connector-common library does not set the setThrowNotFound option when it calls the BQ API's getTable() method. We should maybe modify that code to handle this case and provide a clearer error message instead of just throwing a NullPointerException.

jphalip, May 25 '22 18:05

That's because we check it earlier. This is the error in Spark:

val df = spark.read.format("bigquery").load("davidrab.doesnotexist")
java.lang.RuntimeException: Table project.dataset.doesnotexist not found
  at scala.sys.package$.error(package.scala:30)
  at com.google.cloud.spark.bigquery.BigQueryRelationProvider.$anonfun$createRelationInternal$1(BigQueryRelationProvider.scala:80)
  at scala.Option.getOrElse(Option.scala:189)
  at com.google.cloud.spark.bigquery.BigQueryRelationProvider.createRelationInternal(BigQueryRelationProvider.scala:80)
  at com.google.cloud.spark.bigquery.BigQueryRelationProvider.createRelation(BigQueryRelationProvider.scala:46)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:355)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
  at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:239)
  ... 47 elided

davidrabinowitz, May 25 '22 19:05

Would it make sense to still set the setThrowNotFound option? That shouldn't break the Spark code (since it already checks the table's existence beforehand), and it would allow the Hive connector to get a better error message without having to make a separate API call to check the table's existence.

jphalip, May 25 '22 19:05

I ended up making an extra call to BigQuery to check the table's existence upfront: https://github.com/GoogleCloudDataproc/hive-bigquery-connector/commit/dafa09f2177b4134c124a81908a5171c12987580

It would still be nice to update ReadSessionCreator.create() to handle that edge case directly, so we can avoid making that extra call.
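
For illustration, here is a minimal sketch of the kind of guard being discussed: turn a null lookup result into a descriptive exception at the point of the lookup, instead of letting a NullPointerException surface later. It assumes the lookup returns null for a missing table, as described at the top of this issue. TableInfo, TableNotFoundException, and requireTable are hypothetical names made up for this sketch; none of them are classes or methods from the connector or the BigQuery client, and the lookup is a plain function so the example runs without any dependencies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these types come from the connector or the BigQuery client.
final class TableLookupSketch {

    /** Stand-in for table metadata returned by a lookup. */
    static final class TableInfo {
        final String id;

        TableInfo(String id) {
            this.id = id;
        }
    }

    /** Descriptive error raised when the lookup yields no table. */
    static final class TableNotFoundException extends RuntimeException {
        TableNotFoundException(String tableId) {
            super("Table " + tableId + " not found");
        }
    }

    /**
     * Runs the lookup once and fails fast with a clear message if it returns null,
     * so callers never dereference a missing table.
     */
    static TableInfo requireTable(Function<String, TableInfo> lookup, String tableId) {
        TableInfo table = lookup.apply(tableId);
        if (table == null) {
            throw new TableNotFoundException(tableId);
        }
        return table;
    }

    public static void main(String[] args) {
        // An in-memory map plays the role of the remote catalog; Map.get returns null on a miss.
        Map<String, TableInfo> catalog = new HashMap<>();
        catalog.put("project.dataset.exists", new TableInfo("project.dataset.exists"));

        System.out.println(requireTable(catalog::get, "project.dataset.exists").id);
        try {
            requireTable(catalog::get, "project.dataset.doesnotexist");
        } catch (TableNotFoundException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the design is that the check happens on the result of the single lookup that is already being made, so no separate existence call is needed.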

jphalip, Oct 04 '22 21:10