
Failure to write to BigQuery - Error getting access token from metadata server

Open · Gaurang033 opened this issue 2 years ago

Hi guys,

I am able to read from BigQuery without any issues. However, when I try to write back, it throws the following error.

I am creating the Spark session with the connector jar:

spark-shell --jars spark-bigquery-with-dependencies_2.11-0.23.1.jar

**Code to read**

scala> import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.SaveMode

scala> val df = spark.read.format("bigquery").option("parentproject", "xxx").load("test.names")
df: org.apache.spark.sql.DataFrame = [name: string]

scala> df.show()
+----+
|name|
+----+
|  aa|
| bbb|
+----+

**Code to write**

scala>  df.write.format("bigquery").option("parentproject", "xxx").option("temporaryGcsBucket", "yyy").option("table", s"test.names").mode(SaveMode.Append).save()

**Error**

java.io.IOException: Error getting access token from metadata server at: http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token
  at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.CredentialFactory.getCredentialFromMetadataServiceAccount(CredentialFactory.java:236)
  at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.CredentialConfiguration.getCredential(CredentialConfiguration.java:91)
  at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getCredential(GoogleHadoopFileSystemBase.java:1533)
  at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.configure(GoogleHadoopFileSystemBase.java:1554)
  at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:654)
  at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:617)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3288)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3337)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3305)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:476)
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
  at com.google.cloud.spark.bigquery.BigQueryWriteHelper.<init>(BigQueryWriteHelper.scala:64)
  at com.google.cloud.spark.bigquery.BigQueryInsertableRelation.insert(BigQueryInsertableRelation.scala:42)
  at com.google.cloud.spark.bigquery.BigQueryRelationProvider.createRelation(BigQueryRelationProvider.scala:111)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
  at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
  at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
  ... 50 elided
Caused by: java.net.NoRouteToHostException: No route to host (Host unreachable)
  at java.net.PlainSocketImpl.socketConnect(Native Method)
  at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
  at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
  at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
  at java.net.Socket.connect(Socket.java:589)
  at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
  at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
  at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
  at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
  at sun.net.www.http.HttpClient.New(HttpClient.java:308)
  at sun.net.www.http.HttpClient.New(HttpClient.java:326)
  at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1202)
  at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
  at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
  at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:966)
  at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:104)
  at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
  at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.CredentialFactory$ComputeCredentialWithRetry.executeRefreshToken(CredentialFactory.java:184)
  at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.auth.oauth2.Credential.refreshToken(Credential.java:493)
  at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.CredentialFactory.getCredentialFromMetadataServiceAccount(CredentialFactory.java:233)
  ... 84 more

I tried setting the following options as well, but it did not work:

spark.conf.set("fs.gs.auth.service.account.enable", "true")
spark.conf.set("google.cloud.auth.service.account.enable", true)
spark.conf.set("google.cloud.auth.service.account.json.keyfile", "/home/edhst2m/sdp_aw.json")
spark.conf.set("fs.gs.project.id", "xxx")
spark.conf.set("fs.AbstractFileSystem.gs.impl","com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")

Gaurang033 avatar Sep 08 '22 17:09 Gaurang033

Hello, I am facing the same issue writing data from S3 to BigQuery. Have you found a fix or workaround for this?

chriskathumbi avatar Oct 10 '22 15:10 chriskathumbi

Please review the "How do I authenticate outside GCE / Dataproc?" section in the README.
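
For context, a minimal sketch of that approach (assuming a service account JSON key file; the path below is only a placeholder):

// Point the connector at the key file explicitly; alternatively export
// GOOGLE_APPLICATION_CREDENTIALS before starting spark-shell.
import org.apache.spark.sql.SaveMode

df.write.format("bigquery")
  .option("parentProject", "xxx")
  .option("credentialsFile", "/path/to/service_account.json") // placeholder path
  .option("temporaryGcsBucket", "yyy")
  .mode(SaveMode.Append)
  .save("test.names")

The temporaryGcsBucket write path also requires the GCS connector itself to authenticate, so the Hadoop-level keyfile settings sketched earlier in this thread apply there as well.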

davidrabinowitz avatar Apr 11 '23 21:04 davidrabinowitz