
Credentials are not honored when passed via DataFrameReader options

ruiyang2015 opened this issue · 1 comment

We can use other formats without issue, but with spark-excel we always get an error if we set credentials via `DataFrameReader.option` like this:

For GCP:

```python
df.option("google.cloud.auth.service.account.json.keyfile", "path to gcp.json")
```

For Azure Blob:

```python
df.option("fs.azure.account.key.{account}.blob.core.windows.net", "shared key value")
```

We can read with `df.format('csv').load('path')`, but if we run `df.format('com.crealytics.spark.excel').load('path')` we get the following error:

For Azure:

```
py4j.protocol.Py4JJavaError: An error occurred while calling o73.load.
: Configuration property kitchensink.dfs.core.windows.net not found.
	at org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:372)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:1133)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:174)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:110)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3375)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:125)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3424)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3392)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:485)
	at com.crealytics.spark.excel.WorkbookReader$.readFromHadoop$1(WorkbookReader.scala:35)
	at com.crealytics.spark.excel.WorkbookReader$.$anonfun$apply$2(WorkbookReader.scala:41)
	at com.crealytics.spark.excel.DefaultWorkbookReader.$anonfun$openWorkbook$1(WorkbookReader.scala:49)
	at scala.Option.fold(Option.scala:251)
	at com.crealytics.spark.excel.DefaultWorkbookReader.openWorkbook(WorkbookReader.scala:49)
	at com.crealytics.spark.excel.WorkbookReader.withWorkbook(WorkbookReader.scala:14)
	at com.crealytics.spark.excel.WorkbookReader.withWorkbook$(WorkbookReader.scala:13)
	at com.crealytics.spark.excel.DefaultWorkbookReader.withWorkbook(WorkbookReader.scala:45)
	at com.crealytics.spark.excel.ExcelRelation.excerpt$lzycompute(ExcelRelation.scala:32)
	at com.crealytics.spark.excel.ExcelRelation.excerpt(ExcelRelation.scala:32)
	at com.crealytics.spark.excel.ExcelRelation.headerColumns$lzycompute(ExcelRelation.scala:104)
	at com.crealytics.spark.excel.ExcelRelation.headerColumns(ExcelRelation.scala:103)
	at com.crealytics.spark.excel.ExcelRelation.$anonfun$inferSchema$1(ExcelRelation.scala:172)
	at scala.Option.getOrElse(Option.scala:189)
	at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:171)
	at com.crealytics.spark.excel.ExcelRelation.<init>(ExcelRelation.scala:36)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:36)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:13)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:8)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:355)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:239)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.base/java.lang.Thread.run(Thread.java:829)
```

For GCP (connection preview error description):

```
An error occurred while calling o281.load.
: java.io.IOException: Error accessing gs://ascend-io-demo-data/kitchen_sink/excel/test.xlsx
	at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getObject(GoogleCloudStorageImpl.java:1910)
	at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getItemInfo(GoogleCloudStorageImpl.java:1812)
	at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.open(GoogleCloudStorageImpl.java:606)
	at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.open(GoogleCloudStorageFileSystem.java:273)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFSInputStream.<init>(GoogleHadoopFSInputStream.java:78)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.open(GoogleHadoopFileSystemBase.java:616)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:906)
	at com.crealytics.spark.excel.WorkbookReader$.readFromHadoop$1(WorkbookReader.scala:35)
	at com.crealytics.spark.excel.WorkbookReader$.$anonfun$apply$2(WorkbookReader.scala:41)
	at com.crealytics.spark.excel.DefaultWorkbookReader.$anonfun$openWorkbook$1(WorkbookReader.scala:49)
	at scala.Option.fold(Option.scala:251)
	at com.crealytics.spark.excel.DefaultWorkbookReader.openWorkbook(WorkbookReader.scala:49)
	at com.crealytics.spark.excel.WorkbookReader.withWorkbook(WorkbookReader.scala:14)
	at com.crealytics.spark.excel.WorkbookReader.withWorkbook$(WorkbookReader.scala:13)
	at com.crealytics.spark.excel.DefaultWorkbookReader.withWorkbook(WorkbookReader.scala:45)
	at com.crealytics.spark.excel.ExcelRelation.excerpt$lzycompute(ExcelRelation.scala:32)
	at com.crealytics.spark.excel.ExcelRelation.excerpt(ExcelRelation.scala:32)
	at com.crealytics.spark.excel.ExcelRelation.headerColumns$lzycompute(ExcelRelation.scala:104)
	at com.crealytics.spark.excel.ExcelRelation.headerColumns(ExcelRelation.scala:103)
	at com.crealytics.spark.excel.ExcelRelation.$anonfun$inferSchema$1(ExcelRelation.scala:172)
	at scala.Option.getOrElse(Option.scala:189)
	at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:171)
	at com.crealytics.spark.excel.ExcelRelation.<init>(ExcelRelation.scala:36)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:36)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:13)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:8)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:355)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:239)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
GET https://storage.googleapis.com/storage/v1/b/ascend-io-demo-data/o/kitchen_sink%2Fexcel%2Ftest.xlsx
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "Insufficient Permission",
    "reason" : "insufficientPermissions"
  } ],
  "message" : "Insufficient Permission"
}
	at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:150)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:444)
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1108)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:542)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:475)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:592)
	at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getObject(GoogleCloudStorageImpl.java:1904)
	... 42 more
internal_service_unavailable:{}
```
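Both traces fail inside `com.crealytics.spark.excel.WorkbookReader$.readFromHadoop`, which obtains the Hadoop `FileSystem` from a configuration that evidently does not include the options set on the `DataFrameReader`. A plain-Python sketch of that failure mode (all names here are hypothetical, simulating the Hadoop configuration as a dict, not spark-excel's real API):

```python
# The Hadoop property that Azure Blob access needs (as in this issue's example).
ACCOUNT_KEY_PROP = "fs.azure.account.key.account.blob.core.windows.net"

def open_excel(path, reader_options, session_hadoop_conf):
    """Hypothetical stand-in for the workbook-reading path: it consults only
    the session-wide Hadoop configuration, so reader_options is never used."""
    if ACCOUNT_KEY_PROP not in session_hadoop_conf:
        raise KeyError("Configuration property %s not found" % ACCOUNT_KEY_PROP)
    return "workbook-bytes"

session_conf = {}  # credentials deliberately NOT set globally
reader_opts = {ACCOUNT_KEY_PROP: "shared-key"}  # set per-read, as in this issue

try:
    open_excel("wasbs://container@account/test.xlsx", reader_opts, session_conf)
except KeyError as exc:
    print(exc)  # the per-read credential was ignored, mirroring the traces
```

The same structure would explain the GCP case: the keyfile option set on the reader never reaches the GCS connector, so the request falls back to whatever ambient credentials exist and gets 403 Forbidden.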

We do not want to set the credentials on the global Spark context, so it would be great if you could update spark-excel to support reading credentials from the DataFrameReader options instead of relying on the global Spark config.
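For reference, the requested behavior could look roughly like the following sketch: credential-style reader options are layered onto a per-read copy of the Hadoop configuration before the file system is opened. This is a plain-Python illustration with hypothetical names (`merge_reader_options`, `CRED_PREFIXES`, the dict-based config), not an actual spark-excel implementation; a real fix would presumably do the equivalent on a cloned `org.apache.hadoop.conf.Configuration` before calling `FileSystem.get`:

```python
# Hypothetical allow-list of option prefixes treated as Hadoop/credential config.
CRED_PREFIXES = ("fs.", "google.cloud.")

def merge_reader_options(global_hadoop_conf, reader_options):
    """Return a per-read copy of the Hadoop config with credential-style
    reader options layered on top, leaving the global config untouched."""
    per_read_conf = dict(global_hadoop_conf)  # clone; never mutate the global
    for key, value in reader_options.items():
        if key.startswith(CRED_PREFIXES):
            per_read_conf[key] = value
    return per_read_conf

# Example: an Azure shared key passed only on the reader, not globally.
global_conf = {"fs.defaultFS": "abfss://container@account.dfs.core.windows.net"}
options = {
    "header": "true",  # ordinary spark-excel option, not a credential
    "fs.azure.account.key.account.blob.core.windows.net": "shared-key-value",
}
conf = merge_reader_options(global_conf, options)
```

With this shape, each read can carry its own credentials while the session-wide configuration stays credential-free.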

ruiyang2015 · Sep 17 '21

Thank you @ruiyang2015. If you don't mind, could you please list the steps needed to reproduce this issue? Maybe even a wiki page would be great, like this one: https://github.com/crealytics/spark-excel/wiki/Examples:-With-Google-Cloud-Storage

There are a number of common use cases with cloud storage (e.g. GCS, Azure, S3) that spark-excel needs to work well with.

Sincerely,

quanghgx · Sep 17 '21