spark-excel
Credentials are not honored when passed via DataFrameReader options
Other formats work without issue, but with spark-excel we always get an error if we set credentials via `DataFrameReader.option`, like this:

For GCP:

```python
df.option("google.cloud.auth.service.account.json.keyfile", "path to gcp.json")
```

For Azure Blob:

```python
df.option("fs.azure.account.key.{account}.blob.core.windows.net", "shared key value")
```

Reading with `df.format('csv').load('path')` works, but `df.format('com.crealytics.spark.excel').load('path')` fails with the following error:
For Azure:

```
py4j.protocol.Py4JJavaError: An error occurred while calling o73.load.
: Configuration property kitchensink.dfs.core.windows.net not found.
	at org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:372)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:1133)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.
```
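For context, my reading of the trace (an assumption on my part, not confirmed by the maintainers) is that the ABFS driver resolves the storage key by looking up `fs.azure.account.key.<account host>` in the global Hadoop `Configuration` only, so credentials set on the `DataFrameReader` never reach it. A pure-Python sketch of that lookup; the function and dicts below are hypothetical stand-ins, not Hadoop's real API:

```python
# Illustrative sketch only: mimics how a Hadoop-style configuration lookup
# fails when credentials were passed to DataFrameReader.option() instead of
# being set globally. Names here are hypothetical, not Hadoop's real API.

ACCOUNT_KEY_PREFIX = "fs.azure.account.key."

def get_storage_account_key(account_host: str, hadoop_conf: dict) -> str:
    """Look up a shared key the way AbfsConfiguration appears to:
    in the global configuration only, never in per-read options."""
    prop = ACCOUNT_KEY_PREFIX + account_host
    if prop not in hadoop_conf:
        raise RuntimeError(f"Configuration property {account_host} not found.")
    return hadoop_conf[prop]

# Credentials passed per-read live in a separate options map that the
# filesystem layer never consults:
reader_options = {
    "fs.azure.account.key.kitchensink.blob.core.windows.net": "shared key value",
}
global_hadoop_conf = {}  # creds deliberately NOT set globally

try:
    get_storage_account_key("kitchensink.dfs.core.windows.net", global_hadoop_conf)
except RuntimeError as e:
    print(e)  # prints: Configuration property kitchensink.dfs.core.windows.net not found.
```

This is only meant to show why the error message names the account host rather than the full property key; the real lookup lives in `AbfsConfiguration.getStorageAccountKey`.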
For GCP:

```
py4j.protocol.Py4JJavaError: An error occurred while calling o281.load.
: java.io.IOException: Error accessing gs://ascend-io-demo-data/kitchen_sink/excel/test.xlsx
	at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getObject(GoogleCloudStorageImpl.java:1910)
	at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getItemInfo(GoogleCloudStorageImpl.java:1812)
	at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.open(GoogleCloudStorageImpl.java:606)
	at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.open(GoogleCloudStorageFileSystem.java:273)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFSInputStream.<init>(GoogleHadoopFSInputStream.java:78)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.open(GoogleHadoopFileSystemBase.java:616)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:906)
	at com.crealytics.spark.excel.WorkbookReader$.readFromHadoop$1(WorkbookReader.scala:35)
	at com.crealytics.spark.excel.WorkbookReader$.$anonfun$apply$2(WorkbookReader.scala:41)
	at com.crealytics.spark.excel.DefaultWorkbookReader.$anonfun$openWorkbook$1(WorkbookReader.scala:49)
	at scala.Option.fold(Option.scala:251)
	at com.crealytics.spark.excel.DefaultWorkbookReader.openWorkbook(WorkbookReader.scala:49)
	at com.crealytics.spark.excel.WorkbookReader.withWorkbook(WorkbookReader.scala:14)
	at com.crealytics.spark.excel.WorkbookReader.withWorkbook$(WorkbookReader.scala:13)
	at com.crealytics.spark.excel.DefaultWorkbookReader.withWorkbook(WorkbookReader.scala:45)
	at com.crealytics.spark.excel.ExcelRelation.excerpt$lzycompute(ExcelRelation.scala:32)
	at com.crealytics.spark.excel.ExcelRelation.excerpt(ExcelRelation.scala:32)
	at com.crealytics.spark.excel.ExcelRelation.headerColumns$lzycompute(ExcelRelation.scala:104)
	at com.crealytics.spark.excel.ExcelRelation.headerColumns(ExcelRelation.scala:103)
	at com.crealytics.spark.excel.ExcelRelation.$anonfun$inferSchema$1(ExcelRelation.scala:172)
	at scala.Option.getOrElse(Option.scala:189)
	at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:171)
	at com.crealytics.spark.excel.ExcelRelation.<init>(ExcelRelation.scala:36)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:36)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:13)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:8)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:355)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:239)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
GET https://storage.googleapis.com/storage/v1/b/ascend-io-demo-data/o/kitchen_sink%2Fexcel%2Ftest.xlsx
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "Insufficient Permission",
    "reason" : "insufficientPermissions"
  } ],
  "message" : "Insufficient Permission"
}
	at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:150)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:444)
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1108)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:542)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:475)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:592)
	at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getObject(GoogleCloudStorageImpl.java:1904)
	... 42 more
```
We do not want to set the credentials in the global Spark context, so it would be great if you could update spark-excel to read credentials from the DataFrameReader options instead of relying on the global Spark config.
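To make the request concrete, the behavior we are hoping for could be approximated by folding credential-like read options into a per-read copy of the Hadoop configuration before the workbook is opened. This is only a sketch of the idea in plain Python, with dicts standing in for Spark's options map and Hadoop's `Configuration`; the prefix list is illustrative, not exhaustive:

```python
# Sketch of the requested behavior: merge credential-like reader options
# into a per-read copy of the Hadoop configuration. Dicts stand in for
# Spark's options map and Hadoop's Configuration; the prefix list below
# is an example, not an exhaustive or authoritative list.

CRED_PREFIXES = (
    "fs.azure.",           # Azure shared-key / SAS settings
    "fs.gs.",              # GCS connector settings
    "google.cloud.auth.",  # GCS service-account keyfile options
    "fs.s3a.",             # S3A access/secret keys
)

def conf_with_reader_creds(global_conf: dict, reader_options: dict) -> dict:
    """Return a copy of the global conf overlaid with credential options."""
    conf = dict(global_conf)  # never mutate the shared global configuration
    for key, value in reader_options.items():
        if key.startswith(CRED_PREFIXES):
            conf[key] = value
    return conf

options = {
    "header": "true",  # ordinary spark-excel option, stays out of the conf
    "google.cloud.auth.service.account.json.keyfile": "/path/to/gcp.json",
}
effective = conf_with_reader_creds({}, options)
# `effective` now carries the keyfile setting for this read only
```

The key design point is that the overlay is applied to a copy, so two concurrent reads with different credentials would not clobber each other's configuration.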
Thank you @ruiyang2015. If you don't mind, could you please list the steps needed to reproduce this issue? Even better would be a wiki page, like this one: https://github.com/crealytics/spark-excel/wiki/Examples:-With-Google-Cloud-Storage

There are a number of common use cases with cloud storage (e.g. GCS, Azure, S3) that spark-excel needs to work well with.

Sincerely,