Native GCS to Databricks DeltaTable autoloader dependent on pre-set credentials in the cluster

Open tatiana opened this issue 2 years ago • 0 comments

Describe the bug

Currently, the native transfer between GCS and Databricks Delta Table relies on credentials pre-configured on the Databricks cluster. The credentials that the Astro Python SDK 1.5.2 (and our tests) set are insufficient on their own.

Version

  • Astro Python SDK: 1.5.2

To Reproduce

Remove the following Spark settings from the Databricks cluster:

  • spark.hadoop.fs.gs.auth.service.account.email
  • spark.hadoop.fs.gs.project.id
  • spark.hadoop.google.cloud.auth.service.account.enable
  • spark.hadoop.fs.gs.auth.service.account.private.key
  • spark.hadoop.fs.gs.auth.service.account.private.key.id

Try to run the test:

pytest tests_integration/databases/databricks_tests/test_load.py::test_delta_load_file_azure_wasb[delta-azure_blob_storage]

See it failing:

IllegalArgumentException: clientEmail must be set if using credentials configured directly in configuration.
---------------------------------------------------------------------------
(...)
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    200                 # Hide where the exception came from that shows a non-Pythonic
    201                 # JVM exception message.
--> 202                 raise converted from None
    203             else:
    204                 raise

IllegalArgumentException: clientEmail must be set if using credentials configured directly in configuration.

Expected behaviour

Without any pre-set credentials on the Databricks cluster, we should be able to transfer natively from GCS to Databricks using the information contained in the Airflow connection. If that is not possible, we should set up the test with all the necessary credentials, so that it does not rely on credentials pre-configured in the cluster.
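To illustrate the first option, here is a minimal sketch of how the fields of a Google service-account key (the JSON an Airflow GCP connection can carry) might be mapped onto the Spark settings listed in the reproduction steps. The helper name `gcs_spark_settings` is hypothetical and not part of the Astro Python SDK, and the key-ID property is spelled `private.key.id`, as in the GCS connector documentation; both are assumptions to verify against the connector version in use. How the settings would be applied to the Databricks session is left open.

```python
from typing import Dict, Mapping

# Service-account JSON field -> Spark/Hadoop GCS connector property.
# Property names follow the list in "To Reproduce"; the key-ID spelling
# (private.key.id) is an assumption based on the GCS connector docs.
_FIELD_TO_SPARK_PROPERTY = {
    "client_email": "spark.hadoop.fs.gs.auth.service.account.email",
    "project_id": "spark.hadoop.fs.gs.project.id",
    "private_key": "spark.hadoop.fs.gs.auth.service.account.private.key",
    "private_key_id": "spark.hadoop.fs.gs.auth.service.account.private.key.id",
}


def gcs_spark_settings(keyfile: Mapping[str, str]) -> Dict[str, str]:
    """Hypothetical helper: build Spark settings from a parsed service-account key."""
    # Fail early with a readable message instead of the JVM-side
    # "clientEmail must be set" error.
    missing = [field for field in _FIELD_TO_SPARK_PROPERTY if not keyfile.get(field)]
    if missing:
        raise ValueError(f"service-account key is missing: {', '.join(missing)}")

    settings = {prop: keyfile[field] for field, prop in _FIELD_TO_SPARK_PROPERTY.items()}
    # Tell the connector to authenticate with the service account above.
    settings["spark.hadoop.google.cloud.auth.service.account.enable"] = "true"
    return settings
```

Validating the key up front would turn a missing `client_email` into a Python-side error before any job is submitted to the cluster.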

tatiana, Mar 27 '23 01:03