Support storage_option 'linked_service' (Azure Data Lake Gen2 + Azure Synapse)

Open keen85 opened this issue 2 years ago • 3 comments

Description

Azure Synapse Analytics allows execution of Python notebooks in the Azure Cloud. A linked service defined in Azure Synapse Analytics essentially represents a connection string, e.g. to an Azure Storage Account / Azure Data Lake Gen2.

These linked services can be used in storage_options when reading from ADLS Gen2 via Pandas:

import pandas
csv_fullpath_adls = f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/sales/sales_records.csv"
linked_service_name = 'name of the linked service'
df = pandas.read_csv(csv_fullpath_adls, storage_options={'linked_service': linked_service_name})
df.info()

Accessing a Delta table the same way with delta-rs currently fails (deltalake-0.7.0):

from deltalake import DeltaTable
delta_fullpath_adls = f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/sales/sales_records/"
linked_service_name = 'name of the linked service'
dt = DeltaTable(table_uri=delta_fullpath_adls, storage_options={'linked_service': linked_service_name})
dt.version()
---------------------------------------------------------------------------
PyDeltaTableError                         Traceback (most recent call last)
<ipython-input-13-337fae1a> in <module>
      1 from deltalake import DeltaTable
----> 2 dt = DeltaTable(table_uri=delta_fullpath_adls, storage_options={'linked_service': linked_service_name})
      3 dt.version()

~/cluster-env/clonedenv/lib/python3.8/site-packages/deltalake/table.py in __init__(self, table_uri, version, storage_options, without_files)
    119         """
    120         self._storage_options = storage_options
--> 121         self._table = RawDeltaTable(
    122             table_uri,
    123             version=version,

PyDeltaTableError: Failed to read delta log object: Generic MicrosoftAzure error: At least one authorization option must be specified

These are the currently supported storage_options for Azure Storage: https://github.com/delta-io/delta-rs/blob/17999d24a58fb4c98c6280b9e57842c346b4603a/rust/src/builder.rs#L524-L539

I'd like to propose allowing the linked_service storage option to be passed through.

Use Case

Accessing a Delta Lake table residing in Azure Data Lake Gen2 using Azure Synapse Analytics.

Related Issue(s)

  • https://github.com/delta-io/delta-rs/issues/600
  • https://github.com/delta-io/delta-rs/issues/838

keen85 • Feb 17 '23 18:02

Hi @keen85 - thanks for the request. I was not aware of this option thus far.

It seems you have linked to some outdated options. In the latest versions, the supported authorization methods as well as the config keys were updated. Essentially, they are now the options defined in the underlying object store: https://docs.rs/object_store/0.5.4/object_store/azure/enum.AzureConfigKey.html#variants

I could not find any documentation on what the linked_service option actually does, but I'll keep looking. In the meantime, from a quick scan of the link you provided, it seems you can also use some of the other supported ways of authenticating, e.g. client_id / client_secret or SAS tokens.
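A minimal sketch of that alternative, assuming the credentials are supplied through environment variables. The environment variable names and the two helper functions are hypothetical and exist only for illustration; the option keys follow the AzureConfigKey documentation linked above and should be checked against the installed deltalake / object_store version.

```python
import os


def azure_storage_options(env=None):
    """Build a storage_options dict for delta-rs from environment variables.

    The environment variable names used here are assumptions made for this
    sketch, not something delta-rs or Synapse defines.
    """
    env = os.environ if env is None else env
    options = {"azure_storage_account_name": env["AZURE_STORAGE_ACCOUNT_NAME"]}

    # Prefer a SAS token when one is provided.
    sas_token = env.get("AZURE_STORAGE_SAS_TOKEN")
    if sas_token:
        options["azure_storage_sas_token"] = sas_token
        return options

    # Otherwise fall back to service principal credentials.
    options["azure_storage_client_id"] = env["AZURE_CLIENT_ID"]
    options["azure_storage_client_secret"] = env["AZURE_CLIENT_SECRET"]
    options["azure_storage_tenant_id"] = env["AZURE_TENANT_ID"]
    return options


def open_table(table_uri, env=None):
    """Open a Delta table with explicitly supplied credentials."""
    # Imported lazily so the helper above can be used without deltalake installed.
    from deltalake import DeltaTable

    return DeltaTable(table_uri, storage_options=azure_storage_options(env))
```

Calling open_table(delta_fullpath_adls) would then replace the linked_service variant from the description. Unlike a linked service, this approach requires the secrets to be available to the notebook.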

roeap • Feb 17 '23 19:02

Hi @roeap, the beauty of using the linked_service is that it works with Active Directory authentication transparently - no need for the user to specify any credentials or secrets in the code.

I also did some more digging but did not find any further examples or proper documentation beyond what I mentioned earlier. Maybe this is some sort of proprietary storage_options key that Microsoft built into Synapse Analytics?

keen85 • Mar 03 '23 17:03

@keen85 please make an issue for this upstream: https://github.com/apache/arrow-rs

ion-elgreco • Apr 06 '24 00:04

Closing this as it's out of our control

ion-elgreco • Aug 19 '24 19:08