delta-rs
Support storage_option 'linked_service' (Azure Data Lake Gen2 + Azure Synapse)
Description
Azure Synapse Analytics allows execution of Python notebooks in the Azure cloud. Linked services can be defined in Azure Synapse Analytics; each one essentially represents a connection string, e.g. to an Azure Storage Account / Azure Data Lake Gen2.
These linked services can be used in storage_options when reading from ADLS Gen2 via pandas:
import pandas
csv_fullpath_adls = f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/sales/sales_records.csv"
linked_service_name = 'name of the linked service'
df = pandas.read_csv(csv_fullpath_adls, storage_options={'linked_service': linked_service_name})
df.info()
Accessing a Delta table analogously using delta-rs currently fails (deltalake-0.7.0):
from deltalake import DeltaTable
delta_fullpath_adls = f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/sales/sales_records/"
linked_service_name = 'name of the linked service'
dt = DeltaTable(table_uri=delta_fullpath_adls, storage_options={'linked_service': linked_service_name})
dt.version()
---------------------------------------------------------------------------
PyDeltaTableError Traceback (most recent call last)
<ipython-input-13-337fae1a> in <module>
1 from deltalake import DeltaTable
----> 2 dt = DeltaTable(table_uri=delta_fullpath_adls, storage_options={'linked_service': linked_service_name})
3 dt.version()
~/cluster-env/clonedenv/lib/python3.8/site-packages/deltalake/table.py in __init__(self, table_uri, version, storage_options, without_files)
119 """
120 self._storage_options = storage_options
--> 121 self._table = RawDeltaTable(
122 table_uri,
123 version=version,
PyDeltaTableError: Failed to read delta log object: Generic MicrosoftAzure error: At least one authorization option must be specified
These are the currently supported storage_options for Azure Storage: https://github.com/delta-io/delta-rs/blob/17999d24a58fb4c98c6280b9e57842c346b4603a/rust/src/builder.rs#L524-L539
I'd like to propose passing the linked_service storage option through.
Use Case
Accessing a Delta Lake table residing in Azure Data Lake Gen2 from Azure Synapse Analytics.
Related Issue(s)
- https://github.com/delta-io/delta-rs/issues/600
- https://github.com/delta-io/delta-rs/issues/838
Hi @keen85 - thanks for the request. I was not aware of this option thus far.
It seems you have linked to some outdated options. In the latest versions, the supported authorization methods and the config keys were updated; they are essentially the options defined in the underlying object store: https://docs.rs/object_store/0.5.4/object_store/azure/enum.AzureConfigKey.html#variants
I could not find any documentation on what the linked_service option actually does, but I'll keep looking. In the meantime, from a quick scan of the link you provided, it seems you can also use other supported ways of authenticating, e.g. client_id / client_secret or SAS tokens.
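For illustration, here is a minimal sketch of the client_id / client_secret and SAS token routes. It is not a confirmed recipe: the option keys (client_id, client_secret, tenant_id, sas_token) are assumed to be accepted as aliases by the object store config linked above, so please check them against that page for your version. The environment variable names and the helper functions are hypothetical and exist only for this example.

```python
import os


def service_principal_storage_options(env=os.environ):
    """Build storage_options for service principal auth.

    The keys on the left are ASSUMED to be accepted by the object store
    Azure config; the environment variable names are hypothetical.
    """
    required = {
        "client_id": "AZURE_CLIENT_ID",
        "client_secret": "AZURE_CLIENT_SECRET",
        "tenant_id": "AZURE_TENANT_ID",
    }
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in required.items()}


def sas_storage_options(env=os.environ):
    """Build storage_options for SAS token auth (key name is ASSUMED)."""
    token = env.get("AZURE_STORAGE_SAS_TOKEN")  # hypothetical variable name
    if not token:
        raise KeyError("missing environment variable: AZURE_STORAGE_SAS_TOKEN")
    return {"sas_token": token}


def open_table(table_uri, storage_options):
    """Open a Delta table using the same call as in the issue description."""
    # Imported lazily so the helpers above can be used without deltalake.
    from deltalake import DeltaTable

    return DeltaTable(table_uri=table_uri, storage_options=storage_options)


# Usage (requires deltalake, real credentials and a reachable table):
# dt = open_table(delta_fullpath_adls, service_principal_storage_options())
# dt.version()
```

This keeps secrets out of the notebook code itself, but unlike linked_service it still requires the credentials to be provisioned somewhere the notebook can read them.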
Hi @roeap,
the beauty of using the linked_service is that it works with Active Directory authentication transparently - no need for the user to specify any credentials or secrets in the code.
I also did some more digging but did not find any more examples or proper documentation beyond what I mentioned earlier. Maybe this is some sort of proprietary storage_options key that Microsoft built into Synapse Analytics?
@keen85 please make an issue for this upstream: https://github.com/apache/arrow-rs
Closing this as it's out of our control