
read_deltalake on Unity Catalog Table from Databricks has invalid region configuration

Open lukaskratoch opened this issue 5 months ago • 9 comments

I am trying to read a table stored in Unity Catalog (external data access enabled) in Databricks and I am getting "OSError: Generic S3 error: Received redirect without LOCATION, this normally indicates an incorrectly configured region", even though the region is explicitly defined in io_config:

import daft
from daft.unity_catalog import UnityCatalog
from daft.io import IOConfig, S3Config
from dotenv import dotenv_values

env_cfg = dotenv_values()

unity = UnityCatalog(
    endpoint=env_cfg.get('DBX_ENDPOINT'),
    token=env_cfg.get('DBX_TOKEN'),
)

print(unity.list_catalogs())  # See all available catalogs, works OK
print(unity.list_schemas('test_catalog')) # See available schemas in a given catalog, works OK
print(unity.list_tables('test_catalog.test_schema')) # See available tables in a given schema, works OK

cfg = unity.load_table('test_catalog.test_schema.test_table') # works OK
io_config = IOConfig(s3=S3Config(region_name='eu-central-1'))
cfg_df = daft.read_deltalake(cfg, io_config=io_config)  # this is where the error happens

This is the output I get:

[...catalogs...] [...schemas...] [...tables...]

Followed by this error:

failed to load region from IMDS err=failed to load IMDS session token: dispatch failure: io error: error trying to connect: tcp connect error: Connection refused (os error 111): tcp connect error: Connection refused (os error 111): Connection refused (os error 111) (FailedToLoadToken(FailedToLoadToken { source: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Io, source: hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })), connection: Unknown } }) }))
failed to load region from IMDS err=failed to load IMDS session token: dispatch failure: io error: error trying to connect: tcp connect error: Connection refused (os error 111): tcp connect error: Connection refused (os error 111): Connection refused (os error 111) (FailedToLoadToken(FailedToLoadToken { source: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Io, source: hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })), connection: Unknown } }) }))
S3 Credentials not provided or found when making client for us-east-1! Reverting to Anonymous mode.
the credential provider was not enabled
[2024-09-23T13:10:00Z WARN aws_config::imds::region] failed to load region from IMDS err=failed to load IMDS session token: dispatch failure: io error: error trying to connect: tcp connect error: Connection refused (os error 111): tcp connect error: Connection refused (os error 111): Connection refused (os error 111) (FailedToLoadToken(FailedToLoadToken { source: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Io, source: hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })), connection: Unknown } }) }))
[2024-09-23T13:10:00Z WARN aws_config::imds::region] failed to load region from IMDS err=failed to load IMDS session token: dispatch failure: io error: error trying to connect: tcp connect error: Connection refused (os error 111): tcp connect error: Connection refused (os error 111): Connection refused (os error 111) (FailedToLoadToken(FailedToLoadToken { source: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Io, source: hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })), connection: Unknown } }) }))
Traceback (most recent call last):
  File "poc_uc_daft.py", line 29, in <module>
    cfg_df = daft.read_deltalake(cfg, io_config=io_config)
  File "/c/Users/xxx/Projects/xxx/venv/lib/python3.8/site-packages/daft/api_annotations.py", line 39, in _wrap
    return timed_func(*args, **kwargs)
  File "/c/Users/xxx/Projects/xxx/venv/lib/python3.8/site-packages/daft/analytics.py", line 228, in tracked_fn
    result = fn(*args, **kwargs)
  File "/c/Users/xxx/Projects/xxx/venv/lib/python3.8/site-packages/daft/io/_deltalake.py", line 74, in read_deltalake
    delta_lake_operator = DeltaLakeScanOperator(table_uri, storage_config=storage_config)
  File "/c/Users/xxx/Projects/xxx/venv/lib/python3.8/site-packages/daft/delta_lake/delta_lake_scan.py", line 63, in __init__
    self._table = DeltaTable(
  File "/c/Users/xxx/Projects/xxx/venv/lib/python3.8/site-packages/deltalake/table.py", line 380, in __init__
    self._table = RawDeltaTable(
OSError: Generic S3 error: Received redirect without LOCATION, this normally indicates an incorrectly configured region
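
One detail in the log above may be worth noting: the S3 client warning names us-east-1, not the eu-central-1 passed in io_config. Below is a minimal sketch for pulling that region out of captured log text so it can be compared with the configured one. The helper name `regions_in_log` and the regular expression are illustrative assumptions based only on the wording of the warning shown here; they are not part of Daft.

```python
import re
from typing import Set

# Matches the warning wording seen in the log above (an assumption:
# other Daft versions may phrase this message differently).
_CLIENT_REGION = re.compile(r"making client for ([a-z0-9-]+)")


def regions_in_log(log_text: str) -> Set[str]:
    """Return every region named in 'making client for <region>' warnings."""
    return set(_CLIENT_REGION.findall(log_text))


if __name__ == "__main__":
    captured = (
        "S3 Credentials not provided or found when making client "
        "for us-east-1! Reverting to Anonymous mode."
    )
    configured = "eu-central-1"  # the region passed via S3Config in the script
    for region in sorted(regions_in_log(captured)):
        if region != configured:
            print(f"client created for {region}, but {configured} was configured")
```

If the two regions differ, as they appear to here, that would suggest the region from io_config is not reaching every client that gets created, though the log alone does not confirm where it is lost.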

Desktop (please complete the following information):

  • Windows 11

Am I doing something wrong, or is this a bug? Is there a workaround? Could it be related to this two-day-old issue? https://github.com/Eventual-Inc/Daft/issues/2879

lukaskratoch • Sep 24 '24 07:09