Daft
read_deltalake on Unity Catalog Table from Databricks has invalid region configuration
I am trying to read a table stored in Unity Catalog (external data access enabled) in Databricks, but I get "OSError: Generic S3 error: Received redirect without LOCATION, this normally indicates an incorrectly configured region", even though the region is explicitly set in io_config:
import daft
from daft.unity_catalog import UnityCatalog
from daft.io import IOConfig, S3Config
from dotenv import dotenv_values
env_cfg = dotenv_values()
unity = UnityCatalog(
    endpoint=env_cfg.get('DBX_ENDPOINT'),
    token=env_cfg.get('DBX_TOKEN'),
)
print(unity.list_catalogs()) # See all available catalogs, works OK
print(unity.list_schemas('test_catalog')) # See available schemas in a given catalog, works OK
print(unity.list_tables('test_catalog.test_schema')) # See available tables in a given schema, works OK
cfg = unity.load_table('test_catalog.test_schema.test_table') # works OK
io_config = IOConfig(s3=S3Config(region_name='eu-central-1'))
cfg_df = daft.read_deltalake(cfg, io_config=io_config) # this is where the error happens
And I am getting this output
[...catalogs...] [...schemas...] [...tables...]
With this error
failed to load region from IMDS err=failed to load IMDS session token: dispatch failure: io error: error trying to connect: tcp connect error: Connection refused (os error 111): tcp connect error: Connection refused (os error 111): Connection refused (os error 111) (FailedToLoadToken(FailedToLoadToken { source: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Io, source: hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })), connection: Unknown } }) }))
failed to load region from IMDS err=failed to load IMDS session token: dispatch failure: io error: error trying to connect: tcp connect error: Connection refused (os error 111): tcp connect error: Connection refused (os error 111): Connection refused (os error 111) (FailedToLoadToken(FailedToLoadToken { source: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Io, source: hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })), connection: Unknown } }) }))
S3 Credentials not provided or found when making client for us-east-1! Reverting to Anonymous mode. the credential provider was not enabled
[2024-09-23T13:10:00Z WARN aws_config::imds::region] failed to load region from IMDS err=failed to load IMDS session token: dispatch failure: io error: error trying to connect: tcp connect error: Connection refused (os error 111): tcp connect error: Connection refused (os error 111): Connection refused (os error 111) (FailedToLoadToken(FailedToLoadToken { source: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Io, source: hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })), connection: Unknown } }) }))
[2024-09-23T13:10:00Z WARN aws_config::imds::region] failed to load region from IMDS err=failed to load IMDS session token: dispatch failure: io error: error trying to connect: tcp connect error: Connection refused (os error 111): tcp connect error: Connection refused (os error 111): Connection refused (os error 111) (FailedToLoadToken(FailedToLoadToken { source: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Io, source: hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })), connection: Unknown } }) }))
Traceback (most recent call last):
File "poc_uc_daft.py", line 29, in
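One detail in the log above may help narrow this down: the credentials warning says the client was made for us-east-1, while the script passes region_name='eu-central-1'. The snippet below is a minimal sketch (plain Python, no Daft calls) that pulls the region out of that warning and compares it with the configured one. The helper name client_regions and the regular expression are hypothetical and only match the wording of the warning shown in this report.

```python
import re

# Matches the warning text seen in the log above. The pattern is an assumption
# based on that single line, not on any documented Daft log format.
CLIENT_REGION_RE = re.compile(r"making client for ([a-z0-9-]+)!")


def client_regions(log_text: str) -> set[str]:
    """Return every region named in 'making client for <region>!' warnings."""
    return set(CLIENT_REGION_RE.findall(log_text))


# The warning line copied from the output in this report.
log_line = (
    "S3 Credentials not provided or found when making client for us-east-1! "
    "Reverting to Anonymous mode. the credential provider was not enabled"
)

configured_region = "eu-central-1"  # value passed to S3Config(region_name=...)
used = client_regions(log_line)
mismatch = configured_region not in used

print(f"configured={configured_region} used={sorted(used)} mismatch={mismatch}")
```

If the regions differ, as they do here, that would be consistent with the io_config passed to read_deltalake not reaching the S3 client, although the log alone does not prove it.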
Desktop (please complete the following information):
- Windows 11
Am I doing something wrong, or is this a bug? Is there a workaround? Could it be related to this two-day-old issue? https://github.com/Eventual-Inc/Daft/issues/2879
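One untested thing that might be worth trying as a workaround: the IMDS warnings suggest the AWS SDK's default region lookup is being consulted, and that lookup normally checks environment variables before falling back to IMDS. The sketch below exports the region through AWS_REGION and AWS_DEFAULT_REGION before any Daft call. The helper name set_default_aws_region is hypothetical, and whether the Unity Catalog read path in Daft honors these variables is an assumption, not something confirmed here.

```python
import os


def set_default_aws_region(region: str) -> None:
    """Export the region through the standard AWS environment variables."""
    # Assumption: the S3 client built for the Unity Catalog table falls back to
    # the SDK's default region chain, which reads these variables before IMDS.
    os.environ["AWS_REGION"] = region
    os.environ["AWS_DEFAULT_REGION"] = region


# Call this before daft.read_deltalake(...) so the variables are already set
# when the S3 client is created.
set_default_aws_region("eu-central-1")
```

If the error persists with the variables set, that would point more strongly toward the region being dropped inside the read path rather than toward local configuration.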