duckdb_azure

Unable to query multiple files on Azure using a container-level SAS token

Open erik-farmer opened this issue 7 months ago • 0 comments

What happens?

When querying multiple files with a glob pattern such as az://<account>.blob.core.windows.net/<container>/path/to/blobs/*.json, an exception is raised:

duckdb.duckdb.IOException: IO Error: AzureStorageFileSystem Read to az://<account>.blob.core.windows.net/<container>/path/to/blobs/*.json failed with NoAuthenticationInformation Reason Phrase: Server failed to authenticate the request. Please refer to the information in the www-authenticate header.

This exception is not raised when the query points to a specific blob (see the examples under "To Reproduce").

The SAS token was created by following this guide: https://learn.microsoft.com/en-us/azure/ai-services/translator/document-translation/how-to-guides/create-sas-tokens?tabs=Containers

All permissions were selected (read, write, list, etc.).
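Since the error is an authentication failure that only shows up for glob patterns, it may help to confirm what the token itself grants. The following is a minimal sketch, assuming the token is a service SAS whose query string carries the standard `sr` (signed resource), `sp` (signed permissions), and `se` (expiry) fields. The helper name `describe_sas` and the sample token are hypothetical and not part of the original report.

```python
from urllib.parse import parse_qs


def describe_sas(sas_token: str) -> dict:
    """Summarize the scope and permissions encoded in a SAS token string.

    Assumes a service SAS: 'sr' is 'c' for a container or 'b' for a blob,
    and 'sp' is a string of permission letters such as 'r' (read) and
    'l' (list). An account SAS uses different fields ('ss', 'srt') and
    would report an empty resource here.
    """
    # A SAS token is a URL query string, sometimes with a leading '?'.
    fields = parse_qs(sas_token.lstrip("?"))
    permissions = fields.get("sp", [""])[0]
    return {
        "resource": fields.get("sr", [""])[0],
        "permissions": permissions,
        "can_read": "r" in permissions,
        "can_list": "l" in permissions,
        "expires": fields.get("se", [""])[0],
    }


# Hypothetical token, for illustration only.
info = describe_sas("?sv=2022-11-02&sr=c&sp=rl&se=2024-08-01T00:00:00Z&sig=abc")
print(info["resource"], info["can_list"])
```

Expanding a glob requires listing the container, so a token without `l` in `sp`, or one scoped to a single blob (`sr=b`), would be expected to read individual files but fail on patterns. If the token does include list permission, as described here, the problem more likely lies in how the token reaches the client.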

To Reproduce

Method 1

import duckdb
from adlfs.spec import AzureBlobFileSystem


fs = AzureBlobFileSystem(
    account_name='<account_name>',
    container_name='<container_name>',  # tried with and without this param
    sas_token='mySasToken',
)
print(fs.glob("<container_name>/"))  # works
print(fs.ls("<container_name>/"))  # works
connection = duckdb.connect()
connection.register_filesystem(fs)

data = connection.sql("""
SELECT *
FROM read_json('az://<account_name>.blob.core.windows.net/<container>/path/to/specificFile.json');
""")  # works

data = connection.sql("""
SELECT *
FROM read_json('az://<account_name>.blob.core.windows.net/<container>/path/to/multiple/files/*.json');
""") # raises IOException

Method 2

import duckdb


duckdb.execute("""
INSTALL azure;
LOAD azure;
""")

duckdb.execute("""
CREATE SECRET secret1 (
TYPE AZURE,
CONNECTION_STRING 'mySasToken'
);
""")

connection = duckdb.connect()
data = connection.sql("""
SELECT *
FROM read_json('az://<account_name>.blob.core.windows.net/<container>/path/to/specificFile.json');
""")  # works

data = connection.sql("""
SELECT *
FROM read_json('az://<account_name>.blob.core.windows.net/<container>/path/to/multiple/files/*.json');
""")  # raises IOException
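Two details of Method 2 may be worth ruling out before treating this as an extension bug. First, the secret is created through the module-level `duckdb.execute`, while the queries run on a separate connection returned by `duckdb.connect()`, so the secret may not be visible to the querying connection. Second, `CONNECTION_STRING` is given the bare SAS token rather than a full connection string. The sketch below assumes that the secret should live on the connection that runs the query and that a connection string of the form `BlobEndpoint=...;SharedAccessSignature=...` is accepted. Neither assumption is confirmed in this issue, and the helper names `sas_connection_string`, `create_secret_sql`, and `setup_azure` are hypothetical.

```python
def sas_connection_string(account_name: str, sas_token: str) -> str:
    """Build an Azure Storage connection string that carries a SAS token."""
    # Drop a leading '?' in case the token was copied from a URL.
    token = sas_token.lstrip("?")
    return (
        f"BlobEndpoint=https://{account_name}.blob.core.windows.net;"
        f"SharedAccessSignature={token}"
    )


def create_secret_sql(name: str, connection_string: str) -> str:
    """Render the CREATE SECRET statement, escaping single quotes."""
    escaped = connection_string.replace("'", "''")
    return f"CREATE SECRET {name} (TYPE AZURE, CONNECTION_STRING '{escaped}');"


def setup_azure(connection, account_name: str, sas_token: str):
    """Load the azure extension and create the secret on `connection`.

    `connection` is expected to be the same DuckDB connection that later
    runs the read_json query, so the secret is visible to that query.
    """
    connection.execute("INSTALL azure;")
    connection.execute("LOAD azure;")
    connection.execute(
        create_secret_sql("secret1", sas_connection_string(account_name, sas_token))
    )
    return connection
```

With DuckDB installed, this would be used as `connection = setup_azure(duckdb.connect(), '<account_name>', 'mySasToken')`, followed by the same `read_json` queries on `connection`. If the glob query still fails with the secret on the querying connection, that narrows the problem to how the extension lists blobs with a container-level SAS token.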

OS:

arm64 (Apple M1)

DuckDB Version:

0.10.2

DuckDB Client:

Python

Full Name:

Erik Farmer

Affiliation:

PepsiCo

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Not applicable - the reproduction does not require a data set

Did you include all code required to reproduce the issue?

  • [X] Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • [X] Yes, I have

erik-farmer, Jul 24 '24 20:07