duckdb_azure

Can't mount specific file type via Azure extension

Open • andreypanchenko opened this issue 7 months ago • 1 comment

I tried to mount Azure Blob Storage as a file system in order to read one specific file from it. To open the file, I used another extension, "spatial", which can read the ".gdbtable" format.

import duckdb

con = duckdb.connect(
    database="/tmp/quack.db",
    config={
        "threads": 8,
        "memory_limit": "4GB",
        "temp_directory": "/tmp/",
        "preserve_insertion_order": False,
        "extension_directory": "/tmp/",
    },
)

con.install_extension("spatial")
con.execute("LOAD spatial;")
con.install_extension("azure")
con.execute("LOAD azure;")
con.execute("""CREATE SECRET secret (
    TYPE AZURE,
    CONNECTION_STRING 'DefaultEndpointsProtocol=https;AccountName=redacted;AccountKey=redacted'
);""")

I want to read this file directly from storage and load all of its data into a DuckDB table, without first downloading it to the worker:

con.sql(query='CREATE OR REPLACE TABLE GPKG_FILE AS SELECT * FROM ST_Read("abfss://redacted/dat-redacted/raw_data/v55/a00000007.gdbtable")')

The error traceback:

Traceback (most recent call last):
  File "/Users/redacted/Library/Caches/pypoetry/virtualenvs/com-ing-connector-XVl8hQFI-py3.11/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3577, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-29-c75b6f8e2a1a>", line 1, in <module>
    con.sql(query='CREATE OR REPLACE TABLE GPKG_FILE AS SELECT * FROM ST_Read("abfss://redacted/dat-redacted/raw_data/v55/a00000007.gdbtable")')
duckdb.duckdb.NotImplementedException: Not implemented Error: AzureDfsStorageFileSystem: FileExists is not implemented!

I get a similar error with the az:// scheme:

con.sql(query='CREATE OR REPLACE TABLE GPKG_FILE AS SELECT * FROM ST_Read("az://dat-redacted/raw_data/v55/a00000007.gdbtable")')

The error traceback:

Traceback (most recent call last):
  File "/Users/redacted/Library/Caches/pypoetry/virtualenvs/com-ing-connector-XVl8hQFI-py3.11/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3577, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-28-6b82186ab390>", line 1, in <module>
    con.sql(query='CREATE OR REPLACE TABLE GPKG_FILE AS SELECT * FROM ST_Read("az://dat-redacted/raw_data/v55/a00000007.gdbtable")')
duckdb.duckdb.NotImplementedException: Not implemented Error: AzureBlobStorageFileSystem: DirectoryExists is not implemented!
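
The two tracebacks name different file system classes and different missing methods, which is easy to overlook. Below is a minimal sketch for extracting both from the message text. It assumes the message format stays exactly as shown in the tracebacks above, and `missing_fs_method` is a hypothetical helper name, not part of DuckDB or the azure extension.

```python
import re

# Assumed message shape, copied from the two tracebacks in this issue:
# "Not implemented Error: <FileSystemClass>: <Method> is not implemented!"
_NOT_IMPLEMENTED = re.compile(
    r"Not implemented Error: (?P<filesystem>\w+): (?P<method>\w+) is not implemented!"
)


def missing_fs_method(message):
    """Return (filesystem, method) named in the message, or None if it does not match."""
    match = _NOT_IMPLEMENTED.search(message)
    if match is None:
        return None
    return match.group("filesystem"), match.group("method")


# The two messages reported above, for the abfss:// and az:// paths respectively.
errors = [
    "Not implemented Error: AzureDfsStorageFileSystem: FileExists is not implemented!",
    "Not implemented Error: AzureBlobStorageFileSystem: DirectoryExists is not implemented!",
]

for err in errors:
    print(missing_fs_method(err))
```

In this report, the abfss:// path fails in AzureDfsStorageFileSystem on FileExists, while the az:// path fails in AzureBlobStorageFileSystem on DirectoryExists.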

andreypanchenko • Jul 29 '24 11:07