Globbing on dfs (`abfss`) endpoint for Azure Data Lake Storage Gen2 does not work properly

Open keen85 opened this issue 5 months ago • 0 comments

I noticed some unexpected behavior when globbing on ADLSGen2 (Azure Data Lake Storage Gen2; Azure Storage Account). ADLSGen2 supports both a blob and a dfs endpoint. According to the documentation, globbing should be supported for both. However, some queries that work on the blob endpoint fail on the dfs endpoint:

| endpoint | executable? | statement |
| --- | --- | --- |
| blob (`az`) | OK | `SELECT * FROM glob('az://some_folder/DAY=2024-09-20/HOUR=9/foo')` |
| blob (`az`) | OK | `SELECT * FROM glob('az://some_folder/DAY=2024-09-20/HOUR=9/*.json')` |
| blob (`az`) | OK | `SELECT * FROM glob('az://some_folder/DAY=2024-09-20/HOUR=9/*')` |
| blob (`az`) | OK | `SELECT * FROM glob('az://some_folder/DAY=2024-09-20/HOUR=9/**')` |
| blob (`az`) | OK | `SELECT * FROM glob('az://some_folder/DAY=2024-09-20/**')` |
| dfs (`abfss`) | OK | `SELECT * FROM glob('abfss://some_folder/DAY=2024-09-20/HOUR=9/foo.json')` |
| dfs (`abfss`) | ERROR | `SELECT * FROM glob('abfss://some_folder/DAY=2024-09-20/HOUR=9/*.json')` |
| dfs (`abfss`) | ERROR | `SELECT * FROM glob('abfss://some_folder/DAY=2024-09-20/HOUR=9/*')` |
| dfs (`abfss`) | ERROR | `SELECT * FROM glob('abfss://some_folder/DAY=2024-09-20/HOUR=9/**')` |
| dfs (`abfss`) | OK | `SELECT * FROM glob('abfss://some_folder/DAY=2024-09-20/**')` |

ERROR: SQL Error: java.sql.SQLException: Invalid Error: 404 The specified path does not exist. The specified path does not exist.
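
To make the comparison easier to rerun, here is a minimal Python sketch that generates the same `glob()` statements for both schemes and records which ones succeed. The names `build_statements` and `run_matrix` are hypothetical helpers, not part of DuckDB. The sketch assumes `con` is an open connection from the `duckdb` Python package with the azure extension already loaded and credentials configured. It also uses `foo.json` for both schemes, although the blob row above reads `foo`.

```python
# Path patterns from the statements above, relative to the scheme prefix.
# Assumption: foo.json is used for both schemes.
PATTERNS = [
    "some_folder/DAY=2024-09-20/HOUR=9/foo.json",
    "some_folder/DAY=2024-09-20/HOUR=9/*.json",
    "some_folder/DAY=2024-09-20/HOUR=9/*",
    "some_folder/DAY=2024-09-20/HOUR=9/**",
    "some_folder/DAY=2024-09-20/**",
]
SCHEMES = ["az", "abfss"]


def build_statements(schemes=SCHEMES, patterns=PATTERNS):
    """Return (scheme, statement) pairs: one glob() query per scheme and pattern."""
    return [
        (scheme, f"SELECT * FROM glob('{scheme}://{pattern}')")
        for scheme in schemes
        for pattern in patterns
    ]


def run_matrix(con):
    """Run every statement on `con` and record OK or the error text.

    Assumption: `con` has the azure extension loaded and credentials set up.
    """
    results = []
    for scheme, stmt in build_statements():
        try:
            con.execute(stmt).fetchall()
            results.append((scheme, "OK", stmt))
        except Exception as exc:  # broad on purpose: only the outcome is logged
            results.append((scheme, f"ERROR: {exc}", stmt))
    return results
```

Calling `run_matrix(con)` and printing each tuple should give one line per row of the table, which makes it straightforward to check whether the dfs behavior changes between extension versions.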

keen85 · Sep 20 '24 11:09