duckdb_azure
Globbing on dfs (`abfss`) endpoint for Azure Data Lake Storage Gen2 does not work properly
I noticed unexpected behavior when globbing on an ADLS Gen2 (Azure Data Lake Storage Gen2) storage account.
ADLS Gen2 exposes both a blob and a dfs endpoint.
According to the documentation, globbing should be supported on both.
However, some queries that work against the blob endpoint fail against the dfs endpoint:
| endpoint | executable? | statement |
|---|---|---|
| blob (`az`) | OK | `SELECT * FROM glob('az://some_folder/DAY=2024-09-20/HOUR=9/foo')` |
| blob (`az`) | OK | `SELECT * FROM glob('az://some_folder/DAY=2024-09-20/HOUR=9/*.json')` |
| blob (`az`) | OK | `SELECT * FROM glob('az://some_folder/DAY=2024-09-20/HOUR=9/*')` |
| blob (`az`) | OK | `SELECT * FROM glob('az://some_folder/DAY=2024-09-20/HOUR=9/**')` |
| blob (`az`) | OK | `SELECT * FROM glob('az://some_folder/DAY=2024-09-20/**')` |
| dfs (`abfss`) | OK | `SELECT * FROM glob('abfss://some_folder/DAY=2024-09-20/HOUR=9/foo.json')` |
| dfs (`abfss`) | ERROR | `SELECT * FROM glob('abfss://some_folder/DAY=2024-09-20/HOUR=9/*.json')` |
| dfs (`abfss`) | ERROR | `SELECT * FROM glob('abfss://some_folder/DAY=2024-09-20/HOUR=9/*')` |
| dfs (`abfss`) | ERROR | `SELECT * FROM glob('abfss://some_folder/DAY=2024-09-20/HOUR=9/**')` |
| dfs (`abfss`) | OK | `SELECT * FROM glob('abfss://some_folder/DAY=2024-09-20/**')` |
Each statement marked ERROR fails with the following message:

`SQL Error: java.sql.SQLException: Invalid Error: 404 The specified path does not exist. The specified path does not exist.`
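
Since the recursive glob at the `DAY=` level succeeds on the dfs endpoint, one possible interim workaround is to list from that level and filter the result in SQL. The sketch below is untested and rests on a few assumptions: that the azure extension is already loaded and credentials are configured, that `glob()` returns its paths in a column named `file`, and that the returned paths contain the `DAY=.../HOUR=...` segments as written in the query.

```sql
-- Workaround sketch (an assumption, not a confirmed fix):
-- list recursively from the DAY level, which works on abfss per the table above,
-- then keep only the JSON files located under HOUR=9.
SELECT file
FROM glob('abfss://some_folder/DAY=2024-09-20/**')
WHERE file LIKE '%/DAY=2024-09-20/HOUR=9/%.json';
```

Note that `%` in `LIKE` also matches `/`, so this filter would return JSON files in any subfolder of `HOUR=9` as well, which is closer to `HOUR=9/**/*.json` than to `HOUR=9/*.json`. It also lists every object under the day folder before filtering, so it may be slower than a direct glob.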