ducklake_delete_orphaned_files fails on S3 with non‑DNS bucket, even with path‑style + explicit endpoint
What happens?
Environment
- DuckDB: 1.4.2
- Extensions: httpfs and ducklake (matching 1.4.2)
- Object storage: S3
- Bucket name: non‑DNS compliant (contains uppercase and/or underscore)
- Data prefix: s3://<BUCKET_NON_DNS>/path%20with%20spaces/prefix/
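For readers unsure whether a bucket name counts as "non-DNS compliant", here is a minimal sketch of a check. It assumes the commonly documented S3 naming rules for general-purpose buckets (3 to 63 characters; lowercase letters, digits, hyphens, and dots; starting and ending with a letter or digit) and is deliberately simplified. The helper name `is_dns_compliant_bucket` and the sample name `my_bucket` are hypothetical and not part of DuckDB or this report.

```python
import re

# Simplified rule set (an assumption, not an exhaustive validator):
# 3-63 chars, lowercase letters / digits / hyphens / dots only,
# first and last character must be a letter or digit.
_DNS_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
_IPV4_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_dns_compliant_bucket(name: str) -> bool:
    """Return True if `name` looks usable as a host label in a virtual-hosted URL."""
    if not _DNS_BUCKET_RE.match(name):
        return False
    # Reject consecutive dots and names formatted like an IPv4 address.
    if ".." in name or _IPV4_RE.fullmatch(name):
        return False
    return True


if __name__ == "__main__":
    # "LIB" appears in the error message; "my_bucket" is a made-up example.
    for bucket in ["LIB", "my_bucket", "ducklake-test-europe"]:
        print(bucket, is_dns_compliant_bucket(bucket))
```

Names containing uppercase characters or underscores fail this check, which is the situation described in the environment above.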
Expected
With s3_url_style='path' and a region endpoint set (e.g. s3.eu-central-1.amazonaws.com), all maintenance operations, including ducklake_delete_orphaned_files, should succeed using path-style URLs.
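To make the difference between the two addressing styles concrete, the sketch below builds the host and request target of a ListObjectsV2 call in both styles. It illustrates the general S3 URL conventions only and is not how httpfs constructs its requests; the function name `list_objects_request` is hypothetical, and the bucket and prefix are taken from the error message in this report.

```python
from urllib.parse import quote, urlencode


def list_objects_request(bucket: str, prefix: str, endpoint: str, url_style: str = "path"):
    """Return (host, request_target) for a ListObjectsV2 call in the given style."""
    # quote_via=quote encodes spaces as %20 and, with urlencode's default
    # safe="", encodes "/" as %2F.
    query = urlencode(
        {"encoding-type": "url", "list-type": "2", "prefix": prefix},
        quote_via=quote,
    )
    if url_style == "path":
        # Path-style: bucket is the first path segment, host is the endpoint.
        return endpoint, f"/{quote(bucket)}/?{query}"
    # Virtual-hosted-style: bucket becomes part of the host name.
    return f"{bucket}.{endpoint}", f"/?{query}"


if __name__ == "__main__":
    endpoint = "s3.eu-central-1.amazonaws.com"
    for style in ("path", "vhost"):
        host, target = list_objects_request("LIB", "Data Tools/Data/parquet/", endpoint, style)
        print(style, host, target)
```

In path-style the bucket shows up in the request target; in virtual-hosted-style the target starts directly with `/?`, and the bucket only appears in the host name.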
Actual
- Reads/writes and all other maintenance steps succeed.
- ducklake_delete_orphaned_files(...) fails with an SSL error during ListObjectsV2.
- Error shows a request path like '/?encoding-type=url&list-type=2&prefix=...'. The bucket name is missing from the path, suggesting that virtual-hosted-style was used.
Full error message:
Exception has occurred: IOException
IO Error: Failed to perform CHECKPOINT; in DuckLake: Failed to get files scheduled for deletion from DuckLake: SSL connection failed error for HTTP GET to '/?encoding-type=url&list-type=2&prefix=Data%20Tools%2FData%2Fparquet%2F'
LINE 2: FROM read_blob('s3://LIB/Data Tools/Data/parquet...
^
LINE 1: CALL ducklake_delete_orphaned_files('lake')
^
File "/ducklake_script.py", line 87, in <module>
lm.con.execute("CHECKPOINT;")
_duckdb.IOException: IO Error: Failed to perform CHECKPOINT; in DuckLake: Failed to get files scheduled for deletion from DuckLake: SSL connection failed error for HTTP GET to '/?encoding-type=url&list-type=2&prefix=Data%20Tools%2FData%2Fparquet%2F'
LINE 2: FROM read_blob('s3://LIB/Data Tools/Data/parquet...
^
LINE 1: CALL ducklake_delete_orphaned_files('lake')
^
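As an aid for reading the message above, the following sketch extracts the request target from the error text, decodes the `prefix` query parameter, and checks whether the bucket name appears as the first path segment. It uses only the Python standard library; the helper name `inspect_request_target` is hypothetical, and the regular expression assumes the `HTTP GET to '...'` wording seen in this particular error.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Excerpt of the error message from this report.
ERROR = (
    "SSL connection failed error for HTTP GET to "
    "'/?encoding-type=url&list-type=2&prefix=Data%20Tools%2FData%2Fparquet%2F'"
)


def inspect_request_target(message: str, bucket: str):
    """Parse the quoted request target out of `message`; return None if absent."""
    match = re.search(r"HTTP GET to '([^']+)'", message)
    if match is None:
        return None
    target = urlsplit(match.group(1))
    params = parse_qs(target.query)  # percent-decodes the values
    first_segment = target.path.strip("/").split("/")[0]
    return {
        "path": target.path,
        "prefix": params.get("prefix", [""])[0],
        "bucket_in_path": first_segment == bucket,
    }


if __name__ == "__main__":
    print(inspect_request_target(ERROR, "LIB"))
```

For the message in this report, the path is just `/` and the decoded prefix is `Data Tools/Data/parquet/`, so the bucket `LIB` is not part of the request path.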
To Reproduce
import duckdb
con = duckdb.connect()
# 1) Install/load httpfs; force PATH-STYLE + explicit region endpoint
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_endpoint='s3.eu-central-1.amazonaws.com'")
con.execute("SET s3_region='eu-central-1'")
con.execute("SET s3_url_style='path'")
con.execute("SET s3_use_ssl=true")
# 2) Load ducklake and attach (metadata local; data on S3 with encoded spaces)
con.execute("INSTALL ducklake; LOAD ducklake;")
meta = 'ducklake:/memory_meta/metadata.ducklake'
data = "s3://<BUCKET_NON_DNS>/path%20with%20spaces/prefix/" # non-DNS bucket name here
con.execute(f"ATTACH '{meta}' AS lake (DATA_PATH '{data}');")
con.execute("USE lake;")
# 3) Sanity check: listing via HTTPFS (path-style) should succeed
con.execute(f"SELECT COUNT(*) FROM list_files('{data}')").fetchall()
# 4) Maintenance: all succeed except orphan deletion
for stmt in [
"CALL ducklake_flush_inlined_data('lake');",
"CALL ducklake_expire_snapshots('lake');",
"CALL ducklake_merge_adjacent_files('lake');",
"CALL ducklake_rewrite_data_files('lake');",
"CALL ducklake_cleanup_old_files('lake', cleanup_all => true);"
]:
con.execute(stmt)
# 5) Repro: orphan deletion (requires ListObjectsV2) -> FAILS with SSL error
con.execute("CALL ducklake_delete_orphaned_files('lake');")
# Also fails:
# con.execute("CALL ducklake_delete_orphaned_files('lake', dry_run => true, older_than => now() - INTERVAL '1 week');")
OS:
Linux - x86_64
DuckDB Version:
1.4.2
DuckLake Version:
Extension matching DuckDB 1.4.2
DuckDB Client:
Python
Hardware:
No response
Full Name:
Peter Schmidt
Affiliation:
Private
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Not applicable - the reproduction does not require a data set
Did you include all code required to reproduce the issue?
- [x] Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
- [x] Yes, I have
Hi @phmu16ab, I think your repro is incomplete. The error you are getting seems to be caused by a 403 (Forbidden) response, since you haven't initialized an S3 secret. This works:
import duckdb
con = duckdb.connect()
# 1) Install/load httpfs; force PATH-STYLE + explicit region endpoint
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_endpoint='s3.eu-central-1.amazonaws.com'")
con.execute("SET s3_region='eu-central-1'")
con.execute("SET s3_url_style='path'")
con.execute("SET s3_use_ssl=true")
# 2) Load ducklake and attach (metadata local; data on S3 with encoded spaces)
con.execute("INSTALL ducklake; LOAD ducklake;")
meta = 'ducklake:metadata.ducklake'
data = "s3://ducklake-test-europe/path%20with%20spaces/prefix/" # DNS-compliant test bucket (the original report uses a non-DNS name)
con.execute("""CREATE OR REPLACE SECRET secret (
TYPE s3,
PROVIDER credential_chain,
CHAIN config
);""")
con.execute(f"ATTACH '{meta}' AS lake (DATA_PATH '{data}');")
con.execute("USE lake;")
con.execute("CREATE TABLE t as SELECT range a FROM range(10);")
# 3) Sanity check: listing via HTTPFS (path-style) should succeed
con.sql(f"SELECT COUNT(*) FROM glob('{data}')").show()
con.sql("CALL enable_logging('HTTP')")
# 4) Maintenance: all succeed except orphan deletion
for stmt in [
"CALL ducklake_flush_inlined_data('lake');",
"CALL ducklake_expire_snapshots('lake');",
"CALL ducklake_merge_adjacent_files('lake');",
"CALL ducklake_rewrite_data_files('lake');",
"CALL ducklake_cleanup_old_files('lake', cleanup_all => true);"
]:
con.execute(stmt)
con.sql("SELECT request.type, request.url, request.headers FROM duckdb_logs_parsed('HTTP')").show()
# 5) Repro: orphan deletion (requires ListObjectsV2) -> FAILS with SSL error
con.execute("CALL ducklake_delete_orphaned_files('lake');")
# Also fails:
# con.execute("CALL ducklake_delete_orphaned_files('lake', dry_run => true, older_than => now() - INTERVAL '1 week');")
Could you provide an actual reproducer? And to be clear, are you saying that the issue is caused by the uppercase characters in the non-DNS-compliant bucket name?
Thanks for opening this issue in the DuckLake issue tracker! To resolve this issue, our team needs a reproducible example. This includes:
- A source code snippet which reproduces the issue.
- The snippet should be self-contained, i.e., it should contain all imports and should use relative paths instead of hard coded paths (please avoid /Users/JohnDoe/...).
- A lot of issues can be reproduced with plain SQL code executed in the DuckDB command line client. If you can provide such an example, it greatly simplifies the reproduction process and likely results in a faster fix.
- If the script needs additional data, please share the data as a CSV, JSON, or Parquet file. Unfortunately, we cannot fix issues that can only be reproduced with a confidential data set. Support contracts allow sharing confidential data with the core DuckDB team under NDA.
For more detailed guidelines on how to create reproducible examples, please visit Stack Overflow's “Minimal, Reproducible Example” page.
Hi @phmu16ab, feel free to reopen if the error persists and you can provide a reproducer.