"malloc: Heap corruption detected" after running merge_adjacent_files() on PG+S3 partitioned ducklake
I've managed to reproduce this a couple of times, but only with my data:
- Create a new DuckLake (PG catalog, S3 object store, partitioned on 2 fields) - a rough sketch of the setup is below
- Copy in ~40GB of data from an Iceberg lake (`INSERT INTO ... FROM s3tables WHERE ..`)
- Everything works fine - at this point I can query etc. and get the expected results
- Run `CALL main.merge_adjacent_files()` - it appears to succeed but does nothing
- All subsequent queries to the DuckLake cause the process to crash with the malloc error below
```
duckdb(36187,0x16b99f000) malloc: Heap corruption detected, free list is damaged at 0x6000037c53e0
*** Incorrect guard value: 105553178690160
duckdb(36187,0x16b99f000) malloc: *** set a breakpoint in malloc_error_break to debug
zsh: abort      duckdb
```
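For reference, this is roughly how the lake was set up. The connection string, bucket, table, and column names below are placeholders rather than the real ones:

```sql
INSTALL ducklake;
INSTALL postgres;

-- DuckLake with the catalog in Postgres and the data files in S3
-- (connection string and paths are illustrative placeholders)
ATTACH 'ducklake:postgres:dbname=ducklake_catalog host=pg.example.com user=ducklake' AS my_lake
    (DATA_PATH 's3://my-bucket/ducklake/');

-- Table partitioned on two fields
CREATE TABLE my_lake.events (event_date DATE, region VARCHAR, payload VARCHAR);
ALTER TABLE my_lake.events SET PARTITIONED BY (event_date, region);

-- ~40GB copied in from the Iceberg lake (actual query elided, as in the steps above)
-- INSERT INTO my_lake.events SELECT ... FROM ... WHERE ...;

-- Queries return the expected results at this point; the crashes start after:
CALL my_lake.merge_adjacent_files();
```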
This happens regardless of whether I run the queries from my local Mac or a remote Ubuntu machine.
Confusingly, I exported the contents of the Postgres catalog before and after, and none of the data appears different. Looking through S3, I can't see any modified files either.
Happy to run this a couple of times if you let me know how to get useful debugging info out of DuckDB.
Thanks for the report!
Does this behavior only happen when using Postgres/S3, or does it also happen locally when using DuckDB + local storage?
Does the behavior persist after reconnecting as well, or does calling `merge_adjacent_files` only affect the running process, with things fine again after a reconnect?
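For reference, the purely local setup to compare against would look roughly like this (file and table names are placeholders):

```sql
INSTALL ducklake;

-- DuckDB-file catalog plus a local data directory instead of Postgres + S3
ATTACH 'ducklake:local_catalog.ducklake' AS local_lake
    (DATA_PATH 'local_lake_files/');

CREATE TABLE local_lake.events AS SELECT 1 AS id, 'a' AS region;
CALL local_lake.merge_adjacent_files();
SELECT * FROM local_lake.events;
```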
The behaviour happens when S3 is used as the storage - it did not appear when I used the SSD as storage (using both duckdb and postgres as the catalog).
When it happens, it spoils the ducklake completely - after restarting the process, it will crash any time the ducklake is queried.
Could you try querying the Parquet files directly? Perhaps there's a particular Parquet file that is causing issues here.
> The behaviour happens when S3 is used as the storage - it did not appear when I used the SSD as storage (using both duckdb and postgres as the catalog).
> When it happens, it spoils the ducklake completely - after restarting the process, it will crash any time the ducklake is queried.
I'm running into a similar issue. It appears to work fine with DATA_PATH using local storage on Mac, but it crashes frequently with malloc errors when using S3 storage.
> Could you try querying the Parquet files directly? Perhaps there's a particular Parquet file that is causing issues here.
I tried several queries across various columns of all the Parquet files (e.g. `select avg(x) from parquet_scan('s3://.../ducklake/**');`) and all of the queries I tried worked. However, retrying the process described in the first post did not reproduce the malloc error on the day I tried (ducklake extension v673f44d) - I haven't had a chance to do more testing since then.
Thanks for checking. An issue we fixed upstream recently was related to the auto-loading of secrets - https://github.com/duckdb/duckdb/pull/17650. That could cause crashes when starting DuckLake connected to S3 and immediately issuing a query before secrets are loaded. A work-around is to explicitly instantiate secrets by calling `FROM duckdb_secrets()`. Perhaps that is also what is going on here?
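A minimal sketch of that work-around, plus an explicit `CREATE SECRET` as another way to avoid relying on auto-loading (the secret name, provider, and region below are placeholders):

```sql
-- Make sure secrets are instantiated before the first query hits S3
FROM duckdb_secrets();

-- Alternatively, create the S3 secret explicitly up front instead of relying on auto-loading
CREATE SECRET my_s3_secret (
    TYPE s3,
    PROVIDER credential_chain,
    REGION 'us-east-1'
);
```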