Segmentation fault in 1.3.0
What happens?
Hello, I am experiencing a segfault in DuckDB 1.3.0, both in the CLI and the Python bindings, that does not happen in version 1.2.1:
$ duckdb --version
v1.3.0 71c5c07cdd
$ duckdb -f last_query.sql
Segmentation fault (core dumped)
$ /home/jacopo/.duckdb/cli/1.2.1/duckdb -f last_query.sql
$ ls -lh data/location/agg_day_2025-01-19.parquet
-rw-r--r--. 1 jacopo jacopo 704K 24. Mai 11:13 data/location/agg_day_2025-01-19.parquet
The query aggregates 344 Parquet files into a single one:
copy (
    select distinct on (was_there_at)
        *
    from read_parquet([
        '/home/jacopo/projects/appbox/data/location/agg_recent_2025-01-19T08:19:39.parquet',
        '/home/jacopo/projects/appbox/data/location/agg_recent_2025-01-19T08:26:19.parquet',
        [...]
    ], union_by_name=TRUE)
)
to 'data/location/agg_day_2025-01-19.parquet' (
    FORMAT PARQUET,
    PARQUET_VERSION V2
);
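For clarity, the distinct on deduplicates rows by timestamp: it keeps a single row per was_there_at value, and since there is no order by, which duplicate survives is arbitrary. A minimal illustration:

select distinct on (k) k, v
from (values (1, 'a'), (1, 'b'), (2, 'c')) t(k, v);
-- returns one row for k = 1 and one for k = 2; which v is kept is unspecified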
This is one of the input files (extension changed for GitHub); they all have the same structure and similar sizes:
agg_recent_2025-01-19T08:19:39.parquet.txt
This is the core dump I retrieved from journalctl:
To Reproduce
Unfortunately, I cannot reproduce it right now: this code aggregates the input files and then deletes them, and while testing with DuckDB 1.2.1 the aggregation succeeded and deleted the input files.
Restoring version 1.3.0 and trying with another dataset, it works, so the issue may be related to the number of input files.
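In case it helps, a rough sketch (untested, with made-up paths and a simplified schema) of how a similar workload could be regenerated: write a few hundred small Parquet files and aggregate them the same way:

-- create one small input file per run (repeat a few hundred times, e.g. from a shell loop)
copy (select now() as was_there_at, random() as lat, random() as lon)
to 'data/location/agg_recent_000.parquet' (FORMAT PARQUET);

-- then aggregate as in the failing query
copy (
    select distinct on (was_there_at) *
    from read_parquet('data/location/agg_recent_*.parquet', union_by_name=TRUE)
)
to 'data/location/agg_day.parquet' (FORMAT PARQUET, PARQUET_VERSION V2);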
OS:
Linux, Fedora Workstation 42
DuckDB Version:
1.3.0
DuckDB Client:
CLI and Python
Hardware:
amd64
Full Name:
Jacopo Farina
Affiliation:
None (I work at Flixbus and we use DuckDB, but this is an issue on a personal project)
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
- [x] Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
- [x] Yes, I have
Thanks for opening this issue in the DuckDB issue tracker! To resolve this issue, our team needs a reproducible example. This includes:
- A source code snippet which reproduces the issue.
- The snippet should be self-contained, i.e., it should contain all imports and should use relative paths instead of hard-coded paths (please avoid /Users/JohnDoe/...).
- A lot of issues can be reproduced with plain SQL code executed in the DuckDB command line client. If you can provide such an example, it greatly simplifies the reproduction process and likely results in a faster fix.
- If the script needs additional data, please share the data as a CSV, JSON, or Parquet file. Unfortunately, we cannot fix issues that can only be reproduced with a confidential data set. Support contracts allow sharing confidential data with the core DuckDB team under NDA.
For more detailed guidelines on how to create reproducible examples, please visit Stack Overflow's “Minimal, Reproducible Example” page.
I was able to reproduce it with a smaller sample containing little personal data, attached:
Running duckdb -f last_query.sql generates a segmentation fault (stack trace attached), while it works fine with DuckDB CLI 1.2.1.
I am now building the latest commit to verify whether it happens there too.
With the latest build (a8a377580c) in debug mode I get no segmentation fault but rather this error:
49% ▕█████████████████████████████▍ ▏ /home/jacopo/projects/duckdb/extension/parquet/include/parquet_bss_encoder.hpp:30:48: runtime error: store to null pointer of type 'data_t'
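For what it's worth, parquet_bss_encoder.hpp appears to be the writer's BYTE_STREAM_SPLIT encoder, which I believe is used for FLOAT/DOUBLE columns when writing Parquet V2, so a minimal trigger might be as simple as (hypothetical, untested):

copy (select random() as x from range(100000))
to 'bss_test.parquet' (FORMAT PARQUET, PARQUET_VERSION V2);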
I tried commit 843ea85218, the latest on the v1.3-ossivalis branch, which includes the PR mentioned above. On that branch the error is fixed, both with PARQUET_VERSION V1 and V2, so this seems to be the same issue.
We have hit this too (it sounds like the same issue, anyway) with v1.3.1 (v1.2.1 works fine), with duckdb-17633.sql:
INSTALL spatial;
LOAD spatial;

CREATE TABLE places AS
SELECT
    id AS uuid,
    version,
    confidence,
    CAST(names AS JSON) AS names,
    CAST(categories AS JSON) AS categories,
    CAST(websites AS JSON) AS websites,
    CAST(socials AS JSON) AS socials,
    CAST(emails AS JSON) AS emails,
    CAST(phones AS JSON) AS phones,
    CAST(brand AS JSON) AS brand,
    CAST(addresses AS JSON) AS addresses,
    CAST(sources AS JSON) AS sources,
    CAST(bbox AS JSON) AS bbox,
    geometry AS geometry
FROM read_parquet('s3://overturemaps-us-west-2/release/2025-05-21.0/theme=places/type=place/*', hive_partitioning=true)
WHERE bbox.xmin BETWEEN 144.93 AND 144.94
  AND bbox.ymin BETWEEN -37.79 AND -37.78;
and running:
duckdb -f duckdb-17633.sql -c "COPY places TO places.csv (FORMAT csv)"
reproduces it on both Linux (amd64) and macOS (arm64).
I think the original issue by @jacopofar has been solved in 1.3.1, and the follow-up issue (which is possibly separate) by @bed42 also seems solved in DuckDB 1.3.1 with spatial at version fd68ec0 (run UPDATE EXTENSIONS (spatial) if the commit is not the right one).
Please reopen if the original issue still persists, or open an issue in the duckdb/duckdb-spatial repo for the spatial-related problem. Thanks!
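For reference, a quick way to check which spatial build is loaded and to pull the fixed one (assuming DuckDB >= 1.1, where UPDATE EXTENSIONS is available):

SELECT extension_name, extension_version, install_mode
FROM duckdb_extensions()
WHERE extension_name = 'spatial';

UPDATE EXTENSIONS (spatial);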
Confirming that updating the spatial extension to fd68ec0 (from 95ed129) has resolved the crash.
Many thanks, @carlopi!