
Segmentation fault in 1.3.0

Open jacopofar opened this issue 7 months ago • 4 comments

What happens?

Hello, I am experiencing a segfault in DuckDB 1.3.0, both in the CLI and the Python bindings, that does not happen in version 1.2.0:

$ duckdb --version
v1.3.0 71c5c07cdd
$ duckdb -f last_query.sql
Segmentation fault (core dumped)
$ /home/jacopo/.duckdb/cli/1.2.1/duckdb -f last_query.sql
$ ls -lh data/location/agg_day_2025-01-19.parquet
-rw-r--r--. 1 jacopo jacopo 704K 24. Mai 11:13 data/location/agg_day_2025-01-19.parquet

The query aggregates 344 Parquet files into a single one:

copy (
    select distinct on (was_there_at)
    *
    from read_parquet([
'/home/jacopo/projects/appbox/data/location/agg_recent_2025-01-19T08:19:39.parquet',
'/home/jacopo/projects/appbox/data/location/agg_recent_2025-01-19T08:26:19.parquet',
[...]
], union_by_name=TRUE)
)
to 'data/location/agg_day_2025-01-19.parquet' (
    FORMAT PARQUET,
    PARQUET_VERSION V2
);
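The same statement crashes through the Python bindings as well. Roughly, the call looks like the sketch below (the glob pattern and output path are placeholders, not my actual project paths):

import duckdb
import glob

# Placeholder paths: collect the small per-run Parquet files to aggregate.
files = sorted(glob.glob("data/location/agg_recent_*.parquet"))
file_list = ", ".join(f"'{f}'" for f in files)

con = duckdb.connect()
con.execute(f"""
    COPY (
        SELECT DISTINCT ON (was_there_at) *
        FROM read_parquet([{file_list}], union_by_name=TRUE)
    )
    TO 'data/location/agg_day.parquet' (FORMAT PARQUET, PARQUET_VERSION V2)
""")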

This is one of the input files (extension changed for GitHub); they all have the same structure and similar sizes:

agg_recent_2025-01-19T08:19:39.parquet.txt

This is the coredump I retrieved from journalctl:

coredump.txt

To Reproduce

Unfortunately, I cannot reproduce it right now: this code aggregates the files and then deletes them, and while testing with DuckDB 1.2.1 the query succeeded and deleted the input files.

After restoring version 1.3.0 and trying it with another dataset, it works; the issue may be related to the number of input files.
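If the number of files is the trigger, something along these lines might reproduce it by first generating a few hundred small Parquet files. This is only a sketch with a made-up schema, since I cannot share the original inputs yet:

import duckdb
import os

os.makedirs("repro_inputs", exist_ok=True)
con = duckdb.connect()

# Made-up schema: a timestamp plus two numeric columns, 1000 rows per file.
N_FILES = 344
for i in range(N_FILES):
    con.execute(f"""
        COPY (
            SELECT to_timestamp(r) AS was_there_at,
                   random()        AS lat,
                   random()        AS lon
            FROM range(1000) t(r)
        ) TO 'repro_inputs/agg_recent_{i:04d}.parquet' (FORMAT PARQUET)
    """)

# Same shape as the failing query: explicit file list, DISTINCT ON, union_by_name, V2 output.
file_list = ", ".join(f"'repro_inputs/agg_recent_{i:04d}.parquet'" for i in range(N_FILES))
con.execute(f"""
    COPY (
        SELECT DISTINCT ON (was_there_at) *
        FROM read_parquet([{file_list}], union_by_name=TRUE)
    ) TO 'repro_inputs/agg_day.parquet' (FORMAT PARQUET, PARQUET_VERSION V2)
""")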

OS:

Linux, Fedora Workstation 42

DuckDB Version:

1.3.0

DuckDB Client:

CLI and Python

Hardware:

amd64

Full Name:

Jacopo Farina

Affiliation:

None (I work at Flixbus and we use DuckDB, but this is an issue in a personal project)

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • [x] Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • [x] Yes, I have

jacopofar avatar May 24 '25 09:05 jacopofar

Thanks for opening this issue in the DuckDB issue tracker! To resolve this issue, our team needs a reproducible example. This includes:

  • A source code snippet which reproduces the issue.
  • The snippet should be self-contained, i.e., it should contain all imports and should use relative paths instead of hard coded paths (please avoid /Users/JohnDoe/...).
  • A lot of issues can be reproduced with plain SQL code executed in the DuckDB command line client. If you can provide such an example, it greatly simplifies the reproduction process and likely results in a faster fix.
  • If the script needs additional data, please share the data as a CSV, JSON, or Parquet file. Unfortunately, we cannot fix issues that can only be reproduced with a confidential data set. Support contracts allow sharing confidential data with the core DuckDB team under NDA.

For more detailed guidelines on how to create reproducible examples, please visit Stack Overflow's “Minimal, Reproducible Example” page.

duckdblabs-bot avatar May 25 '25 08:05 duckdblabs-bot

I was able to reproduce it with a smaller sample that doesn't contain too much personal data, attached:

data_debug.tar.gz

Running duckdb -f last_query.sql triggers a segmentation fault (stack trace attached):

stacktrace.txt

while it works fine with DuckDB CLI 1.2.1.

I'm now trying to build the latest commit to verify whether it happens there too.
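In case it's useful for anyone following along, a quick way to confirm exactly which commit a given binary or Python wheel was built from is PRAGMA version, which reports the version string and the source id, e.g.:

import duckdb

print(duckdb.__version__)                       # version of the Python package
print(duckdb.sql("PRAGMA version").fetchall())  # [(library_version, source_id)]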

jacopofar avatar Jun 04 '25 15:06 jacopofar

With the latest build (a8a377580c) in debug mode I get no segmentation fault but rather this error:

 49% ▕█████████████████████████████▍                              ▏ /home/jacopo/projects/duckdb/extension/parquet/include/parquet_bss_encoder.hpp:30:48: runtime error: store to null pointer of type 'data_t'
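Since parquet_bss_encoder.hpp presumably belongs to the byte-stream-split encoder used when writing floating-point columns with the V2 writer, one quick probe is to write the same synthetic floating-point data with both PARQUET_VERSION settings and see whether the error depends on that option. This is only a sketch with random data and may well not trigger the crash without the real input files:

import duckdb

con = duckdb.connect()
# Synthetic table with DOUBLE columns, roughly the kind of data the BSS encoder handles.
con.execute("CREATE TABLE probe AS SELECT random() AS x, random() AS y FROM range(1000000)")

for version in ("V1", "V2"):
    con.execute(f"COPY probe TO 'probe_{version}.parquet' (FORMAT PARQUET, PARQUET_VERSION {version})")
    print(version, "written")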

jacopofar avatar Jun 04 '25 16:06 jacopofar

I tried commit 843ea85218, which was the last one in the v1.3-ossivalis branch and includes the PR mentioned above. With that branch the error is fixed, both with PARQUET_VERSION V1 and V2, so this seems to be the same issue.

jacopofar avatar Jun 04 '25 18:06 jacopofar

We have hit this too (sounds like it anyway) with v1.3.1 (v1.2.1 works fine).

With the following script, duckdb-17633.sql:

INSTALL spatial;
LOAD spatial;

CREATE TABLE places AS
SELECT
    id as uuid,
    version,
    confidence,
    CAST(names AS JSON) AS names,
    CAST(categories AS JSON) AS categories,
    CAST(websites AS JSON) AS websites,
    CAST(socials AS JSON) AS socials,
    CAST(emails AS JSON) AS emails,
    CAST(phones AS JSON) AS phones,
    CAST(brand AS JSON) AS brand,
    CAST(addresses AS JSON) AS addresses,
    CAST(sources AS JSON) AS sources,
    CAST(bbox AS JSON) as bbox,
    geometry AS geometry
FROM read_parquet('s3://overturemaps-us-west-2/release/2025-05-21.0/theme=places/type=place/*', hive_partitioning=true)
WHERE bbox.xmin BETWEEN 144.93 AND 144.94
  AND bbox.ymin BETWEEN -37.79 AND -37.78;

and running:

duckdb -f duckdb-17633.sql -c "COPY places TO places.csv (FORMAT csv)"

This reproduces it on both Linux (amd64) and macOS (arm64).

bed42 avatar Jun 24 '25 11:06 bed42

I think the original issue by @jacopofar has been solved in 1.3.1, and the follow-up issue (which is possibly separate) by @bed42 also seems solved in DuckDB 1.3.1 with spatial at version fd68ec0 (run UPDATE EXTENSIONS (spatial) if the commit is not the right one).
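For reference, a rough way to check which spatial build is installed (and update it) from the Python client or the CLI:

import duckdb

con = duckdb.connect()
con.execute("UPDATE EXTENSIONS (spatial)")
# extension_version should show the installed spatial build (e.g. a commit hash).
print(con.sql(
    "SELECT extension_name, extension_version "
    "FROM duckdb_extensions() WHERE extension_name = 'spatial'"
).fetchall())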

Please reopen if the original issue still persists, or open an issue in the duckdb/duckdb-spatial repo for the spatial-related problem. Thanks

carlopi avatar Jul 01 '25 09:07 carlopi

Confirming that updating the spatial extension to fd68ec0 (from 95ed129) has resolved the crash.

Many thanks @carlopi !

bed42 avatar Jul 01 '25 12:07 bed42