duckdb-rs

Error ("assertion failed") with very large select clause in query

Open ccdavis opened this issue 10 months ago • 1 comments

We are generating queries to read very wide parquet files as part of a Census data extraction service. Through experimentation I've determined that the limit on named columns in a select clause is 16256. When this is exceeded we get:

cli_extractor: /home/ccd/nhgis-extract-engine/rust/target/release/build/libduckdb-sys-14f8e1593a01c721/out/duckdb/src/common/types/row/row_data_collection.cpp:82: duckdb::vector<duckdb::BufferHandle> duckdb::RowDataCollection::Build(duckdb::idx_t, duckdb::data_t**, duckdb::idx_t*, const duckdb::SelectionVector*): Assertion `new_block.count > 0' failed.

We realize this is a very large number of columns, and 99.5% of our workload is well under 16256 and runs very well. It would be nice if we could get a good error message, or have a way to increase the maximum width of a result with a setting.

It looks like there's a problem with the memory allocation code: the physical memory of our machines isn't anywhere near used up when I monitor the process. For example, when the column count is 14000 the memory use is pretty low.

I can reproduce the issue on version 1.1.1 (the one bundled with the Rust package). With the CLI I can get a similar problem on v1.1.1 and v1.1.3: it tries to dump to a temp file and then just gets stuck. The temp file in .tmp is only 256 KB.

When I test on the nightly build downloaded today I get a different error:

ccd@gp2000:~/nhgis-extract-engine/rust$ ~/duckdb/duckdb < test_query.sql
Floating point exception (core dumped)
ccd@gp2000:~/nhgis-extract-engine/rust$

Due to the nature of the problem, the query and data file needed to reproduce the issue are both extremely large, but I'm willing to share them.

test_query.txt

ccdavis, Jan 28 '25 20:01

After looking at the other duckdb-rs issues, I realized this may belong in the general issues list. The main difference with Rust is that we get a hard stop on the assertion failure, which we don't see when using the CLI application.

ccdavis, Jan 28 '25 20:01

Thanks for the report!

I'm going to move this to duckdb/duckdb as it appears to be a core issue indeed.

mlafeldt, Jul 15 '25 15:07

My quick (and possibly wrong) analysis:

DuckDB doesn't handle the case where a single row is wider than the block size, leading to:

  1. block_capacity gets calculated as 0
  2. AppendToBlock returns 0 → new_block.count remains 0
  3. Assertion failure at https://github.com/duckdb/duckdb/blob/v1.3.2/src/common/types/row/row_data_collection.cpp#L82
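The three steps above can be sketched with a toy model. This is not DuckDB's actual code: the function names only mirror the identifiers mentioned in the list, the 262144-byte block size is an assumed value chosen for illustration, and computing the capacity by integer division is an assumption about how it reaches 0.

```python
# Simplified model of the fixed-size append path; NOT DuckDB's real implementation.
BLOCK_SIZE = 262_144  # assumed block size in bytes (hypothetical value)


def block_capacity(block_size: int, entry_size: int) -> int:
    """Whole fixed-size rows that fit in one block (assumed floor division)."""
    return block_size // entry_size


def append_to_block(capacity: int, remaining: int) -> int:
    """Append as many of the remaining rows as the block can hold."""
    return min(capacity, remaining)


def build(entry_size: int, rows: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the row count of each new block, asserting progress as in step 3."""
    capacity = block_capacity(block_size, entry_size)  # step 1
    counts = []
    remaining = rows
    while remaining > 0:
        count = append_to_block(capacity, remaining)  # step 2
        assert count > 0, "new_block.count > 0"  # step 3
        counts.append(count)
        remaining -= count
    return counts
```

In this model, `build(entry_size=1024, rows=600)` fills blocks normally, while `build(entry_size=300_000, rows=1)` trips the assertion because a single row is wider than the assumed block, so the capacity rounds down to 0 and no row can ever be appended.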

It seems the variable-size entry path already handles this case with the special resizing logic:

https://github.com/duckdb/duckdb/blob/v1.3.2/src/common/types/row/row_data_collection.cpp#L22

However, the fixed-size entry path lacks this protection, which is why the assertion fails when dealing with rows wider than the block size.

https://github.com/duckdb/duckdb/blob/v1.3.2/src/common/types/row/row_data_collection.cpp#L37

mlafeldt, Jul 15 '25 15:07

@ccdavis I wasn't able to reproduce the issue myself. The provided test_query.txt is incomplete as it stands. Could you please provide a complete reproducer?
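Since the original query and data are very large, one option might be to generate a synthetic query of the same shape. The sketch below only builds the SQL text. The column names (`c0`, `c1`, ...) and the file name `wide.parquet` are hypothetical placeholders, a matching Parquet file would still have to be created separately, and it is not confirmed that such a synthetic case triggers the same assertion.

```python
def build_wide_query(n_columns: int, source: str = "wide.parquet") -> str:
    """Build a SELECT that names n_columns columns explicitly.

    Column names and the source file are hypothetical placeholders.
    """
    columns = ", ".join(f'"c{i}"' for i in range(n_columns))
    return f"SELECT {columns} FROM read_parquet('{source}');"


if __name__ == "__main__":
    # One column past the 16256 limit reported above.
    with open("test_query.sql", "w") as handle:
        handle.write(build_wide_query(16257))
```

The resulting file could then be fed to the CLI the same way as in the report (`duckdb < test_query.sql`).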

mlafeldt, Jul 16 '25 10:07

Closing as stale due to inactivity. Feel free to re-open.

mlafeldt, Sep 03 '25 10:09