duckdb-rs
Error ("assertion failed") with very large select clause in query
We are generating queries to read very wide parquet files as part of a Census data extraction service. Through experimentation I've determined that the limit on named columns in a SELECT clause is 16256. When this limit is exceeded we get:
cli_extractor: /home/ccd/nhgis-extract-engine/rust/target/release/build/libduckdb-sys-14f8e1593a01c721/out/duckdb/src/common/types/row/row_data_collection.cpp:82: duckdb::vector<duckdb::BufferHandle> duckdb::RowDataCollection::Build(duckdb::idx_t, duckdb::data_t**, duckdb::idx_t*, const duckdb::SelectionVector*): Assertion `new_block.count > 0' failed.
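For context, here is a minimal sketch of the kind of query generation involved, using duckdb-rs. The file name wide.parquet and the column names are placeholders I've made up for illustration; this only shows the shape of the generated SQL and how we run it, since the real query and data are far larger.

```rust
use duckdb::{Connection, Result};

fn main() -> Result<()> {
    let conn = Connection::open_in_memory()?;

    // Placeholder column names and file path; the real extract has >16256 named columns.
    let cols: Vec<String> = (0..17_000).map(|i| format!("col_{i}")).collect();
    let sql = format!(
        "SELECT {} FROM read_parquet('wide.parquet')",
        cols.join(", ")
    );

    let mut stmt = conn.prepare(&sql)?;
    let mut rows = stmt.query([])?;
    while let Some(_row) = rows.next()? {
        // Consume results.
    }
    Ok(())
}
```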
We realize this is a very large number of columns, and 99.5% of our workload is well under 16256 columns and runs very well. It would be nice to get a clear error message, or to have a setting that increases the maximum width of a result.
It looks like there's a problem with the memory allocation code: the physical memory of our machines isn't anywhere near exhausted when I monitor the process. With 14000 columns, for example, memory use is quite low.
I can reproduce the issue on version 1.1.1 (the one bundled with the Rust package). With the CLI I get a similar problem on v1.1.1 and v1.1.3: it tries to dump to a temp file and then just gets stuck. The temp file in .tmp is only 256 KB.
When I test on the nightly build downloaded today I get a different error:
ccd@gp2000:~/nhgis-extract-engine/rust$ ~/duckdb/duckdb < test_query.sql
Floating point exception (core dumped)
ccd@gp2000:~/nhgis-extract-engine/rust$
Due to the nature of the problem, the query and the data file needed to reproduce the issue are both extremely large, but I'm willing to share them.
After looking at the other duckdb-rs issues I realize this may belong in the general DuckDB issue tracker. The main difference with Rust is that we get a hard stop on the assertion failure, which we wouldn't see when using the CLI application.
Thanks for the report!
I'm going to move this to duckdb/duckdb as it appears to be a core issue indeed.
My quick (and possibly wrong) analysis:
DuckDB doesn't handle the case where a single row is wider than the block size, leading to:
- block_capacity gets calculated as 0
- AppendToBlock returns 0 → new_block.count remains 0
- Assertion failure at https://github.com/duckdb/duckdb/blob/v1.3.2/src/common/types/row/row_data_collection.cpp#L82
It seems the variable-size entry path already handles this case with the special resizing logic:
https://github.com/duckdb/duckdb/blob/v1.3.2/src/common/types/row/row_data_collection.cpp#L22
However, the fixed-size entry path lacks this protection, which is why the assertion fails when dealing with rows wider than the block size.
https://github.com/duckdb/duckdb/blob/v1.3.2/src/common/types/row/row_data_collection.cpp#L37
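To make the suspected failure mode concrete, here's a tiny illustration of the arithmetic (not DuckDB's actual code, and the byte counts are assumptions): with fixed-size entries, rows-per-block is an integer division, so a single row wider than a block truncates the capacity to zero and nothing can be appended.

```rust
// Illustration only; block size and per-column width are assumptions, not DuckDB internals.
fn block_capacity(block_size: usize, entry_width: usize) -> usize {
    // Fixed-size path: how many whole rows fit in one block.
    block_size / entry_width
}

fn main() {
    let block_size = 256 * 1024; // assume a 256 KiB block (consistent with the 256 KB .tmp file above)
    let entry_width = 17_000 * 16; // suppose ~17k columns at 16 bytes each: one row is wider than a block

    // Integer division truncates to 0, so AppendToBlock can copy nothing,
    // new_block.count stays 0, and the assertion at row_data_collection.cpp:82 fires.
    assert_eq!(block_capacity(block_size, entry_width), 0);
}
```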
@ccdavis I wasn't able to reproduce the issue myself. The provided test_query.txt is incomplete as it stands. Could you please provide a complete reproducer?
Closing as stale due to inactivity. Feel free to re-open.