db-benchmark
Published duckdb results are not reproducible
Hi. I recreated the environment you use for the benchmarks and tried to reproduce the currently published results.
curl http://169.254.169.254/latest/meta-data/instance-type
c6id.metal
The benchmark data is stored on a local NVMe disk:
~/nvme/h2oai-db-benchmark$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme2n1    1.8T  510G  1.3T  29% /home/ubuntu/nvme
lsblk | grep -v loop
NAME         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
nvme4n1      259:0    0  1000G  0 disk
├─nvme4n1p1  259:1    0 999.9G  0 part /
├─nvme4n1p14 259:2    0     4M  0 part
└─nvme4n1p15 259:3    0   106M  0 part /boot/efi
nvme2n1      259:4    0   1.7T  0 disk /home/ubuntu/nvme
nvme3n1      259:5    0   1.7T  0 disk
nvme1n1      259:6    0   1.7T  0 disk /var/lib/clickhouse
nvme0n1      259:7    0   1.7T  0 disk /nvme
The groupby task on G1_1e9_1e2_5_0 fails with an out-of-memory error on duckdb 0.8.1.3:
cat run_duckdb_groupby_G1_1e9_1e2_5_0.err
Error: rapi_execute: Failed to run query
Error: Out of Memory Error: could not allocate block of size 262KB (216.2GB/216.2GB used)
Database is launched in in-memory mode and no temporary directory is specified.
Unused blocks cannot be offloaded to disk.
Launch the database with a persistent storage back-end
Or set PRAGMA temp_directory='/path/to/tmp.tmp'
Timing stopped at: 768 538.4 33.28
Execution halted
Warning messages:
1: Connection is garbage-collected, use dbDisconnect() to avoid this.
2: Database is garbage-collected, use dbDisconnect(con, shutdown=TRUE) or duckdb::duckdb_shutdown(drv) to avoid this.
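For anyone comparing several `.err` files, here is a minimal sketch that pulls the key numbers out of an OOM message like the one above. The function name `summarize_oom` and the regular expression are my own assumptions based only on the message format shown in this log; they are not part of db-benchmark or duckdb, and other duckdb versions may word the error differently.

```python
import re

# Assumed format, taken from the log above:
#   "Out of Memory Error: could not allocate block of size 262KB (216.2GB/216.2GB used)"
OOM_RE = re.compile(
    r"Out of Memory Error: could not allocate block of size "
    r"(?P<block>[\d.]+\s*[KMGT]?i?B) "
    r"\((?P<used>[\d.]+\s*[KMGT]?i?B)/(?P<limit>[\d.]+\s*[KMGT]?i?B) used\)"
)

# Sample text copied from the error log in this issue.
SAMPLE = (
    "Error: rapi_execute: Failed to run query\n"
    "Error: Out of Memory Error: could not allocate block of size 262KB "
    "(216.2GB/216.2GB used)\n"
    "Database is launched in in-memory mode and no temporary directory is specified.\n"
)


def summarize_oom(log_text):
    """Return a dict describing the first OOM error in log_text, or None if absent."""
    match = OOM_RE.search(log_text)
    if match is None:
        return None
    return {
        "block": match.group("block"),
        "used": match.group("used"),
        "limit": match.group("limit"),
        # True when duckdb reports it had nowhere to spill to disk.
        "no_temp_directory": "no temporary directory is specified" in log_text,
    }


if __name__ == "__main__":
    print(summarize_oom(SAMPLE))
```

If `no_temp_directory` is true, the run was in-memory with no spill location, which is what the hint about `PRAGMA temp_directory` in the message refers to.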
~~You need to use the same version of duckdb if you want to reproduce.~~
Sorry, I expected the benchmark runs to be on 0.9.0 or the latest 0.9.1.
The currently published results also show 0.8.1-3 erroring out on the G1_1e9_1e2_5_0 dataset (at least for the advanced questions). You can see the results here:
https://duckdblabs.github.io/db-benchmark/groupby/G1_1e9_1e2_5_0_advanced.png
Can you also post the .out file? It should show which specific query failed.