Add chDB to the DuckDB benchmark (formerly h2o)

Open auxten opened this issue 5 months ago • 5 comments

Add chDB to the benchmark (formerly h2o): https://duckdblabs.github.io/db-benchmark/

auxten avatar Jul 26 '25 13:07 auxten

I've started working on it. Please advise which approach is better to use.

There are several options for implementing the CSV queries:

  1. direct chdb.query
  2. create connection and then conn.query
  3. create a session
  4. use dbapi connection

I'm currently leaning toward the session approach.
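To make the choice measurable, the four options above could be timed side by side. This is only a rough sketch, not the benchmark's actual harness: `timed` and `compare_chdb_apis` are hypothetical helper names, the `:memory:` connection string and the `"CSV"` output format are assumptions, and each API object is created and closed in turn on the assumption that they should not be held open at the same time.

```python
import time


def timed(run, repeats=3):
    """Return the best wall-clock time (seconds) of `run()` over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        best = min(best, time.perf_counter() - start)
    return best


def compare_chdb_apis(sql, repeats=3):
    """Time one query through each of the four options; returns {label: seconds}.

    Hypothetical helper. Returns an empty dict if chdb is not installed.
    """
    try:
        import chdb
        import chdb.dbapi as dbapi
        from chdb import session as chs
    except ImportError:
        return {}

    results = {}

    # Option 1: stateless one-shot query
    results["1. chdb.query"] = timed(lambda: chdb.query(sql, "CSV"), repeats)

    # Option 2: explicit connection (in-memory database assumed)
    conn = chdb.connect(":memory:")
    try:
        results["2. conn.query"] = timed(lambda: conn.query(sql, "CSV"), repeats)
    finally:
        conn.close()

    # Option 3: session
    sess = chs.Session()
    try:
        results["3. session"] = timed(lambda: sess.query(sql, "CSV"), repeats)
    finally:
        sess.close()

    # Option 4: DB-API connection and cursor
    db = dbapi.connect()
    try:
        def run_dbapi():
            cur = db.cursor()
            cur.execute(sql)
            cur.fetchone()
            cur.close()

        results["4. dbapi"] = timed(run_dbapi, repeats)
    finally:
        db.close()

    return results


if __name__ == "__main__":
    for label, seconds in compare_chdb_apis("SELECT sum(number) FROM numbers(1000000)").items():
        print(f"{label}: {seconds:.4f}s")
```

Only the query call is timed here, not object creation, so the numbers reflect per-query overhead rather than startup cost.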

cyrusmsk avatar Jul 27 '25 09:07 cyrusmsk

In the current version, the connection-based API (No. 2) is the fastest.

auxten avatar Jul 27 '25 10:07 auxten

Hi @auxten @wudidapaopao, I've prepared draft results for the benchmark.

Tests were run on a local machine: macOS 15.5, M1 Pro (32 GB RAM). Some queries are significantly slower than in DuckDB.

I've used table structures and queries similar to those in the ClickHouse solution. To add results to the official benchmark, the maintainers ask that they also be run on a large AWS machine (quoted below). It would be great to add the results to the official repo as well, but I will need some help from your team.

The benchmark will now be updated upon request. A request can be made by creating a PR with a combination of the following.

The PR must include

updates to the time.csv and log.csv files of a run on a c6id.metal machine. If you are re-enabling a query for a solution, you can just include new times and logs for the query, however, the version must match currently reported version.

It would be awesome if you could review the chdb-related code in my fork: https://github.com/cyrusmsk/db-benchmark/pull/1 Keep in mind that there are currently two separate branches for the chdb.session and chdb.connect approaches (the only difference is in how the conn object is created, about 2 lines of code).

Join comparison Image

Group-by comparison Image

cyrusmsk avatar Aug 10 '25 11:08 cyrusmsk

PR is here https://github.com/duckdblabs/db-benchmark/pull/131

auxten avatar Sep 18 '25 02:09 auxten

@auxten @wudidapaopao the results were added! https://duckdblabs.github.io/db-benchmark/ The author of the repo tested on 3.6.0, though; results may be a bit better on 3.7.0. Also, some issues were observed when it tried to use the 50GB file.

But in general, I think this task can be closed now.

cyrusmsk avatar Oct 27 '25 20:10 cyrusmsk