duckdb-node

Connections don't run queries concurrently

Open tronis470 opened this issue 2 years ago • 5 comments

Not sure if this is working as designed, a bug, or whether there is some other workaround here for doing connection pooling with DuckDB in Node.js...

Using an open database, I want to run some queries concurrently across different connections. Is there a way to accomplish this?

This test script shows that queries will not run concurrently, even if run on different connections:

const duckdb = require('duckdb');

// One database handle, two separate connections.
const db = new duckdb.Database(':memory:');

const connA = db.connect();
const connB = db.connect();

// Resolves with the elapsed time (ms) of a cheap query on connection A.
function fastQuery() {
    return new Promise((resolve, reject) => {
        const t = Date.now();
        connA.all(`select 1 from range(1,10)`, (err, res) => {
            if (err) reject(err);
            else resolve(Date.now() - t);
        });
    });
}

// Resolves with the elapsed time (ms) of an expensive aggregation on connection B.
function slowQuery() {
    return new Promise((resolve, reject) => {
        const t = Date.now();
        connB.all(`select max(i) from (select 1 as i from range(1,10000000000));`, (err, res) => {
            if (err) reject(err);
            else resolve(Date.now() - t);
        });
    });
}

async function test() {
    console.log("Run fast query");
    console.log("Fast query time (ms): ", await fastQuery());

    console.log("Run slow query");
    console.log("Slow query time (ms): ", await slowQuery());

    // Kick both off without awaiting, so they are submitted at the same time.
    console.log("Run slow and fast query concurrently");
    slowQuery().then(s => console.log("Slow query time (ms): ", s));
    fastQuery().then(f => console.log("Fast query time (ms): ", f));
}

test();

The fast query should take a few milliseconds, while the slow query should take a few seconds. If the slow and fast queries are kicked off at the same time, even on different connections, the slow query blocks the fast query from executing. This is what I get when running the script above on an M2 MBP:

Run fast query
Fast query time (ms):  2
Run slow query
Slow query time (ms):  4386
Run slow and fast query concurrently
Slow query time (ms):  4701
Fast query time (ms):  4701

If I run the fast query on a completely different Database handle, there is no problem running it concurrently with the slow query (a sketch of that setup follows the timings below):

Run fast query
Fast query time (ms):  3
Run slow query
Slow query time (ms):  4381
Run slow and fast query concurrently
Fast query on different DB handle (ms):  1
Slow query time (ms):  4700
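
For reference, a minimal sketch of that separate-handle setup, assuming the same two queries as above. Each duckdb.Database handle is its own DuckDB instance, so the two in-memory databases below share no data or catalog; this sidesteps the queueing rather than adding true concurrency within a single database:

const duckdb = require('duckdb');

// Two independent Database handles: each is a separate native DuckDB instance,
// so a long-running query on one does not block queries on the other
// (matching the timings above).
const dbSlow = new duckdb.Database(':memory:');
const dbFast = new duckdb.Database(':memory:');

// Run a query on the given handle and resolve with the elapsed time in ms.
function timeQuery(db, sql) {
    return new Promise((resolve, reject) => {
        const t = Date.now();
        db.all(sql, (err) => err ? reject(err) : resolve(Date.now() - t));
    });
}

timeQuery(dbSlow, `select max(i) from (select 1 as i from range(1,10000000000));`)
    .then(ms => console.log("Slow query time (ms): ", ms));
timeQuery(dbFast, `select 1 from range(1,10)`)
    .then(ms => console.log("Fast query on different DB handle (ms): ", ms));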

tronis470 avatar Oct 18 '23 23:10 tronis470

While this is presently by design, we should implement the parallelize and serialize methods to allow this to be configurable

Mause avatar Oct 20 '23 01:10 Mause
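
Purely to illustrate the proposal above, a hypothetical sketch, assuming parallelize and serialize would mirror node-sqlite3's Database#serialize / Database#parallelize. Neither method exists in duckdb-node at this point, so the snippet will not run as-is:

const duckdb = require('duckdb');
const db = new duckdb.Database(':memory:');

// HYPOTHETICAL API: not part of duckdb-node today.
db.parallelize(() => {
    // Queries issued here could run concurrently, even on a single handle.
    db.all(`select max(i) from (select 1 as i from range(1,10000000000));`, () => {});
    db.all(`select 1 from range(1,10)`, () => {});
});

db.serialize(() => {
    // Queries issued here would run one after another (today's behavior).
    db.all(`select 42`, () => {});
});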

I would argue that parallel execution should be the default; otherwise, the entire premise of DuckDB being super fast becomes irrelevant for Node.

kuatroka avatar Oct 25 '23 11:10 kuatroka

I have been searching for a way to execute multiple queries in parallel through DuckDB but have not found a solution. My scenario: I want to read Parquet files from S3. The files are split per member and no query will read the same file, so I want to execute multiple per-member queries concurrently. We have a capable machine (26 GB RAM and 20+ processors), but I have not found a way to do this.

CoolMilanShah avatar Oct 27 '23 14:10 CoolMilanShah
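
With today's API, one way to approach this scenario is the separate-handle workaround from earlier in the thread: one Database handle per member file. This is only a sketch; the bucket name, key layout, member IDs, and credentials are placeholders, and it assumes the httpfs extension can be installed and loaded at runtime:

const duckdb = require('duckdb');

// Query one member's Parquet file on S3 using a dedicated Database handle,
// so per-member queries run in parallel instead of queueing on one handle.
// Each handle defaults to using all cores; SET threads caps that per handle.
function queryMember(memberId) {
    const db = new duckdb.Database(':memory:');
    const conn = db.connect();
    return new Promise((resolve, reject) => {
        conn.exec(`
            INSTALL httpfs; LOAD httpfs;
            SET s3_region='us-east-1';
            SET s3_access_key_id='<key>';
            SET s3_secret_access_key='<secret>';
            SET threads=4;
        `, (err) => {
            if (err) return reject(err);
            conn.all(
                `select count(*) as n from read_parquet('s3://my-bucket/members/${memberId}.parquet')`,
                (err2, rows) => err2 ? reject(err2) : resolve({ memberId, rows })
            );
        });
    });
}

Promise.all(['member_001', 'member_002', 'member_003'].map(queryMember))
    .then(results => console.log(results))
    .catch(err => console.error(err));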

If connections don't run queries concurrently, is there any way to improve the performance of multiple reads?

wlf061 avatar Nov 27 '23 08:11 wlf061

While this is presently by design, we should implement the parallelize and serialize methods to allow this to be configurable

@Mause Could you check if this enhancement is planned for the development pipeline? It's crucial for performance reasons to support concurrent execution.

hc-12 avatar Feb 14 '24 00:02 hc-12