
[Feature request] Batch queries

Open mordechaim opened this issue 8 months ago

There are many times in my code where I need to update multiple records at once. Since we can't update multiple records when they have different values, I need to run many queries and await them all at once using Promise.all.

It would be really nice if we could have a db.$batch function that combines all queries into a single multi-statement query. This would result in a single round trip.

Like Promise.all, the return type should be a tuple with results in the same order as the passed queries.
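
A rough signature sketch of what I have in mind (nothing here is real API, it's just to illustrate the tuple typing):

// Hypothetical only: $batch takes a tuple of pending queries and resolves
// to a tuple of their results, in the same order, like Promise.all.
declare function $batch<T extends readonly PromiseLike<unknown>[]>(
  queries: [...T],
): Promise<{ [K in keyof T]: Awaited<T[K]> }>;

// Hypothetical usage:
// const [updatedUser, newPost] = await db.$batch([
//   db.user.find(1).update({ name: 'a' }),
//   db.post.create({ title: 'b' }),
// ]);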

mordechaim avatar May 02 '25 14:05 mordechaim

Had the same issue, so I took inspiration from this article on Drizzle.

Here is my implementation; it's not pretty, but it works:

let orderCase = 'CASE\n';
let stepIdCase = 'CASE\n';
const ids: string[] = [];

const orderParams: Record<string, any> = {};
const stepParams: Record<string, any> = {};

// input is an array of values sent from my frontend app
input.forEach((match, index) => {
  // named parameters, referenced as $id0, $new_order_value0, ... in the raw SQL
  const idParam = `id${index}`;
  const orderParam = `new_order_value${index}`;
  const stepParam = `new_step_value${index}`;

  orderParams[idParam] = match.id;
  orderParams[orderParam] = match.newOrder;

  stepParams[idParam] = match.id;
  stepParams[stepParam] = match.newStepId;

  // one WHEN branch per record in each CASE expression
  orderCase += `  WHEN id = $${idParam} THEN $${orderParam}\n`;
  stepIdCase += `  WHEN id = $${idParam} THEN $${stepParam}\n`;
  ids.push(match.id);
});

// fall back to the current value for rows that are not being changed
orderCase += '  ELSE "order"\nEND';
stepIdCase += '  ELSE "step_id"\nEND';

await db.talentMatch.whereIn('id', ids).update({
  order: () => sql({ raw: orderCase, values: orderParams }),
  stepId: () => sql({ raw: stepIdCase, values: stepParams }),
});

This generates:

UPDATE "talent_matches" SET "order" = CASE
  WHEN id = $1 THEN $2
  ELSE "order"
END, "step_id" = CASE
  WHEN id = $3 THEN $4
  ELSE "step_id"
END, "updated_at" = now() WHERE "talent_matches"."id" IN ($5)

NelsonMK avatar May 02 '25 15:05 NelsonMK

That's a good feature; I'll work on it later.

You probably don't want 9 updates to succeed and 1 of them to fail. So the new batch method should use a transaction.

Once I was doing a batch insert and ran into that partial-success case, so I had to wrap it in a transaction and expected it to become slower. On the contrary, I learned that a transaction actually makes batch inserts/updates faster, because the db can do its persisting work just once for all records rather than many times. So it's even faster in a transaction.

I expect the approach suggested by Drizzle to be much less efficient; it will be interesting to compare.

romeerez avatar May 25 '25 00:05 romeerez

Is this on the agenda? It would be a killer feature.

Thank you.

(Sorry, I don't like leaving these kinds of comments on OSS projects, but I can't contribute myself because the TS code seems like magic to me; I can't wrap my head around what happens there.)

mordechaim avatar Aug 17 '25 18:08 mordechaim

Feel free to highlight if one issue is more wanted than another.

It would be really nice if we could have a db.$batch function that combines all queries into a single multi-statement query. This would result in a single round trip.

Not so long ago I learned that the pg driver cannot do that, but the postgres package can, so support for that driver needs to be added first.

Probably it's achievable with CTEs, but that's too complex to implement, since different queries can return different results, and CTEs aren't great for Postgres performance.

To clarify, it's for a use case like this, right?

const data = { ...someData } // all of the data is known beforehand, one query does not depend on another

await db.$transaction(async () => {
  await Promise.all([
    db.foo.create(data.foo),
    db.bar.find(data.bar.id).update(data.bar),
    db.baz.select('columns'),
  ])
})

By pg not supporting it I mean that, despite Promise.all being used here, the driver still waits for each query to respond before sending the next one.

It could look like this:

const [foo, bar, baz] = await db.$transactionAll([
    db.foo.create(data.foo),
    db.bar.find(data.bar.id).update(data.bar),
    db.baz.select('columns'),
])

Or, if you're fine with partial success:

const [foo, bar, baz] = await db.$allSettled([
    db.foo.create(data.foo),
    db.bar.find(data.bar.id).update(data.bar),
    db.baz.select('columns'),
])

foo.status // same shape as in Promise.allSettled

@NelsonMK above has suggested a solution for a different feature: "Update many with different values for each row".

I can add a case-when helper to the query builder if you need one (Kysely has one, so the design can be borrowed), but that's a different issue.

romeerez avatar Aug 18 '25 20:08 romeerez

Oh, I see, multi-statement queries are not allowed by the driver. Disregard then.


To clarify, it's for a use case like this, right?

Precisely. I was using the name db.$batch, but with the exact same shape as your $transactionAll.

mordechaim avatar Aug 19 '25 00:08 mordechaim

I reread your comment. Are you saying that node-pg doesn't allow queries to run in parallel even if they are separate queries? Does this mean Promise.all has zero effect?

This is bad; I was relying on this for large updates. It might explain why I kept getting Sentry warnings about N+1 queries.

Does a connection pool help? Does orchid use a pool under the hood?

mordechaim avatar Aug 22 '25 05:08 mordechaim

When not in a transaction, the connection pool helps, of course: Promise.all does send queries over different connections in parallel. But it's not atomic; one query can fail while another gets committed.

pg's defaults for a pool are up to 10 connections, and a connection is closed after 10 seconds of inactivity. Orchid ORM leaves these defaults as they are. You can increase both max and idleTimeoutMillis to good effect.
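
For example, something along these lines (a sketch, assuming the pool options are passed through the first orchidORM config argument to pg.Pool):

import { orchidORM } from 'orchid-orm';

// Sketch: assuming pg.Pool options pass through the first config argument.
export const db = orchidORM(
  {
    databaseURL: process.env.DATABASE_URL,
    max: 20, // pg's default is 10 connections
    idleTimeoutMillis: 30_000, // pg's default closes idle connections after 10s
  },
  {
    // ...your tables
  },
);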

When you begin a transaction, it takes one connection out of the pool, and then all the queries of that transaction are sent over this single connection, sequentially. Even when using Promise.all in a transaction, the queries are still sent one after another: pg cannot send a query on the same connection before receiving the response to the previous query, but porsager/postgres can do it by using a special Postgres protocol for that.
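
For reference, this is roughly how it looks with the postgres package used directly: returning an array of queries from sql.begin lets it write them all to a single connection up front instead of waiting for each response (a sketch; the table and values are made up):

import postgres from 'postgres';

const sql = postgres(process.env.DATABASE_URL!);

// All queries in the returned array are written to the connection up front,
// and the responses are read back afterwards, within one transaction.
const [first, second] = await sql.begin((sql) => [
  sql`UPDATE talent_matches SET "order" = 1 WHERE id = ${'id-1'}`,
  sql`UPDATE talent_matches SET "order" = 2 WHERE id = ${'id-2'}`,
]);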

Kudos to Sentry, I didn't know it could do that, cool. If those N+1 warnings are not related to transactions, that's not OK.

romeerez avatar Aug 30 '25 22:08 romeerez

Thank you for your detailed explanation. I did get the warnings when updating many rows with different values inside a single transaction. So this explains it all.

Sentry has special heuristics to detect this; see the docs.

Thanks again.

porsager/postgres can do it by using a special Postgres protocol for that

That would be lit if Orchid could support this.

mordechaim avatar Aug 31 '25 01:08 mordechaim

Btw my initial idea was to batch all queries into a single multi-statement query (separated with semicolons), resulting in a single round trip.

But even if this isn't possible at the driver level, parallelism in a transaction is of course a great alternative.

mordechaim avatar Aug 31 '25 08:08 mordechaim

Btw my initial idea was to batch all queries into a single multi-statement query (separated with semicolons), resulting in a single round trip.

Unfortunately, I tried this and argued with different AIs about it: not possible! It's not possible for parameterized queries.

If you have queries without params, you can join them with semicolons, but queries almost always have parameters, so this technique is pretty much useless.

Any value is a parameter: SELECT count(*) FROM table has no params, but WHERE key = 'value' has one. In inserts and updates, all the values are parameters.
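
To illustrate with pg directly (a sketch; the exact error text may differ):

import { Pool } from 'pg';

const pool = new Pool();

// Works: no parameters, so pg uses the simple query protocol,
// which allows several statements separated by semicolons.
await pool.query('SELECT 1; SELECT 2');

// Fails: with parameters pg switches to the extended query protocol,
// which accepts exactly one statement, so this throws an error along the
// lines of "cannot insert multiple commands into a prepared statement".
await pool.query('SELECT $1::int; SELECT $2::int', [1, 2]);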

romeerez avatar Aug 31 '25 12:08 romeerez

I read in the docs that the protocol you're referring to is called pipelining.

https://github.com/brianc/node-postgres/issues/2646

There has been an open issue in node-pg for several years about supporting pipelining. Unfortunately, it doesn't look like it's happening anytime soon.

mordechaim avatar Aug 31 '25 13:08 mordechaim

Side note: an interesting way to update multiple rows with different values in a single round trip

https://github.com/brianc/node-postgres/issues/956#issuecomment-1046197203

mordechaim avatar Sep 01 '25 05:09 mordechaim

Side note: an interesting way to update multiple rows with different values in a single round trip

A temp table isn't really needed there; it could be:

UPDATE coffeeOrders AS c
SET coffee = v.coffee
FROM (
    VALUES
        ('Alice', 'Latte'),
        ('Bob',   'Espresso')
) AS v(name, coffee)
WHERE c.name = v.name;

UPDATE ... FROM is not yet supported, but it is planned and will be added.

Thinking about how the query above might look:

db.coffeeOrders
  .from([ // from could support array of values  
    { name: 'Alice', coffee: 'Latte' },
    { name: 'Bob', coffee: 'Espresso' }
  ])
  // let it have alias "values" by default,
  // so you can use 'values.name' in .where
  .where({ name: (q) => q.ref('values.name') }) // q.ref is what's already there, wish it was a bit prettier
  .update({ coffee: (q) => q.ref('values.coffee') }) 
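
In the meantime, a parameterized version of that UPDATE ... FROM can be sent through the pg driver in a single round trip; a rough sketch (the input shape and the text casts are illustrative):

import { Pool } from 'pg';

const pool = new Pool();

const input = [
  { name: 'Alice', coffee: 'Latte' },
  { name: 'Bob', coffee: 'Espresso' },
];

// Build one VALUES row per record: ($1::text, $2::text), ($3::text, $4::text), ...
const rows = input.map((_, i) => `($${i * 2 + 1}::text, $${i * 2 + 2}::text)`);
const params = input.flatMap((row) => [row.name, row.coffee]);

await pool.query(
  `UPDATE coffeeOrders AS c
   SET coffee = v.coffee
   FROM (VALUES ${rows.join(', ')}) AS v(name, coffee)
   WHERE c.name = v.name`,
  params,
);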

romeerez avatar Sep 02 '25 22:09 romeerez

I added support for the postgres package!

Here are notes on how to use it.

There are many times in my code where I need to update multiple records at once. Since we can't update multiple records when they have different values, I need to run many queries and await them all at once using Promise.all.

Now if you wrap that Promise.all in a transaction and use postgres-js, it will be more performant: all queries will be sent at once.

Wrapping in a transaction is good because:

  • pipelining works with transactions, and probably won't work without one
  • the change is atomic: if one update fails, all updates are discarded
  • it's more performant on the db side: the db will persist all changes at once, instead of persisting every update individually

const results = await db.$transaction(async () => {
  return await Promise.all([
    db.table.find(1).update(data),
    db.table.find(2).update(data),
    db.table.find(3).update(data),
  ])
})

I'm not adding the initially proposed db.$batch because this code looks good enough.

romeerez avatar Sep 28 '25 02:09 romeerez

Great news! Thank you

mordechaim avatar Sep 28 '25 03:09 mordechaim

@romeerez The adapter key is missing from the type of orchidORM's first config argument in TypeScript.

Also, doesn't porsager/postgres expose a pool object? I use the @vercel/functions utility attachDatabasePool, which automatically closes the connection when a Vercel runtime function suspends.

mordechaim avatar Oct 03 '25 16:10 mordechaim