db
insertBatch to generate multiple INSERT statements
Right now, if the batch is too big, we run into DBMS limits like the following:
The screenshot is from PostgreSQL.
I suggest doing two things:
- Introduce a configurable parameter for the maximum number of parameters.
- Default it per DBMS. For PostgreSQL it could be 65535.
- If the max is reached, move the rest of the data to the next INSERT statement.
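The splitting described in the last bullet could be sketched roughly like this. Note this is only an illustration: the function name and the way the limit is passed are hypothetical, not part of yiisoft/db.

```php
<?php

/**
 * Hypothetical sketch: split $rows into chunks so that each generated
 * INSERT uses at most $paramsLimit placeholders.
 * Each row consumes one placeholder per column.
 */
function chunkRowsByParamLimit(array $rows, int $columnCount, int $paramsLimit = 65535): array
{
    $rowsPerQuery = max(1, intdiv($paramsLimit, $columnCount));

    return array_chunk($rows, $rowsPerQuery);
}

// Example: 3 columns with an (artificially small) limit of 5 placeholders
// allows intdiv(5, 3) = 1 row per query, so three single-row chunks.
$chunks = chunkRowsByParamLimit([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 3, 5);
```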
That will change the characteristic of this operation: from atomic to non-atomic.
You can't make it atomic (except transactions) in case of exceeding max number of parameters anyway.
Yes, but implicitly changing an atomic operation to a non-atomic one is just a massive footgun. This should be explicitly enabled by the programmer who makes the call, so the developer will know that this batch insert may result in multiple queries.
Also, splitting one massive insert into multiple smaller ones is useful not only for handling the parameter limit. I generally avoid really big inserts as they may block the table for other queries, so the system may be more responsive if you do 10 small inserts instead of one big one. Having a method that does this for you would be useful even when the parameter limit is not exceeded.
Makes sense. The default value of such an option must be null then, to disable splitting.
Good point, but
- It is better if it works by default. Currently it does not work.
- The documentation should mention this.
By default it is better to specify driver specific max value and allow the developer to change it.
Either way is fine for me, if we'll put a warning about it being atomic or not in the docs.
> By default it is better to specify driver specific max value and allow the developer to change it.
To be clear: I was talking about a limit based on the number of inserted rows, not the number of params used by the query. If I want to insert 50k records in 50 queries (1k records per query), then the limit I need to pass to the function should be 1000.
Yes. That sounds useful as well.
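The row-count based limit described above is essentially just array_chunk over the rows; a minimal sketch (the variable names are illustrative only):

```php
<?php

// Sketch of a row-count based limit (as opposed to a parameter-count
// based one): the caller passes the maximum number of rows per INSERT.
$rows = range(1, 50000); // stand-in for 50k records
$limit = 1000;           // rows per query

$queries = array_chunk($rows, $limit);

echo count($queries); // 50
```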
If it is driver-specific, shouldn't this be done on DBMS level? In the following packages:
yiisoft/db-mssql
yiisoft/db-mysql
yiisoft/db-oracle
yiisoft/db-pgsql
yiisoft/db-sqlite
> By default it is better to specify driver specific max value and allow the developer to change it.
> To be clear: I was talking about limit based on number of inserted rows, not number of params used by query. If I want to insert 50k records in 50 queries (1k records per query), then limit I need to pass to the function should be 1000.
My vision is to introduce a parameter, let's say batchInsertChunkSize, and default it to 0 (which means no limit); if you set it to any positive value, insert queries are split into chunks of that size.
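The proposed semantics could look like the following sketch. The name batchInsertChunkSize is taken from this discussion; it is not an existing yiisoft/db parameter.

```php
<?php

/**
 * Illustrative semantics for the proposed batchInsertChunkSize option:
 * 0 (the default) means "no limit" and keeps everything in one INSERT;
 * any positive value splits the rows into chunks of that size.
 */
function splitRows(array $rows, int $batchInsertChunkSize = 0): array
{
    if ($batchInsertChunkSize <= 0) {
        return [$rows];
    }

    return array_chunk($rows, $batchInsertChunkSize);
}

assert(count(splitRows(range(1, 10))) === 1);    // default: one single chunk
assert(count(splitRows(range(1, 10), 4)) === 3); // chunks of 4, 4, 2 rows
```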
> shouldn't this be done on DBMS level?
Implementation itself — yes. The interface should be common, though.
Basically, we are discussing two similar questions at once here.
As for the original issue described in the very first message - I think we can encapsulate this logic inside the DBMS package itself.
For PostgreSQL we could add a property int $paramsLimit = 65535;. Same for other PDO Drivers.
We can make use of this parameter in prepareBatchInsertValues(string $table, iterable $rows, array $columns, array &$params, int $chunkSize = 0) in AbstractDMLQueryBuilder.
And also we can introduce an int $chunkSize = 0 parameter. We will accept this parameter in insertBatch(string $table, iterable $rows, array $columns = [], array &$params = [], int $chunkSize = 0) and pass it through. The method will return an array of chunks (by default only a single chunk).
This will require insertBatch() to return an array of SQL queries. Unfortunately, it will break some BC.
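A standalone sketch of this BC-breaking variant, i.e. one SQL string per chunk instead of a single string. The function name and signature here are illustrative, not the actual yiisoft/db API:

```php
<?php

/**
 * Hypothetical chunked variant of insertBatch(): builds one
 * parameterized INSERT statement per chunk of rows.
 * A $chunkSize of 0 (the default) means a single statement.
 */
function insertBatchChunked(string $table, array $columns, array $rows, int $chunkSize = 0): array
{
    $chunks = $chunkSize > 0 ? array_chunk($rows, $chunkSize) : [$rows];

    // One "(?, ?, ...)" group per row, one placeholder per column.
    $rowPlaceholder = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';

    $sqls = [];
    foreach ($chunks as $chunk) {
        $sqls[] = 'INSERT INTO ' . $table
            . ' (' . implode(', ', $columns) . ') VALUES '
            . implode(', ', array_fill(0, count($chunk), $rowPlaceholder));
    }

    return $sqls; // by default a single-element array
}

$sqls = insertBatchChunked('customer', ['name'], [['a'], ['b'], ['c']], 2);
// Two statements: the first inserts 2 rows, the second inserts 1 row.
```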
It seems like there is no easy way to implement these features.
I've reproduced the bug for pgsql
One way is to create a method addRawSql() and something like addBindValues(), store $sql as an array, and iterate over these queries on the execute() call.
```php
public function batchInsert(string $table, array $columns, iterable $rows): static
{
    $table = $this->getQueryBuilder()->quoter()->quoteSql($table);

    /** @psalm-var string[] $columns */
    foreach ($columns as &$column) {
        $column = $this->getQueryBuilder()->quoter()->quoteSql($column);
    }
    unset($column);

    $params = [];
    $sql = $this->getQueryBuilder()->batchInsert($table, $columns, $rows, $params);

    $this->setRawSql($sql);
    $this->bindValues($params);

    return $this;
}
```
Or maybe think of introducing a DTO. Something like BatchInsertChunk, which would store $sql and $params in it. Add an addBatchInsertChunk() method, store the chunks as an array inside the Command, and iterate over them on the execute() call.
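A minimal sketch of that DTO idea. The class and method names come from this comment; nothing below exists in yiisoft/db, and FakeCommand only simulates execution instead of talking to a database:

```php
<?php

// Hypothetical DTO: each chunk carries its own SQL and its own bound params.
final class BatchInsertChunk
{
    public function __construct(
        public readonly string $sql,
        public readonly array $params,
    ) {
    }
}

// Simplified stand-in for the Command class, showing only the proposed
// chunk storage and the iteration on execute().
final class FakeCommand
{
    /** @var BatchInsertChunk[] */
    private array $chunks = [];

    public function addBatchInsertChunk(BatchInsertChunk $chunk): static
    {
        $this->chunks[] = $chunk;

        return $this;
    }

    public function execute(): int
    {
        $executed = 0;
        foreach ($this->chunks as $chunk) {
            // A real Command would prepare $chunk->sql and bind
            // $chunk->params here; we just count the statements.
            $executed++;
        }

        return $executed;
    }
}

$command = (new FakeCommand())
    ->addBatchInsertChunk(new BatchInsertChunk('INSERT INTO t (a) VALUES (?), (?)', [1, 2]))
    ->addBatchInsertChunk(new BatchInsertChunk('INSERT INTO t (a) VALUES (?)', [3]));

echo $command->execute(); // 2
```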