
Insert thousands or millions of rows in bulk

collimarco opened this issue 2 years ago • 3 comments

What is the fastest way to batch insert a large number of rows in SQLite using this gem?

Currently I use this:

db.execute('BEGIN TRANSACTION')
sql = 'INSERT INTO items (id, name) VALUES (?, ?)'
items.each_with_index { |name, id| db.execute(sql, [id, name]) }
db.execute('COMMIT')

The items come from large files and need to be inserted into the SQLite database as fast as possible.

  1. Is there any better / optimized solution?
  2. For example, I was considering keeping all the inserts in a Ruby array and then using execute_batch; however, I can't find a way to sanitize / interpolate the statements in that case. Is there a method like db.sanitize(sql, [id, name])?

collimarco avatar Sep 06 '22 20:09 collimarco

I found this optimization with prepared statements:

db.execute('BEGIN TRANSACTION')
sql = db.prepare('INSERT INTO items (id, name) VALUES (?, ?)')
items.each_with_index { |name, id| sql.execute([id, name]) }
db.execute('COMMIT')

I can reach 250,000 inserts/sec on Ruby 3.0 / macOS Monterey, 2.2 GHz i7. I was hoping for something faster, given the large volume of data I would like to process this way.

Is there any better solution?

collimarco avatar Sep 06 '22 22:09 collimarco

I don't think there's a better solution using this gem. I'll leave this open in case any of the more seasoned maintainers want to weigh in with advice.

flavorjones avatar Sep 07 '22 14:09 flavorjones

Expand the statement to insert multiple rows at once

INSERT INTO items (id, name) VALUES (?, ?), (?, ?), .. , (?, ?);

This ensures you are not going back and forth between SQLite and Ruby for every single row.

Prepare two statements: the multi-row insert above, plus a single-row insert to capture any leftover rows.

Make sure your transactions fit in the page cache

Don't insert all the records in a single transaction. Pick a reasonable number of inserts per transaction (it should be in the thousands), and make sure the page cache can hold all of those rows.

Use WAL mode with synchronous = 1

If you are not doing so already

Try to insert ids in order

If that is possible with your data set, make the ids ordered before feeding the records to SQLite

oldmoe avatar Sep 12 '23 00:09 oldmoe