fast-sqlite3-inserts
Add pure SQLite approach, with generate_series
Based on #11. This PR adds a new method of quickly inserting 100M rows into a SQLite table, using a pure SQLite approach.
Using the generate_series table-valued function, we can insert 100M rows into a table very, very quickly with a single INSERT INTO ... SELECT statement.
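For reference, the heart of load.sql boils down to a single statement along these lines (a sketch based on the full script quoted later in this thread; active uses % 2 so it lands on 0 or 1):

```sql
-- Sketch of the single INSERT ... SELECT at the core of load.sql
INSERT INTO user (area, age, active)
SELECT
  abs(random() % 99999)       AS area,   -- pseudo-random area code
  (abs(random() % 3) + 1) * 5 AS age,    -- age is 5, 10, or 15
  abs(random() % 2)           AS active  -- active is 0 or 1
FROM generate_series(1, 100000000);      -- 100M rows from the series extension
```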
The question of "how fast is this" is difficult to answer. I'm not sure if it's my computer or the background apps I'm running, but my benchmarks of the previous Python/Rust solutions have been all over the place. Using this approach, I get 57s. For the pypy threaded_batched approach, I get 5m50s, and with the Rust threaded_batched approach, I get 1m9s.
$ time sqlite3x sqlite3.db '.read load.sql'
real 0m57.414s
user 0m45.714s
sys 0m6.574s
$ time ~/Downloads/pypy3.8-v7.3.7-osx64/bin/pypy3 threaded_batched.py
real 5m49.887s
user 3m4.141s
sys 1m52.690s
$ time ./target/release/threaded_batched
real 1m9.807s
user 0m49.430s
sys 0m29.562s
Given this, I'm tempted to say that this actually might be the fastest solution, but given my wildly different benchmark times (pypy's 5m50s vs. 126s, Rust's 1m9s vs. 30s), I'd love to see how this runs on your computer, using the full benchmark!
Why (I think) it's so fast
One solid reason why I think this approach is so fast is that only a single SQL statement is being run. All the other approaches perform millions of insert into ... values (...) SQL statements, each of which is a fast operation, but doing anything millions of times starts to add up. We see an obvious benefit when we start to batch statements together, but even batching 50 at a time still leaves 2 million distinct SQL statements that need to be run (see the sketch below).
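To make the contrast concrete, here's a rough sketch of the two shapes of work (the 50-row batch size mirrors the batched approaches in this repo; the literal values are made up for illustration):

```sql
-- Batched approach: ~2 million statements shaped like this, 50 rows per batch
INSERT INTO user (area, age, active) VALUES
  (12345, 10, 1),
  (67890, 15, 0); -- ...imagine 48 more rows here, repeated ~2,000,000 times

-- Pure SQLite approach: one statement; the per-row loop stays inside SQLite
INSERT INTO user (area, age, active)
SELECT abs(random() % 99999), (abs(random() % 3) + 1) * 5, abs(random() % 2)
FROM generate_series(1, 100000000);
```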
Also, by using generate_series, all operations are kept in C, which is very fast. Rust and pypy are also fast, but my guess is that crossing the boundary from Rust -> C or pypy -> C takes a non-trivial amount of time that adds up fast.
Also, on generate_series vs. recursive CTEs: the generate_series documentation page mentions that recursive CTEs are slower than table-valued functions. The equivalent CTE version is sketched below.
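For comparison, the recursive-CTE version of the same row generator would look something like this (a sketch, not code from this PR):

```sql
-- Recursive CTE standing in for generate_series(1, 100000000);
-- per the SQLite docs, this is slower than the table-valued function
WITH RECURSIVE series(value) AS (
  SELECT 1
  UNION ALL
  SELECT value + 1 FROM series WHERE value < 100000000
)
INSERT INTO user (area, age, active)
SELECT
  abs(random() % 99999),
  (abs(random() % 3) + 1) * 5,
  abs(random() % 2)
FROM series;
```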
Note: generate_series may not be compiled into your sqlite3 CLI by default, so make sure the sqlite3 binary you use includes it.
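A quick way to check is to run a tiny query against it; a build without the series extension will fail immediately:

```sql
-- Succeeds on builds that include the series extension;
-- otherwise errors with: no such table: generate_series
SELECT count(*) FROM generate_series(1, 5);
```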
Would love to hear what you think, and to see the "official" benchmark numbers!
Hey @asg017,
I tried your version on my iMac (3.6 GHz Intel Core i9, 8 cores) and unfortunately, I am not seeing better timings than the best Rust ones.
sqlite3 --version
3.34.0 2020-12-01 16:14:00 a26b6597e3ae272231b96f9982c3bcc17ddec2f2b6eb4df06a224b91089fed5b
# The Rust program
./target/release/threaded_batched 25,57s user 3,00s system 109% cpu 26,178 total
# The pure SQLite version
sqlite3 sqlite3.db '.read load.sql' 40,06s user 1,95s system 84% cpu 49,616 total
I am not sure why, but it may be because SQLite's internal random generator is not that fast. I also find this somewhat unfair, as the original blog post didn't specify which random generator to use.
It would be fairer to align the RNG of the Python and Rust programs with the one SQLite uses internally and, if possible, to seed the programs so that the same databases are generated at the end. It is too easy to gain speed by generating low-quality random numbers very fast.
Hey @Kerollmops, thanks for giving it a shot!
I agree that this may all come down to the random() function - I completely removed the random() calls in the pure SQLite version once and it finished in half the time. The built-in random() in SQLite is a custom pseudo-random number generator, which may leave some room for improvement. I wonder if it's feasible to implement a new random function as a loadable extension, using one of these strategies, to make a fast_random() function and cut down the time, but it's been a while since I've done anything in C...
(And the same point in the opposite direction - maybe a custom RNG in Rust/pypy might make those runs faster?)
In general, benchmarks are always hard, especially on different hardware. I'm guessing my specific machine isn't efficiently using all of its cores or something. And I'd say there are even more things you can benchmark when testing inserts in SQLite - here we insert 3 random numbers 100 million times, but what if we instead insert 100 million rows from a CSV, or 100 million JSON objects? Of course this is out of scope for this repo, but I definitely love to think about it!
And some other random notes: I tried adding the pragma statements that made the other approaches faster, like so:
PRAGMA journal_mode = OFF;
PRAGMA synchronous = 0;
PRAGMA cache_size = 1000000;
PRAGMA locking_mode = EXCLUSIVE;
PRAGMA temp_store = MEMORY;
But it actually made this slower, by ~15s for me. I'll try to find out why, since I don't quite understand these pragma statements to begin with...
> ./bench.sh
Fri Dec 17 13:04:18 IST 2021 [SQLite] running sqlite3 (100_000_000) inserts
off
exclusive
real 0m49.328s
user 0m46.876s
sys 0m1.733s
Fri Dec 17 13:06:06 IST 2021 [RUST] threaded_batched.rs (100_000_000) inserts
real 0m27.184s
user 0m40.531s
sys 0m3.865s
With the following script:
PRAGMA synchronous = OFF;
PRAGMA journal_mode = OFF;
PRAGMA cache_size = 1000000;
PRAGMA locking_mode = EXCLUSIVE;
CREATE TABLE IF NOT EXISTS user (
  id INTEGER NOT NULL PRIMARY KEY,
  area CHAR(6),
  age INTEGER NOT NULL,
  active INTEGER NOT NULL
);
INSERT INTO user (area, age, active)
SELECT
  abs(random() % 99999) AS area,
  (abs(random() % 3) + 1) * 5 AS age,
  abs(random() % 2) AS active -- % 2 so active is 0 or 1; % 1 would always yield 0
FROM generate_series(1, 100000000);
> sqlite3 --version
3.36.0 2021-06-18 18:58:49 d24547a13b6b119c43ca2ede05fecaa707068f18c7430d47fc95fb5a2232aapl
Also, I tested without the random functions:
> ./bench.sh
Fri Dec 17 13:34:49 IST 2021 [SQLite] running sqlite3 (100_000_000) inserts
off
exclusive
real 0m18.405s
user 0m16.383s
sys 0m1.870s
warning: unused import: `fastrand`
--> src/bin/common.rs:1:5
|
1 | use fastrand;
| ^^^^^^^^
|
= note: `#[warn(unused_imports)]` on by default
Fri Dec 17 13:35:08 IST 2021 [RUST] threaded_batched.rs (100_000_000) inserts
real 0m31.068s
user 0m54.546s
sys 0m5.210s
I replaced the random-generation code in both the SQL and the Rust versions: the get_random_* functions now just return a constant, as does the SQLite version.
I think the larger gain in Rust comes from the fact that get_random_* is called from multiple threads and is faster than SQLite's random().
SQLite random(): https://www.sqlite.org/c3ref/randomness.html
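For reference, the no-random SQLite variant presumably boils down to constant expressions in the SELECT, along these lines (a sketch; the exact constants used are an assumption):

```sql
-- No-random variant: every column is a constant, so the timing
-- isolates the insert path from the cost of random()
INSERT INTO user (area, age, active)
SELECT 12345, 10, 0
FROM generate_series(1, 100000000);
```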