fast-sqlite3-inserts icon indicating copy to clipboard operation
fast-sqlite3-inserts copied to clipboard

Increase the page size to 64K

Open pkhuong opened this issue 4 years ago • 2 comments

https://avi.im/blag/2021/fast-sqlite-inserts/ mentions that even a fully in-memory db takes ~29 seconds to insert 100M rows. Increasing the size of btree nodes, from 4 KB to 64 KB, seems like the next low-hanging fruit to try.

Unfortunately, this change also impacts the cache size: it is specified in pages, so increasing the page size from 4 KB to 64 KB also grows the cache by 16x. Any speed-up from this commit should therefore be compared with a version that sets PRAGMA cache_size = 16000000;.

pkhuong avatar Jul 18 '21 13:07 pkhuong

@pkhuong could you test, post the before and after numbers?

avinassh avatar Jul 25 '21 06:07 avinassh

Hello from the future! I took the time to brush up on the dependency versions and test with PRAGMA page_size = 65536;.

I'm on BTRFS with nodatacow, which doesn't actually help that much. I skipped the Python and SQLx versions because they were too slow.

# baseline
duminică 2 iunie 2024, 13:07:47 +0530 [RUST] basic.rs (100_000_000) inserts

real	2m1,665s
user	1m59,444s
sys	0m1,916s
duminică 2 iunie 2024, 13:09:49 +0530 [RUST] basic_batched_wp.rs (100_000_000) inserts

real	1m40,988s
user	1m34,132s
sys	0m6,518s
duminică 2 iunie 2024, 13:11:30 +0530 [RUST] threaded_str_batched.rs (100_000_000) inserts

real	1m33,722s
user	1m36,400s
sys	0m7,754s
duminică 2 iunie 2024, 13:13:04 +0530 [RUST] basic_prep.rs (100_000_000) inserts

real	0m38,444s
user	0m36,426s
sys	0m1,928s
duminică 2 iunie 2024, 13:13:43 +0530 [RUST] basic_batched.rs (100_000_000) inserts

real	0m17,025s
user	0m15,053s
sys	0m1,923s
duminică 2 iunie 2024, 13:14:00 +0530 [RUST] threaded_batched.rs (100_000_000) inserts

real	0m16,686s
user	0m15,555s
sys	0m11,237s

# page_size = 65536
duminică 2 iunie 2024, 13:24:31 +0530 [RUST] basic.rs (100_000_000) inserts

real	2m4,589s
user	2m3,121s
sys	0m1,177s
duminică 2 iunie 2024, 13:26:36 +0530 [RUST] basic_batched_wp.rs (100_000_000) inserts

real	1m41,781s
user	1m35,434s
sys	0m6,026s
duminică 2 iunie 2024, 13:28:18 +0530 [RUST] threaded_str_batched.rs (100_000_000) inserts

real	1m34,286s
user	1m37,408s
sys	0m7,274s
duminică 2 iunie 2024, 13:29:52 +0530 [RUST] basic_prep.rs (100_000_000) inserts

real	0m36,846s
user	0m35,585s
sys	0m1,174s
duminică 2 iunie 2024, 13:30:30 +0530 [RUST] basic_batched.rs (100_000_000) inserts

real	0m15,246s
user	0m14,084s
sys	0m1,121s
duminică 2 iunie 2024, 13:30:45 +0530 [RUST] threaded_batched.rs (100_000_000) inserts

real	0m14,880s
user	0m15,084s
sys	0m10,567s

Looks like it helps, but only a little. And please don't use the threaded version (take a look at the sys time).

PS: you can reduce the duplication by using Option<AreaCode>. It implements ToSql so None will be inserted as a NULL.

lnicola avatar Jun 02 '24 08:06 lnicola