Inserting into large trees
Hello, thank you for your work on sled, it looks really interesting. I figured I'd try sled as a replacement for rocksdb in the openethereum client for storing blockchain data. Here are some observations:

- Rebuilding the state trie with sled, my hardware does about 100k-200k inserts per second on a fresh, empty tree. As the tree fills up, inserting becomes progressively slower: at about 100 million records it takes 10-60 s per 100k inserts. I used tree.apply_batch() with batches of about 100k records (roughly as in the sketch after this list). Is this slowdown expected given the sled architecture and the large number of records?
- With this workload and sled 0.34.6, memory usage would quickly climb above the configured limit, eating up all the free RAM. But 1fe4ba0f57 behaved well and respected the cache_capacity. I couldn't go further than that commit, since typenum was later introduced as a dependency and it conflicted with the version openethereum pulls in.
- I didn't observe much performance difference between LowSpace and HighThroughput mode, except a bit larger disk usage for the latter.
If your workload doesn't require the 100k keys to be stored to the database in one atomic operation, I would recommend using insert instead of apply_batch for better performance.
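For comparison, a sketch of the non-atomic variant suggested here, using the same placeholder helper style: each key is written with its own insert call, so there is no all-or-nothing guarantee across the group.

// Placeholder helper: writes each pair individually, trading atomicity for speed.
fn write_individually(tree: &sled::Tree, pairs: &[(Vec<u8>, Vec<u8>)]) -> sled::Result<()> {
    for (key, value) in pairs {
        // Each insert stands alone; a crash mid-loop leaves earlier keys written.
        tree.insert(key.as_slice(), value.as_slice())?;
    }
    Ok(())
}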
I tried inserting non-atomically and indeed, it is faster. However, writing still gets much slower as the tree size increases. I was hoping for O(log(n)), but that doesn't seem to be the case. I made a small program to isolate the measurements.
extern crate sled;
extern crate sha3;

use sled::{Config, IVec};
use sha3::{Digest, Sha3_256};

fn main() {
    let config = Config::default()
        .path("testdb")
        .create_new(true);
    let db = config.open().unwrap();
    let tree = db.open_tree("test").unwrap();

    let mut batch = Vec::<(IVec, IVec)>::new();
    for n in 0..1_000_000_000u64 {
        // The key is the SHA3-256 hash of the counter, i.e. effectively random 32 bytes.
        let value = n.to_be_bytes();
        let mut hasher = Sha3_256::new();
        hasher.update(value);
        let key = hasher.finalize();
        batch.push((key.as_slice().into(), value.as_ref().into()));

        let l = batch.len();
        if l == 1_000_000 {
            // Time the next million individual inserts and report the average per-op cost.
            let now = std::time::Instant::now();
            for (k, v) in batch.split_off(0) {
                tree.insert(k, v).unwrap();
            }
            println!(
                "tree size: {}, last 1m inserts us/op: {}",
                n + 1,
                now.elapsed().as_micros() / l as u128
            );
        }
    }
}
It tries to insert 10^9 records with what could be considered random 32-byte keys. Unfortunately, I didn't get far, as it segfaults after a while, but it shows how the insert times grow. The segfault could be related to #1299. Attached are my results for commits b9a64fe43c (current main) and 1fe4ba0f.
I can try to debug the segfaults, but I'd need some guidance for how to do that.
I tried a few different revisions and here is one that does not segfault: 4004085 (the next one I tried that did segfault was e329eae0).
At ~25M records, it takes ~10us to insert another record. At ~250M, it takes ~20us. At ~550M, it takes ~100us.
I also ran into the same kind of problem.
I use tokio's mini-redis with sled as the storage backend and ran redis-benchmark with SET requests and a data length of 100k. sled only reaches about 1000 QPS on my Mac Pro.
This is my sled configuration:
use sled::{Db, Mode, Result};

fn init_db() -> Result<Db> {
    let db: Db = sled::Config::new()
        .mode(Mode::HighThroughput)
        .cache_capacity(17_179_869_184) // 16 GiB cache
        .flush_every_ms(Some(1000))
        .path("my_db")
        .print_profile_on_drop(true)
        .open()?;
    Ok(db)
}