
Inserting into large trees

Open mdben1247 opened this issue 3 years ago • 4 comments

Hello, and thank you for your work on sled; it looks really interesting. I figured I'd try sled as a replacement for RocksDB in the openethereum client for storing blockchain data. Here are some observations:

  • Rebuilding the state trie with sled, my hardware does about 100k-200k inserts per second on a fresh, empty tree. As it fills up, inserting becomes progressively slower: at about 100 million records it takes 10-60 s per 100k inserts. I used tree.apply_batch() with batches of about 100k records (a minimal sketch of this usage follows the list below). Is this slowdown expected given the sled architecture and the large number of records?

  • With this workload and sled 0.34.6, memory usage would quickly grow above the configured limit, eating up all the free RAM. But commit 1fe4ba0f57 behaved well and respected cache_capacity. I couldn't go further than that commit, since typenum was later introduced as a dependency and it conflicted with the version openethereum pulls in.

  • I didn't observe much performance difference between LowSpace and HighThroughput mode, except a bit larger disk usage for the latter.
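
For reference, a minimal sketch of the batched-insert pattern from the first observation, assuming tree is an open sled::Tree (the loop and data are illustrative, not the openethereum code):

use sled::Batch;

// Buffer ~100k entries into a single Batch, then apply it atomically.
let mut batch = Batch::default();
for n in 0u64..100_000 {
    let bytes = n.to_be_bytes();
    batch.insert(&bytes[..], &bytes[..]); // key == value, just for illustration
}
tree.apply_batch(batch)?; // all-or-nothing write of the whole batch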

mdben1247 avatar Feb 25 '21 17:02 mdben1247

If your workload doesn't require the 100k keys to be stored to the database in one atomic operation, I would recommend using insert instead of apply_batch for better performance.
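
A rough sketch of that suggestion, using the same illustrative data as the batch example above (not code from this comment):

// Non-atomic variant: each insert stands on its own, so sled does not have
// to hold the whole 100k-entry batch for a single atomic apply.
for n in 0u64..100_000 {
    let bytes = n.to_be_bytes();
    tree.insert(&bytes[..], &bytes[..])?;
}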

divergentdave avatar Feb 25 '21 17:02 divergentdave

I tried inserting non-atomically and indeed, it is faster. However, writing still gets much slower as the tree size increases. I was hoping for O(log(n)), but this doesn't seem to be the case. I made a small program to isolate the measurements.

extern crate sled;
extern crate sha3;

use sled::{Config, IVec};
use sha3::{Digest, Sha3_256};

fn main() {
    let config = Config::default()
        .path("testdb")
        .create_new(true);

    let db = config.open().unwrap();
    let tree = db.open_tree("test").unwrap();
    let mut batch = Vec::<(IVec, IVec)>::new();

    for n in 0u64..1_000_000_000 {
        // Key: SHA3-256 of the counter, i.e. effectively random 32 bytes.
        let value = n.to_be_bytes();
        let mut hasher = Sha3_256::new();
        hasher.update(value);
        let key = hasher.finalize();

        batch.push((key.as_slice().into(), value.as_ref().into()));

        // Every million buffered records, insert them and time the inserts.
        let l = batch.len();
        if l == 1_000_000 {
            let now = std::time::Instant::now();
            for (k, v) in batch.split_off(0) {
                tree.insert(k, v).unwrap();
            }
            println!("tree size: {}, last 1m inserts us/op: {}", n + 1, now.elapsed().as_micros() / l as u128);
        }
    }
}

It tries to insert 10^9 records with what are effectively random 32-byte keys. Unfortunately, I didn't get far, as it segfaults after a while, but it does show how the insert times grow. The segfault could be related to #1299. Attached are my results for commits b9a64fe43c (current main) and 1fe4ba0f.

1fe4ba0f.txt b9a64fe43c.txt

I can try to debug the segfaults, but I'd need some guidance on how to do that.

mdben1247 avatar Feb 26 '21 08:02 mdben1247

I tried a few different revisions and here is one that does not segfault: 4004085 (the next one I tried that did segfault was e329eae0).

At ~25M records, it takes ~10us to insert another record. At ~250M, it takes ~20us. At ~550M, it takes ~100us.

4004085.txt

mdben1247 avatar Mar 01 '21 12:03 mdben1247

I also encountered the same type of problem.

I used tokio's mini-redis with sled as the storage backend and ran redis-benchmark's SET test with a data length of 100k. sled only reaches about 1000 qps on my Mac Pro.

This is my sled configuration:

use sled::{Db, Mode};

fn init_db() -> sled::Result<Db> {
    let db: Db = sled::Config::new()
        .mode(Mode::HighThroughput)
        .cache_capacity(17_179_869_184) // in bytes: 16 GiB
        .flush_every_ms(Some(1000))
        .path("my_db")
        .print_profile_on_drop(true)
        .open()?;

    Ok(db)
}
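
For context, a minimal sketch of what a single 100k SET from the benchmark boils down to on the sled side, using the Db returned by init_db() above (the key literal is illustrative):

let db = init_db()?;
let payload = vec![0u8; 100_000]; // ~100k value, as in the redis-benchmark run
db.insert("key:000000000001", payload)?; // Db derefs to the default Tree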

wwwbjqcom avatar May 13 '21 10:05 wwwbjqcom