
Inserting into large trees

Open mdben1247 opened this issue 3 years ago • 4 comments

Hello, and thank you for your work on sled; it looks really interesting. I figured I'd try sled as a replacement for RocksDB in the openethereum client for storing blockchain data. Here are some observations:

  • Rebuilding the state trie with sled, my hardware does about 100k-200k inserts per second on a fresh, empty tree. As it fills up, inserting becomes progressively slower: at about 100 million records it takes 10-60 s per 100k inserts. I used tree.apply_batch() with batches of about 100k records (a minimal sketch of this usage follows the list below). Is this slowdown expected given the sled architecture and the large number of records?

  • With this workload and sled 0.34.6, memory usage would quickly grow above the configured limit, eating up all the free RAM. But commit 1fe4ba0f57 behaved well and respected cache_capacity. I couldn't go further than that commit, since typenum was later introduced as a dependency and it conflicted with the version openethereum pulls in.

  • I didn't observe much performance difference between LowSpace and HighThroughput mode, except a bit larger disk usage for the latter.
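
For reference, a minimal sketch of the batched-insert pattern from the first observation, assuming tree is an open sled::Tree (the loop and data are illustrative, not the openethereum code):

use sled::Batch;

// Buffer ~100k entries into a single Batch, then apply it atomically.
let mut batch = Batch::default();
for n in 0u64..100_000 {
    let bytes = n.to_be_bytes();
    batch.insert(&bytes[..], &bytes[..]); // key == value, just for illustration
}
tree.apply_batch(batch)?; // all-or-nothing write of the whole batch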

mdben1247 avatar Feb 25 '21 17:02 mdben1247

If your workload doesn't require the 100k keys to be stored to the database in one atomic operation, I would recommend using insert instead of apply_batch for better performance.
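
A rough sketch of that suggestion, using the same illustrative data as the batch example above (not code from this comment):

// Non-atomic variant: each insert stands on its own, so sled does not have
// to hold the whole 100k-entry batch for a single atomic apply.
for n in 0u64..100_000 {
    let bytes = n.to_be_bytes();
    tree.insert(&bytes[..], &bytes[..])?;
}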

divergentdave avatar Feb 25 '21 17:02 divergentdave

I tried inserting non-atomically and indeed, it is faster. However, writing still gets much slower as the tree size increases. I was hoping for O(log(n)), but this doesn't seem to be the case. I made a small program to isolate the measurements.

extern crate sled;
extern crate sha3;

use sled::{Config, IVec};
use sha3::{Digest, Sha3_256};

fn main() {
    let config = Config::default()
        .path("testdb")
        .create_new(true);

    let db = config.open().unwrap();
    let tree = db.open_tree("test").unwrap();
    let mut batch = Vec::<(IVec, IVec)>::new();

    for n in 0u64..1_000_000_000 {
        // Key: SHA3-256 of the counter, i.e. effectively random 32 bytes.
        let value = n.to_be_bytes();
        let mut hasher = Sha3_256::new();
        hasher.update(value);
        let key = hasher.finalize();

        batch.push((key.as_slice().into(), value.as_ref().into()));

        // Every million buffered records, insert them and time the inserts.
        let l = batch.len();
        if l == 1_000_000 {
            let now = std::time::Instant::now();
            for (k, v) in batch.split_off(0) {
                tree.insert(k, v).unwrap();
            }
            println!("tree size: {}, last 1m inserts us/op: {}", n + 1, now.elapsed().as_micros() / l as u128);
        }
    }
}

It tries to insert 10^9 records with what are effectively random 32-byte keys. Unfortunately, I didn't get far, as it segfaults after a while, but it does show how the insert times grow. The segfault could be related to #1299. Attached are my results for commits b9a64fe43c (current main) and 1fe4ba0f.

1fe4ba0f.txt b9a64fe43c.txt

I can try to debug the segfaults, but I'd need some guidance on how to do that.

mdben1247 avatar Feb 26 '21 08:02 mdben1247

I tried a few different revisions and here is one that does not segfault: 4004085 (the next one I tried that did segfault was e329eae0).

At ~25M records, it takes ~10us to insert another record. At ~250M, it takes ~20us. At ~550M, it takes ~100us.

4004085.txt

mdben1247 avatar Mar 01 '21 12:03 mdben1247

I also encountered the same type of problem.

I used tokio's mini-redis with sled as the storage backend and ran redis-benchmark's SET test with a data length of 100k. sled only reaches about 1000 qps on my Mac Pro.

This is my sled configuration:

use sled::{Db, Mode};

fn init_db() -> sled::Result<Db> {
    let db: Db = sled::Config::new()
        .mode(Mode::HighThroughput)
        .cache_capacity(17_179_869_184) // in bytes: 16 GiB
        .flush_every_ms(Some(1000))
        .path("my_db")
        .print_profile_on_drop(true)
        .open()?;

    Ok(db)
}
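
For context, a minimal sketch of what a single 100k SET from the benchmark boils down to on the sled side, using the Db returned by init_db() above (the key literal is illustrative):

let db = init_db()?;
let payload = vec![0u8; 100_000]; // ~100k value, as in the redis-benchmark run
db.insert("key:000000000001", payload)?; // Db derefs to the default Tree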

wwwbjqcom avatar May 13 '21 10:05 wwwbjqcom